No german umlauts with libcurl
-
- Posts: 862
- Joined: May 05, 2015 5:35
- Location: Germany
libcurl in general works fine here, but if I call a URL containing German umlauts (e.g. https://de.wikipedia.org/wiki/Spezial:Z ... lige_Seite) the receive buffer remains empty (it seems to be a general problem with UTF-8-encoded URLs).
Does anyone know how to get it working? Or maybe another library without that issue?
Thanks in advance.
(WinXP 32bit / FB 1.05 here)
Re: No german umlauts with libcurl
grindstone wrote: libcurl in general works fine here, but if I call a URL containing german umlauts (eg. https://de.wikipedia.org/wiki/Spezial:Z ... lige_Seite) the receive buffer remains empty (seems to be a general problem with UTF-8-coded URLs).
libcurl with OpenSSL?
https supported?
Re: No german umlauts with libcurl
libcurl natively supports https; that's the reason why I'm using it.
From what I've read, the same problem occurs with Cyrillic characters, too.
I read about a hack that is said to solve the issue, but then I would have to compile my own version of libcurl.
Re: No german umlauts with libcurl
In my libcurl examples, URLs with https do not work; URLs with http work.
Maybe Punycode is relevant: https://en.wikipedia.org/wiki/Punycode
A URL-to-Punycode converter: https://www.punycoder.com/
Re: No german umlauts with libcurl
I've made a quick test with another library, and it works perfectly, both with umlauts as shown below and with the escapes (->Source & exe): so it really seems to be a libcurl problem. Btw, clicking https://de.wikipedia.org/wiki/Spezial:Zufällige_Seite does not work with the SMF forum software at Masm32 ("Spezialseite nicht vorhanden", i.e. "special page does not exist"), but it works fine here with phpBB.
Code: Select all
Inkey NoTag$(FileRead$("https://de.wikipedia.org/wiki/Spezial:Zufällige_Seite"))
Re: No german umlauts with libcurl
What library did you use?
When I run your DownloadPageWithUmlauts.exe I only get a console window showing "#D-U"
Re: No german umlauts with libcurl
#D-U means that InternetOpenUrlA failed (that's under the hood of FileRead$). It might be your firewall, for example. Here (Windows 7-64, Italian version) it works fine, my PC is not so well protected.
Now I checked on my Win10 notebook, here is what I get, directly from the zip archive:
Code: Select all
Pure minua (2006)
Alasin (2006)
Pirunkieli (2007)
Helvettiin Jäätynyt (2008)
Lihaa Vasten Lihaa (2008)
Ei Koskaan (2008) Musikvideos [ Bearbeiten | Quelltext bearbeiten ]
Kiroan (2002)
Epilogi (2002)
Tuonen viemää (2005)
Mies yli laidan (2006)
Alasin (2006)
Ei koskaan (2008) Weblinks [ Bearbeiten | Quelltext bearbeiten ]
Offizielle Website (finnisch)
Interview mit Frontmann Patrik Mennander
Bandvorstellung: Ruoska
[... remainder of the fetched page: references, categories, Wikipedia navigation, license notice and footer ...]
Re: No german umlauts with libcurl
Same result with firewall off.
Re: No german umlauts with libcurl
Definitely a bug in curl. The command-line curl also returns zero bytes. wget, on the other hand, retrieves the document just fine.
Re: No german umlauts with libcurl
grindstone wrote: Same result with firewall off.
Sorry, no idea why it doesn't work for you. Anybody else? The archive is here.
-
- Site Admin
- Posts: 6323
- Joined: Jul 05, 2005 17:32
- Location: Manchester, Lancs
Re: No german umlauts with libcurl
I wouldn't expect the firewall to get involved, since without inspecting packets it can't see the URL requested, only the IP address of the host, and (based on the port) whether HTTP or HTTPS was used.
It occurs to me that we need to be careful when talking about this issue, because there could be a number of factors:
- whether the ä character or %C3%A4 is used in the string
- the encoding used/detected in the bas file
- whether the function is expecting a string or a wstring
- possibly also the OS used, and the charset or encoding in use
- and of course, potentially the version of curl and more significantly, whether it's the library or executable
I tried curl -v "http://de.wikipedia.org/wiki/Spezial:Zufällige_Seite" on Linux, and found that it would send the GET request in UTF-8. (I was slightly surprised to see that it would use percent-encoding only if that was passed, meaning it doesn't convert either way.)
I'm not sure what it would do on the Windows command line, where I don't think it uses UTF-8.
If wget works, that perhaps means it passes the URLs differently.
wget -d "http://de.wikipedia.org/wiki/Spezial:Zufällige_Seite" on Linux does seem to percent-encode the URL, so maybe that's why wget works.
Re: No german umlauts with libcurl
counting_pine wrote: whether the ä character or %C3%A4 is used in the string
My version
Code: Select all
pBuffer=FileRead$("https://de.wikipedia.org/wiki/Spezial:Zufällige_Seite")
definitely uses (tested successfully with Win7-64 and Win10) the ä character in its UTF-8 encoding.
Under the hood, it's InternetOpenUrlA. There is also InternetOpenUrlW for UTF-16, but it's broken (Microsoft Social):
Do not use InternetOpenUrlW() with lpszHeaders set
You should not use the Unicode version of this function if you want to send additional headers.
If dwHeadersLength excludes the terminating null, the function crashes with error ERROR_HTTP_HEADER_NOT_FOUND.
If it does include the terminating null character, this null character is also sent to the server. This works on Apache servers, but various servers respond with HTTP 400 Bad Request errors.
You can solve the problem by calling WideCharToMultiByte() to convert the URL to ANSI and then calling InternetOpenUrlA(), which accepts dwHeadersLength not including the terminating null.
Re: No german umlauts with libcurl
caseih wrote: wget, on the other hand, retrieves the document just fine.
I can confirm that, thank you for the hint to wget. Is there a way to use wget as a library?
Re: No german umlauts with libcurl
As far as I can see, wget uses the libraries libeay32.dll and libssl32.dll. Does anyone know of documentation or header files (.h / .bi) for them?
Re: No german umlauts with libcurl
(I assume you're not content with just passing percent-encoded ASCII URLs to the function?)
I think the only problem with libcurl is that it expects UTF-8 strings rather than wide strings; that's why it only accepts zstrings.
You should be able to convert a WString to a UTF-8 String using this function:
Code: Select all
#include once "utf_conv.bi"

'' Convert a wide string to its UTF-8 encoding using the
'' compiler's own conversion helper from utf_conv.bi.
function wstr_to_utf8(byref w as wstring) as string
	dim as byte ptr utfstr
	dim as integer bytes
	'' len(w) + 1 so the terminating null is converted too
	utfstr = WCharToUTF( UTF_ENCOD_UTF8, w, len( w ) + 1, 0, @bytes )
	function = *cptr(zstring ptr, utfstr)
	deallocate( utfstr )
end function
(I cobbled the above together quickly from https://github.com/freebasic/fbc/blob/m ... nv.bas#L23)
Also, before going further, just to make sure: everyone who hasn't read this Joel Spolsky article should do so:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)