Unicode, UTF-8, and POST data


#1

Hello,

Apologies in advance if this has been answered before. My grasp of character encodings is still a bit fuzzy. I’m trying to send some POST data to a server. If the POST data contains only ASCII characters, it works fine. If it includes characters outside that range, they get mangled, and I can see this happening in Visual Studio’s debugger. JUCE_STRINGS_ARE_UNICODE is defined. This is what I’m doing:

String test(T("TestöäUnicode"));
URL newUrl = url.withPOSTData(test);
XmlElement * elem = newUrl.readEntireXmlStream(true);

What I see is that URL::appendPostData() is called, which calls String::toUTF8(), which in turn calls String::copyToUTF8(). The string appears to get broken in the copyToUTF8() routine. The internal contents of the string object look like this:

TestöäUnicode

but what gets copied into the passed-in buffer looks like this:

TestÃ¶Ã¤Unicode

The area of the copyToUTF8() routine where the Unicode characters get changed is here:

buffer [num++] = (uint8) ((0xff << (7 - numExtraBytes)) | (c >> (numExtraBytes * 6)));

while (--numExtraBytes >= 0)
       buffer [num++] = (uint8) (0x80 | (0x3f & (c >> (numExtraBytes * 6))));

I’m not sure what is expected here. It would be great if someone could give some tips to point me in the right direction.
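
For reference, the two quoted lines follow the usual UTF-8 pattern: a lead byte carrying the top bits of the code point, then one continuation byte per remaining group of six bits. Below is a minimal standalone sketch of that pattern in plain C++. It is not JUCE's source, and the names appendUtf8 and toUtf8 are hypothetical helpers introduced only for illustration. It assumes the input is already a sequence of Unicode code points.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of JUCE): append one code point to `out` as UTF-8.
void appendUtf8(std::string& out, std::uint32_t c)
{
    assert(c <= 0x10FFFF); // largest valid Unicode code point

    if (c < 0x80) // ASCII is stored as a single, unchanged byte
    {
        out.push_back(static_cast<char>(c));
        return;
    }

    // Number of continuation bytes that follow the lead byte.
    int numExtraBytes = 1;
    if (c >= 0x800)   numExtraBytes = 2;
    if (c >= 0x10000) numExtraBytes = 3;

    // Lead byte: (numExtraBytes + 1) high bits set, followed by the top bits of c.
    const std::uint32_t lead = (0xff << (7 - numExtraBytes)) | (c >> (numExtraBytes * 6));
    out.push_back(static_cast<char>(lead & 0xff));

    // Continuation bytes: binary 10xxxxxx, six payload bits each.
    while (--numExtraBytes >= 0)
        out.push_back(static_cast<char>(0x80 | (0x3f & (c >> (numExtraBytes * 6)))));
}

// Hypothetical helper: encode a whole UTF-32 string as UTF-8 bytes.
std::string toUtf8(const std::u32string& text)
{
    std::string out;
    for (char32_t c : text)
        appendUtf8(out, static_cast<std::uint32_t>(c));
    return out;
}
```

Under this sketch, ö (U+00F6) becomes the two bytes C3 B6 and ä (U+00E4) becomes C3 A4. A viewer that interprets those bytes as Windows-1252 rather than UTF-8 would display them as "Ã¶" and "Ã¤", so a buffer that looks mangled in a debugger may still hold correctly encoded UTF-8.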


#2

Well, I’m sure it’s being correctly encoded as UTF-8, which as far as I know is the correct way to encode the text of a POST request... (any HTTP gurus care to call me out on that?)
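
One thing that may be worth checking is how the receiving server expects the body to be framed. If it expects a URL-encoded form body (an assumption, since the thread does not say which content type is in use), the UTF-8 bytes would normally be percent-encoded before being sent. Here is a minimal sketch of that step in plain C++. The function percentEncode is a hypothetical helper, not a JUCE call. It leaves only the RFC 3986 unreserved characters untouched and escapes every other byte.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: percent-encode a UTF-8 byte string.
// Unreserved characters (A-Z, a-z, 0-9, '-', '_', '.', '~') pass through;
// every other byte is written as %XX with uppercase hex digits.
std::string percentEncode(const std::string& utf8)
{
    static const char hexDigits[] = "0123456789ABCDEF";
    std::string out;

    for (unsigned char byte : utf8)
    {
        const bool unreserved =
            (byte >= 'A' && byte <= 'Z') || (byte >= 'a' && byte <= 'z') ||
            (byte >= '0' && byte <= '9') ||
            byte == '-' || byte == '_' || byte == '.' || byte == '~';

        if (unreserved)
        {
            out.push_back(static_cast<char>(byte));
        }
        else
        {
            out.push_back('%');
            out.push_back(hexDigits[byte >> 4]);   // high nibble
            out.push_back(hexDigits[byte & 0x0f]); // low nibble
        }
    }

    assert(out.size() >= utf8.size()); // escaping never shortens the input
    return out;
}
```

With this sketch, the UTF-8 bytes of "TestöäUnicode" come out as Test%C3%B6%C3%A4Unicode. Whether the server then decodes those bytes as UTF-8 depends on the server, so it may also help to confirm which charset it assumes.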