Unicode, UTF-8, and POST data



Apologies in advance if this has been answered before. My grasp of character encodings is still a bit fuzzy. I’m trying to send some POST data to a server. If the POST data contains only ASCII characters, it works fine. If it includes characters outside that range, they get mangled, and I can see it happening in Visual Studio’s debugger. JUCE_STRINGS_ARE_UNICODE is defined. This is what I’m doing:

String test(T("TestöäUnicode"));
URL newUrl = url.withPOSTData(test);
XmlElement * elem = newUrl.readEntireXmlStream(true);

What I see is that URL::appendPostData() is called, which calls String::toUTF8(), which in turn calls String::copyToUTF8(). The string gets broken in the copyToUTF8() routine. The internals of the string object look like this:


but what gets copied into the passed-in buffer looks like this:


The part of the copyToUTF8() routine where the non-ASCII characters get changed is here:

buffer [num++] = (uint8) ((0xff << (7 - numExtraBytes)) | (c >> (numExtraBytes * 6)));

while (--numExtraBytes >= 0)
    buffer [num++] = (uint8) (0x80 | (0x3f & (c >> (numExtraBytes * 6))));

I’m not sure what is expected here. It would be great if someone could give some tips to point me in the right direction.


Well, I’m sure it’s being correctly encoded as UTF-8, which as far as I know is the correct way to encode the text of a POST request… (any HTTP gurus care to call me out on that?)
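
To show what that loop is doing, here is a minimal standalone sketch of the same lead-byte and continuation-byte arithmetic. encodeUtf8 is a hypothetical helper written for illustration, not part of JUCE, and it assumes the input is a single valid Unicode code point.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not JUCE's API): encode one code point as UTF-8,
// using the same bit arithmetic as the loop quoted from copyToUTF8().
std::vector<uint8_t> encodeUtf8 (uint32_t c)
{
    std::vector<uint8_t> out;

    if (c < 0x80)
    {
        out.push_back ((uint8_t) c);    // ASCII passes through unchanged
        return out;
    }

    int numExtraBytes = 1;              // number of continuation bytes needed
    if (c >= 0x800)   ++numExtraBytes;
    if (c >= 0x10000) ++numExtraBytes;

    // Lead byte: (numExtraBytes + 1) high bits set, followed by the top bits of c.
    out.push_back ((uint8_t) ((0xff << (7 - numExtraBytes)) | (c >> (numExtraBytes * 6))));

    // Continuation bytes: 10xxxxxx, each carrying the next six bits of c.
    while (--numExtraBytes >= 0)
        out.push_back ((uint8_t) (0x80 | (0x3f & (c >> (numExtraBytes * 6)))));

    return out;
}
```

With this sketch, 'ö' (U+00F6) becomes the two bytes 0xC3 0xB6. A debugger that displays the buffer as single-byte Latin-1 or Windows-1252 text would show those bytes as "Ã¶", which can look mangled even though the UTF-8 is valid. That may be what you are seeing.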