Unicode, utf-8, and POST data

dans · January 15, 2010, 1:23am

Hello,

Apologies in advance if this has been answered before. My grasp of character encodings is still a bit fuzzy. I’m trying to send some POST data to a server. If the POST data contains only a subset of ASCII characters, it works fine. If it includes characters outside that subset, they get mangled. I can see the characters getting mangled in Visual Studio’s debugger. JUCE_STRINGS_ARE_UNICODE is defined. This is what I’m doing:

String test(T("TestöäUnicode"));
URL newUrl = url.withPOSTData(test);
XmlElement * elem = newUrl.readEntireXmlStream(true);

What I see is that URL::appendPostData() is called, which calls String::toUTF8(), which then calls String::copyToUTF8(). The string appears to get broken in the copyToUTF8() routine. The internal contents of the string object look like this:

TestöäUnicode

but what gets copied into the passed-in buffer looks like this:

TestÃ¶Ã¤Unicode

The part of the copyToUTF8() routine where the Unicode characters get changed is here:

buffer [num++] = (uint8) ((0xff << (7 - numExtraBytes)) | (c >> (numExtraBytes * 6)));

while (--numExtraBytes >= 0)
       buffer [num++] = (uint8) (0x80 | (0x3f & (c >> (numExtraBytes * 6))));

I’m not sure what is expected here. It would be great if someone could give some tips to point me in the right direction.
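For reference, the two statements quoted above are the usual shape of a UTF-8 encoder: one lead byte carrying the top bits of the character, followed by continuation bytes of six bits each. The following is a minimal standalone sketch of that idea, not the JUCE source. The function name encodeUtf8 and the way numExtraBytes is chosen are assumptions made for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of JUCE): encodes one Unicode code point
// as UTF-8, using the same lead-byte / continuation-byte arithmetic as the
// snippet quoted in the post. Assumes c is a valid code point <= 0x10FFFF.
std::vector<std::uint8_t> encodeUtf8 (std::uint32_t c)
{
    std::vector<std::uint8_t> out;

    // ASCII characters are stored as a single unchanged byte.
    if (c < 0x80)
    {
        out.push_back ((std::uint8_t) c);
        return out;
    }

    // Number of continuation bytes needed after the lead byte.
    int numExtraBytes = 1;
    if (c >= 0x800)   numExtraBytes = 2;
    if (c >= 0x10000) numExtraBytes = 3;

    // Lead byte: (numExtraBytes + 1) high bits set, then the top bits of c.
    out.push_back ((std::uint8_t) ((0xff << (7 - numExtraBytes)) | (c >> (numExtraBytes * 6))));

    // Continuation bytes: binary 10xxxxxx, six bits of c each, high bits first.
    while (--numExtraBytes >= 0)
        out.push_back ((std::uint8_t) (0x80 | (0x3f & (c >> (numExtraBytes * 6)))));

    return out;
}
```

With this sketch, ö (U+00F6) becomes the two bytes 0xC3 0xB6 and ä (U+00E4) becomes 0xC3 0xA4, so a two-byte result for each non-ASCII character is what a UTF-8 encoder is expected to produce.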

jules · January 15, 2010, 9:41am

Well, I’m sure it’s being correctly encoded as UTF8, which as far as I know is the correct way to encode the text of a POST request… (any http gurus care to call me out on that?)
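One way to check this is to take the UTF-8 bytes for "TestöäUnicode" and display each byte as if it were a separate Latin-1 character, which is presumably what the debugger's byte view is doing. The sketch below does that. The helper name viewBytesAsLatin1 is hypothetical, and the claim about the debugger's behaviour is an assumption rather than something confirmed in this thread.

```cpp
#include <string>

// Hypothetical helper: treats every input byte as one Latin-1 character and
// returns that text re-encoded as UTF-8, so it can be printed or compared.
// This mimics a viewer that shows a UTF-8 buffer one byte per character.
std::string viewBytesAsLatin1 (const std::string& bytes)
{
    std::string shown;

    for (unsigned char b : bytes)
    {
        if (b < 0x80)
        {
            // ASCII bytes look the same in both encodings.
            shown += (char) b;
        }
        else
        {
            // Latin-1 code points 0x80..0xFF need two bytes in UTF-8.
            shown += (char) (0xC0 | (b >> 6));
            shown += (char) (0x80 | (b & 0x3F));
        }
    }

    return shown;
}
```

Passing in the bytes "Test", 0xC3, 0xB6, 0xC3, 0xA4, "Unicode" yields the UTF-8 text for "TestÃ¶Ã¤Unicode", which matches the buffer contents quoted in the first post. If that is what is happening, the buffer holds valid UTF-8 and only its display is misleading. If the server also shows mangled text, one possibility worth checking is that it decodes the request body with a different character set.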

