Character encoding bug?


#1

Hello Jules,
I am using a VC2008 Express built version of the old Jucer, since I have just updated my JUCE checkout and discovered that the binaries are no longer distributed.
I have simply opened and then saved a number of .cpp files to have the Jucer automatically convert addButtonListener() calls to addListener(). But some files now contain incorrect characters in C++ comments.

Before: // la correlazione accordo->scale è incompatibile con quella scala->accordi
After: // la correlazione accordo->scale incompatibile con quella scala->accordi

It looks like an encoding problem…


#2

There is also one .cpp file, previously generated with the Jucer, that the Jucer now refuses to open, with a “This wasn’t a valid Jucer .cpp file…” message. Stepping through the Jucer code with the debugger, it seems that the second parameter in the call

int startLine = indexOfLineStartingWith (lines, T("BEGIN_JUCER_METADATA"), 0);

at line #470 of jucer_JucerDocument.cpp gets translated to an empty string. So startLine becomes -1 and the entire loadDocumentFromFile() procedure fails.


#3

The jucer expects the files to be in utf-8 - and since I’ve made the string classes more robust, I guess that the new version is rejecting any illegal characters that it finds, to make its input valid utf-8. Are you using the latest tip? (if not, I think the utf-8 parser in the 1.52 release would stop when it encountered an illegal character, but in the most recent check-ins, it ignores it and carries on)
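
To illustrate why the è vanished, here is a minimal sketch, assuming the original file was saved in a single-byte codepage such as Latin-1 or Windows-1252, where è is the single byte 0xE8. In UTF-8, 0xE8 is the lead byte of a three-byte sequence, so when it is followed by a space it is not valid UTF-8. This is not the actual JUCE parser: dropInvalidUtf8() and utf8SequenceLength() are hypothetical names, and the check is structural only (it does not reject overlong encodings or surrogates).

```cpp
#include <cstddef>
#include <string>

// Returns the total length (1 to 4) of a UTF-8 sequence that starts with the
// given lead byte, or 0 if the byte cannot start a sequence.
static std::size_t utf8SequenceLength (unsigned char lead)
{
    if (lead < 0x80)                  return 1;  // plain ASCII
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;  // continuation byte, or a byte that is never a valid lead
}

// Hypothetical helper: copies the input, silently dropping every byte that
// is not part of a structurally valid UTF-8 sequence, then carries on.
std::string dropInvalidUtf8 (const std::string& input)
{
    std::string output;
    std::size_t i = 0;

    while (i < input.size())
    {
        const std::size_t length = utf8SequenceLength (static_cast<unsigned char> (input[i]));
        bool valid = length != 0 && i + length <= input.size();

        // Every byte after the lead must look like 10xxxxxx.
        for (std::size_t k = 1; valid && k < length; ++k)
        {
            const unsigned char c = static_cast<unsigned char> (input[i + k]);
            valid = (c & 0xC0) == 0x80;
        }

        if (valid)
        {
            output.append (input, i, length);
            i += length;
        }
        else
        {
            ++i;  // skip the offending byte only
        }
    }

    return output;
}
```

Passing "scale \xE8 incompatibile" (the Latin-1 bytes) through this function drops the 0xE8 byte and keeps the rest, while the UTF-8 form "scale \xC3\xA8 incompatibile" comes back unchanged.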


#4

OK, now I understand. I didn’t know about the UTF-8 requirement; I believe this is a recent change, isn’t it?
I’m using the latest tip.
It looks like the Jucer saves its files without a BOM. Why? My WinMerge app is having trouble with this…


#5

Because BOMs aren’t portable - some compilers can’t build files that use them. It’s annoying, I know, but in C++ a truly portable source file can only contain ASCII characters (codes 0 to 127), with unicode string literals encoded as UTF-8 using escape sequences.