I’m reading in some old text data (as C-strings) that was apparently saved in MAC-IS format. (Maybe that’s also known as Mac Roman?) I need to add these strings (preset names) to a StringArray so I can pass them to ComboBox::addItemList.
I am encountering an ellipsis stored as the byte ‘\xc9’, and of course when I try to add it to a StringArray I get this assertion:
“you’re trying to create a string from 8-bit data that contains values greater than 127.”
I found this online:
% version: 1.0
<U00AB> /xc7 LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
<U00BB> /xc8 RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
<U2026> /xc9 HORIZONTAL ELLIPSIS
<U00A0> /xca NO-BREAK SPACE
<U00C0> /xcb LATIN CAPITAL LETTER A WITH GRAVE
<U00C3> /xcc LATIN CAPITAL LETTER A WITH TILDE
How can I convert this automatically into UTF-8, or some other format that juce::String will accept? I’ve tried various things but nothing’s working. I should mention I’m working on Mac right now…
If ‘MAC-IS’ is UTF-16, you can use the juce::String UTF-16 constructor:
String (CharPointer_UTF16 (data));
// If your string is not null-terminated
String (CharPointer_UTF16 (data), CharPointer_UTF16 (data + dataSize));
You might have to preprocess your string manually if it doesn’t conform to the Unicode spec.
@oli1 - thanks, that doesn’t seem to work.
What I ended up doing (not sure if it’s a great solution, or bulletproof) was to manually convert the ellipsis to the 3-byte UTF-8 ellipsis and replace any other unusual characters with a space, so I can add handling for them later if I have to:
// theTempName: fixed-size char array holding the old MAC-IS C-string
if (! CharPointer_ASCII::isValidString (theTempName, std::numeric_limits<int>::max()))
{
    size_t len = strlen (theTempName);

    for (size_t n = 0; n < len; ++n)
    {
        if ((unsigned char) theTempName[n] == 0xc9)   // ellipsis
        {
            // shift the rest of the string (and the terminating 0) 2 bytes to the right
            jassert (len < sizeof (theTempName) - 2);   // enough room?
            memmove (theTempName + n + 3, theTempName + n + 1, len - n);

            // replace the 3 bytes with the UTF-8 code for ellipsis (U+2026)
            theTempName[n]     = '\xe2';
            theTempName[n + 1] = '\x80';
            theTempName[n + 2] = '\xa6';

            len += 2;   // we "inserted" 2 bytes
            n += 2;     // skip the bytes just written so they aren't re-examined
        }
        else if ((unsigned char) theTempName[n] > 127)
        {
            theTempName[n] = ' ';   // replace others with a space
        }
    }
}
… just gives me an error. I tried different things but couldn’t make it work:
No matching conversion for functional-style cast from 'char *' to 'juce::CharPointer_UTF16'
CharPointer_UTF16 expects a pointer to 16-bit code units (CharPointer_UTF16::CharType), not char. I assumed UTF-16 from your snippet, but it looks like it could be a non-standard UTF-8 variant.
Unicode is a bit annoying; you can store all variants of Unicode text in a
Your method is probably the best way forward.
It’s most likely an old 8-bit format where 0-127 is ASCII and 128-255 is whatever they felt like.
UTF-8 only shares the 0-127 range with it; in UTF-8, bytes 128-255 are only used as parts of multi-byte sequences. You can either translate the values >= 128 from the original format into their UTF-8 equivalents, or strip them out.
That table you found actually has the mappings from MAC-IS to Unicode (written as UTF-16 code points), so you’re already halfway there; you just need to implement that table and encode the results as UTF-8 instead of UTF-16.