Special chars cause crash in StringArray

const char* s = "\xAB";
StringArray sa;
sa.addTokens(s, "+", "");

This also happens with other combinations.

Whenever you try to create a String from raw data, it’s your responsibility to check that it’s valid UTF8, otherwise you’re in UB territory. I’ve not run this code, but in a debug build I’d certainly expect an assertion if you do that.

debug build I’d certainly expect an assertion

No, it does not.

responsibility to check

How do I create a string safely from raw data?
I used MemoryInputStream::readString(). If this can produce corrupted strings, it would be easy to crash any JUCE application with a few broken characters in the settings file.

I’ve not run this code

I’m begging you to do it anyway. This is faster than writing an answer here.
(And it took me a whole night to find this issue)

Ah, it looks like that function takes a StringRef, and unlike String, I don’t think StringRef actually asserts if you pass it garbage. (Adding a check could add a lot of overhead, though I guess you could argue that it’s worth it). But yeah, if InputStream::readString isn’t asserting then that’s definitely a check worth adding.

If you have a chunk of raw string data of dubious origin, you need to check it with something like String::createStringFromData or CharPointer_UTF8::isValidString. Obviously a check like that is very expensive, and library classes can’t keep checking every pointer that you give them, so it’s up to you to apply that kind of check at the places in your program where you know the data could be unreliable or malicious.
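To make it concrete what such a check involves, here is a rough standalone sketch of a UTF-8 well-formedness test in plain C++. It is not JUCE code and not how JUCE implements its own checks; the name isValidUtf8 is made up for illustration. It walks the buffer once and rejects stray continuation bytes (which is what a lone 0xAB is), truncated sequences, overlong encodings, surrogates, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, NOT part of JUCE: returns true if the `size` bytes at
// `data` are well-formed UTF-8. Zero bytes are accepted as ordinary ASCII.
bool isValidUtf8 (const void* data, std::size_t size)
{
    const auto* p = static_cast<const std::uint8_t*> (data);
    std::size_t i = 0;

    while (i < size)
    {
        const std::uint8_t lead = p[i];
        std::size_t extra = 0;        // continuation bytes expected after the lead byte
        std::uint32_t codepoint = 0;  // value decoded so far
        std::uint32_t minimum = 0;    // smallest value a sequence of this length may encode

        if (lead < 0x80)                { ++i; continue; }  // plain ASCII
        else if ((lead & 0xE0) == 0xC0) { extra = 1; codepoint = lead & 0x1F; minimum = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; codepoint = lead & 0x0F; minimum = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; codepoint = lead & 0x07; minimum = 0x10000; }
        else return false;  // stray continuation byte (e.g. 0xAB) or invalid lead byte

        if (size - i <= extra)
            return false;   // sequence is cut off by the end of the buffer

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const std::uint8_t c = p[i + k];

            if ((c & 0xC0) != 0x80)
                return false;  // expected a 10xxxxxx continuation byte

            codepoint = (codepoint << 6) | (c & 0x3Fu);
        }

        if (codepoint < minimum)                       return false;  // overlong encoding
        if (codepoint >= 0xD800 && codepoint <= 0xDFFF) return false;  // UTF-16 surrogate
        if (codepoint > 0x10FFFF)                      return false;  // outside Unicode range

        i += extra + 1;
    }

    return true;
}
```

The idea would be to run a check like this (or the JUCE functions mentioned above) once, at the point where the untrusted bytes enter the program, and only then hand the data to string classes that assume valid input.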

Thank you very much! I tried String::createStringFromData, which works better for my purposes than InputStream::readString.

IMHO, converting raw data to a string is such a common operation that it shouldn’t result in UB.
Practically any user input, any file, or any stream from the network can be corrupted.

Maybe all methods which are “unsafe” should be marked as such, like readStringUnchecked() instead of readString().

Could this be a security issue?