Changes to String and 'good old ASCII'


#1

Some of my code needs to be revamped to fit the new string features. I work a lot with control protocols, and they’re generally ASCII based (or rarely, binary). Some obvious stuff I was doing seems not to work now, such as appending a single byte, e.g. 0x0d. My current substitute is the fairly verbose: << String::charToString (0x0d);

There also may be a Mac bug with toUTF8(), since functions like:

Are causing crashes (debugger can’t work out where the actual problem is). Am I meant to use a different function now? The SendBuf function takes a copy, btw, it isn’t doing anything evil and has worked for years.

In essence, I’m using Strings to build ASCII-only sequences of characters that frequently contain non-printable characters and aren’t null terminated. I’ve also done this with MemoryBlocks, but it’s a lot easier to use Strings since the protocol is generally human-readable text and there are more tools.

Any suggestions? Can’t I lock strings into an ASCII form? Which function is now the closest to 'getASCIIbytes()' and the corresponding 'getASCIIsize()'?

Bruce


#2

If you’re genuinely using ASCII, i.e. if all your characters are < 128, then utf8 is identical, so it should make no difference at all. What I’m not providing any more are locale-specific codepages, which is a whole messy can of worms - if people need that, they can use their own C library functions to convert the UTF16 or UTF32 to whatever they want, and I’ll stay well clear of it.

I don’t think there are any bugs with the conversion, I’ve been using and testing it heavily. Your example:

…should probably be written like this:

but if none of the characters are over 127, I think that’s functionally the same as what you had before.
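To illustrate why pure ASCII is unaffected, here is a minimal sketch of a UTF-8 encoder in plain C++. The helper names `encodeUtf8` and `checkAsciiIsUnchanged` are hypothetical and not part of JUCE; the sketch only shows that any code point below 128 (including control bytes such as 0x0d) encodes to the same single byte, while anything above takes two or more bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not a JUCE function): encode one code point as UTF-8.
// For brevity it does not reject surrogates or values above 0x10FFFF.
std::string encodeUtf8 (std::uint32_t cp)
{
    std::string out;

    if (cp < 0x80)                       // ASCII range: one byte, unchanged
    {
        out += static_cast<char> (cp);
    }
    else if (cp < 0x800)                 // two-byte sequence
    {
        out += static_cast<char> (0xC0 | (cp >> 6));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)               // three-byte sequence
    {
        out += static_cast<char> (0xE0 | (cp >> 12));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
    else                                 // four-byte sequence
    {
        out += static_cast<char> (0xF0 | (cp >> 18));
        out += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }

    return out;
}

// Every code point below 128 must come out as exactly the same single byte.
void checkAsciiIsUnchanged()
{
    for (std::uint32_t cp = 0; cp < 0x80; ++cp)
    {
        const std::string bytes = encodeUtf8 (cp);
        assert (bytes.size() == 1);
        assert (static_cast<unsigned char> (bytes[0]) == cp);
    }
}
```

So a protocol message made only of characters below 128 has the same bytes whether you think of it as ASCII or as UTF-8.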

And if you want to add a byte, you’d just need to cast it to a char or juce_wchar, e.g.

s << (char) 0x0d;
s << (juce_wchar) 0x0d;

(but surely that was the same before, because just writing s << 0x0d would always have written it as an integer?)
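The integer-versus-character distinction can be sketched with the standard library. This uses `std::ostringstream` as a stand-in, on the assumption that a String's `<<` picks an overload by argument type in a comparable way; the function names are hypothetical and nothing here calls JUCE.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// An int literal is formatted as decimal text, so 0x0d is appended as "13".
std::string appendAsInteger()
{
    std::ostringstream s;
    s << "OK" << 0x0d;
    return s.str();
}

// Casting to char selects the character overload and appends one raw byte.
std::string appendAsChar()
{
    std::ostringstream s;
    s << "OK" << static_cast<char> (0x0d);
    return s.str();
}

void checkAppendBehaviour()
{
    assert (appendAsInteger() == "OK13");        // four characters
    assert (appendAsChar() == std::string ("OK\r"));
    assert (appendAsChar().size() == 3);         // 'O', 'K', carriage return
}
```

When the result is full of unprintable characters, checking the length is a quick way to see which overload was actually picked.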


#3

A very good article: http://www.joelonsoftware.com/articles/Unicode.html


#4

Thanks, I’ll read that. I was perfectly happy knowing nothing about Unicode, and a lot about ASCII and byte-based protocols, but it looks like I have to deal with ASCII being ‘phased out’ now.

Jules - I did try casting the char, but possibly a slightly different way? Of course, in the world of unprintable characters, it can be hard to tell. I’ll revisit the simple way of doing it. There is something odd about toUTF8 on my machine though - I’ll try to pin it down.

Bruce


#5

Keep in mind that toUTF8 increases the size of the underlying String object every time it’s called; it piggybacks its storage at the end of the existing object. So while it doesn’t change the value of the String (it is still UTF-32 in the latest tip, I believe), it does increase its size.


#6

OK, the stack was getting mangled and gdb was getting completely lost when it tried to debug it, so I looked and oops: this was my last JUCE project that uses a static library, not the newer ‘split across files’ approach. I went in and did a clean and rebuild on that in case Xcode wasn’t clever enough.

I guess it was different, since I immediately get this assertion firing every time I draw:

void Component::paintEntireComponent (Graphics& g, const bool ignoreAlphaLevel)
{
    jassert (! g.isClipEmpty());

Putting that aside for a sec, I’m back to sending like a champ. Thanks for the help.

Not sure why that assertion is firing, but I can dig into that further.

Bruce


#7

It’s not being phased-out, ASCII is just the name for the set of characters below 128, that’s all. If you never need to use anything above that, you can happily go on using utf8 and treat it as ASCII, it’ll work just fine.
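If you want to be sure a buffer really is in that safe range before it goes out on the wire, a check along these lines would do. It is only a sketch in plain C++: `isPureAscii` is a hypothetical helper, not a JUCE function, and it takes an explicit byte count so that embedded nulls and other non-printable bytes are handled rather than treated as terminators.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true if every byte in the buffer is below 128.
// The length is passed explicitly, so embedded '\0' bytes are fine.
bool isPureAscii (const char* data, std::size_t numBytes)
{
    for (std::size_t i = 0; i < numBytes; ++i)
        if (static_cast<unsigned char> (data[i]) >= 0x80)
            return false;

    return true;
}

void checkIsPureAscii()
{
    assert (isPureAscii ("PWR ON\r", 7));        // text plus a control byte
    assert (isPureAscii ("A\0B", 3));            // embedded null is still ASCII
    assert (! isPureAscii ("caf\xC3\xA9", 5));   // UTF-8 bytes for a non-ASCII letter
}
```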

Huh?? It doesn’t grow the storage every time you call it, that’d be ridiculous! Less of the FUD, please!


#8

You’re right. I looked at it again, and it only increases the storage once.


#9

And since you’re coming back to this thread anyway, I said ‘phased out’ not phased out.

I just thought that this was a bit long:

It looks like I should consider more carefully the fact that ASCII is no longer the preferred default storage format for strings, and it is no longer reliable to assume that a string only contains ASCII legal characters.

So there.