CharPointer_ASCII::writeAll behaviour is different from CharPointer_UTF8


#1

Hi Jules,

writeAll on CharPointer_ASCII does not increment the pointer to the data (it uses strcpy), while on UTF8 strings it does the increment.

StringArray::joinIntoString depends on the incrementing logic. I created a custom Ascii string class, and this is where it failed when UTF8 was replaced with ASCII.
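
To make the difference concrete, here is a minimal sketch of the two behaviours. AsciiPtr and every function name in it are hypothetical stand-ins written for illustration, not the JUCE classes. It assumes the join loop calls writeAll once per piece on the same destination pointer and relies on that pointer moving forward.

```cpp
#include <cstring>
#include <string>

// Hypothetical stand-in for a character-pointer wrapper (not the JUCE class).
struct AsciiPtr
{
    char* data;

    // Copies src including its terminator but leaves `data` where it was,
    // which is the strcpy-based behaviour described above.
    void writeAllNoAdvance (const char* src)
    {
        std::strcpy (data, src);
    }

    // Copies src and leaves `data` pointing at the terminator just written,
    // so the next write appends instead of overwriting.
    void writeAllAdvancing (const char* src)
    {
        while ((*data = *src++) != 0)
            ++data;
    }
};

// A join loop that depends on the destination pointer advancing.
template <typename WriteFn>
std::string joinPieces (WriteFn write)
{
    char buffer[32] = {};
    AsciiPtr dest { buffer };
    const char* pieces[] = { "ab", "cd", "ef" };

    for (const char* piece : pieces)
        write (dest, piece);

    return std::string (buffer);
}

std::string joinWithoutAdvance()
{
    return joinPieces ([] (AsciiPtr& d, const char* s) { d.writeAllNoAdvance (s); });
}

std::string joinWithAdvance()
{
    return joinPieces ([] (AsciiPtr& d, const char* s) { d.writeAllAdvancing (s); });
}
```

With the non-advancing version each piece overwrites the previous one, so only the last piece survives in the buffer. The advancing version produces the concatenation.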

Besides, I'm not sure it's a good idea to check for the null terminator inside joinIntoString. Maybe it would be better to add a function like writeIntoString(data, count) and pass the length explicitly, because in all the cases where it is used internally the length is already known?

Thanks,
George.


#2

Ah, good catch! Thanks for letting us know, I've fixed that now.

Don't think there'd be any advantage in passing a length when concatenating the strings - it still has to iterate each character, so checking for null is no harder than (and actually probably quicker than) decrementing and checking a length counter.


#3

On the other hand, if a String explicitly stored a pointer to its terminating null, then an append would no longer have to iterate over the existing text. Currently this code

String a("some text");
a += b;
a += c;
a += d;

will iterate three times over the text stored in a (to find the terminating null).

For this reason, the time complexity of appending N pieces to a string is not O(N) but O(N²), even if you preallocate storage. Usually this doesn't matter, but we have a couple of spots in our app where we have to work around it.
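
A small sketch of the effect, using plain C strings in a preallocated buffer rather than the String class itself. The function names are invented for illustration, and the caller is assumed to supply a buffer large enough for the result. The first version finds the terminator before every append and counts the characters it steps over. The second remembers where the text ends.

```cpp
#include <cstddef>
#include <cstring>

// Appends `piece` to `buffer` `pieces` times, scanning for the terminator
// before each append. Returns how many characters were stepped over while scanning.
std::size_t appendRescanning (char* buffer, const char* piece, int pieces)
{
    std::size_t scanned = 0;
    buffer[0] = 0;

    for (int i = 0; i < pieces; ++i)
    {
        char* end = buffer;

        while (*end != 0)
        {
            ++end;
            ++scanned;
        }

        std::strcpy (end, piece);
    }

    return scanned;
}

// Same result, but the end position is kept between appends, so nothing is rescanned.
// Returns the final length of the text.
std::size_t appendTrackingEnd (char* buffer, const char* piece, int pieces)
{
    const std::size_t pieceLength = std::strlen (piece);
    char* end = buffer;
    *end = 0;

    for (int i = 0; i < pieces; ++i)
    {
        std::memcpy (end, piece, pieceLength + 1); // +1 copies the terminator too
        end += pieceLength;
    }

    return static_cast<std::size_t> (end - buffer);
}
```

Appending a 4-character piece 4 times steps over 0 + 4 + 8 + 12 = 24 characters in the rescanning version. Doing it 8 times steps over 112, so doubling the number of pieces roughly quadruples the scanning work.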

--
Roeland

ps. mandatory reference to Joel on Software: Shlemiel the Painter's Algorithm <http://www.joelonsoftware.com/articles/fog0000000319.html>


#4

Well yes, but that's an entirely different question. But definitely not a simple one!

Actual performance of a class like String depends very little on the big-O notation for it, because 99.99% of real-world Strings are too short for big-O differences to come into play.

In fact, the most important performance factors tend to be cache locality and the time spent creating and destroying strings. Iterating over a 32-byte string to get its length may actually be no slower than some pointer manipulation to find the length, if it's all sitting in a cache line.

Anyway, since I designed my String class many years ago, lots of people have done a lot of research, and it sounds like the C++ community has mostly agreed that std::string with short-string-optimisation gives the best overall performance, so I'd like at some point to experiment with replacing the juce String internally with the newer std::string implementations.


#5

Yes, it is easy to implement a workaround for that 0.01%. For typical UI programming, I agree that the point is totally moot.

--
Roeland


#6

Absolutely. If people are hitting actual performance issues because their strings are so long, it's probably the wrong data structure for whatever they're doing anyway!