Number of allocated bytes for String

Is there a way for String to return the number of preallocated bytes? There was discussion a bit ago about adding this:

http://www.juce.com/comment/299036#comment-299036

Hmm. What are you actually trying to do? That sounds like the sort of thing you'd only need to know if you're planning on doing something a bit hacky, like messing with the string contents directly!

There are functions like String::getCharPointer(), CharPointer_UTF8::write(), and CharPointer_UTF8::operator[]() that provide low-level read/write access to the contents of a String. Neither write() nor operator[]() checks bounds, which is fine, but if these functions are part of the public interface of the API, then knowing the current "capacity" in bytes of the string would provide a means to check bounds.

I have a text file that I'm reading and parsing words from, and I want to display them in a ListBox. My original idea was to load each of the words into its own String, and although there are only about 327,000 words (~5.3 MB), doing 327,000 heap allocations was expensive: it takes ~6 sec on my machine. I thought maybe my text parsing routines were slow, but when it came time to call destructors and deallocate those objects on the heap, it took almost as long!

Naturally, my next idea was to just load the file into memory and then index the file in place. Because Strings make their own copies of things, my strategy was to make an array of unsigned char* for the index. I actually need two such indexes; one for the original order of the words in the file and one for sorted order. I can afford the extra memory; it's the speed of heap de/allocations that is troublesome for me. With one heap allocation for loading the file into memory, and two heap allocations for the indexes, that's three larger heap allocations instead of 327,000 small ones! Having a small memory allocator that worked efficiently with Strings would solve this problem.
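Roughly, the in-place indexing looks like the sketch below. It is plain C++17 rather than JUCE, it uses std::string_view (pointer plus length) where I described raw unsigned char* entries, and the names indexWords and sortedCopy are made up for illustration. It also assumes words are separated by ASCII whitespace.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: record every whitespace-separated word as a view
// (pointer + length) into the loaded buffer. Nothing is copied, so there is
// no per-word heap allocation; only the index vector itself allocates.
std::vector<std::string_view> indexWords (const std::string& buffer)
{
    std::vector<std::string_view> index;
    const char* p = buffer.data();
    const char* const end = p + buffer.size();

    auto isSpace = [] (char c) { return c == ' ' || c == '\n' || c == '\r' || c == '\t'; };

    while (p != end)
    {
        while (p != end && isSpace (*p))   ++p;   // skip separators
        const char* const wordStart = p;
        while (p != end && ! isSpace (*p)) ++p;   // scan to the end of the word

        if (p != wordStart)
            index.emplace_back (wordStart, static_cast<std::size_t> (p - wordStart));
    }

    return index;
}

// Second index: the same views in sorted order (bytewise comparison, which is
// an assumption; a real word list may need locale-aware collation).
std::vector<std::string_view> sortedCopy (std::vector<std::string_view> index)
{
    std::sort (index.begin(), index.end());
    assert (std::is_sorted (index.begin(), index.end()));
    return index;
}
```

The buffer has to outlive both indexes, since every entry points into it.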

To complicate things more, my source data is often not ASCII or UTF-8, so I have a 512 byte table for the ISO8859-15 codepage. So with this scheme, I can just convert the indexed words into JUCE Strings on the fly when I need to render them to the UI. I could convert from ISO8859-15 to a CharPointer_UTF8 and pass that copy to String() to make a second copy, but that seems unnecessary when I can just reuse the same String object instead of making three copies and incurring the repeated heap de/allocations. This conversion works well, and is very fast, but having a debug check to make sure the String has enough capacity would be a nice sanity check.
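As a sketch of that conversion, here is the idea with std::string standing in for the reused JUCE String, because std::string exposes capacity() and so can show the kind of debug check I mean. The function names are invented for this example, and the table is built from the eight positions where ISO8859-15 differs from Latin-1.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// 256 entries x 2 bytes = the 512-byte lookup table.
std::array<std::uint16_t, 256> makeIso8859_15Table()
{
    std::array<std::uint16_t, 256> t {};

    for (std::size_t i = 0; i < t.size(); ++i)
        t[i] = static_cast<std::uint16_t> (i);   // same as Latin-1 by default

    // The eight code points that differ from Latin-1.
    t[0xA4] = 0x20AC;  t[0xA6] = 0x0160;  t[0xA8] = 0x0161;  t[0xB4] = 0x017D;
    t[0xB8] = 0x017E;  t[0xBC] = 0x0152;  t[0xBD] = 0x0153;  t[0xBE] = 0x0178;
    return t;
}

// Hypothetical helper: converts one indexed word into 'out', reusing whatever
// storage 'out' already owns.
void iso8859_15ToUtf8 (const unsigned char* src, std::size_t numBytes, std::string& out)
{
    static const auto table = makeIso8859_15Table();

    out.clear();   // mainstream implementations keep the allocation here

    const std::size_t worstCase = numBytes * 3;   // e.g. the euro sign needs 3 bytes
    if (out.capacity() < worstCase)
        out.reserve (worstCase);

    const std::size_t capacityBefore = out.capacity();

    for (std::size_t i = 0; i < numBytes; ++i)
    {
        const std::uint16_t cp = table[src[i]];

        if (cp < 0x80)
        {
            out.push_back (static_cast<char> (cp));
        }
        else if (cp < 0x800)
        {
            out.push_back (static_cast<char> (0xC0 | (cp >> 6)));
            out.push_back (static_cast<char> (0x80 | (cp & 0x3F)));
        }
        else
        {
            out.push_back (static_cast<char> (0xE0 | (cp >> 12)));
            out.push_back (static_cast<char> (0x80 | ((cp >> 6) & 0x3F)));
            out.push_back (static_cast<char> (0x80 | (cp & 0x3F)));
        }
    }

    // The sanity check: the write never outgrew the preallocated storage.
    assert (out.size() <= capacityBefore && out.capacity() == capacityBefore);
}
```

With a capacity query, the same final assert could be written against a String that is written to through getCharPointer().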

I find myself needing to parse text, character by character (or codepoint by codepoint), time and time again. The functions in CharPointer_UTF8 et al are really useful! It just seems like the natural companion to preallocateBytes() would be a capacity() method (name it what you will). It has come up before, it would complete the direct read/write interface of String, it has general use cases, and it would help ensure program correctness. Would it be worth adding?

Ok, so you might have a use-case for writing directly into the String's memory, but why would you need to know its capacity?

If you look inside the String class itself, it has hundreds of internal methods to do all kinds of crazy operations, but none of them ever needs to know the capacity! For every operation I can imagine, you can just work out the amount of space it'll need, then call preallocateBytes() and do it. I can't think of a situation where checking its capacity would be helpful.
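As a rough illustration of that measure-then-allocate pattern, the sketch below works out the exact number of UTF-8 bytes first and only then allocates and writes. It uses std::string and reserve() as a stand-in for String and preallocateBytes(), the function names are hypothetical, and it assumes the input holds only valid Unicode code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of UTF-8 bytes needed for one code point (assumed valid).
std::size_t utf8Length (char32_t cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Pass 1: work out the amount of space the whole operation will need.
std::size_t utf8BytesNeeded (const std::u32string& text)
{
    std::size_t total = 0;

    for (char32_t cp : text)
        total += utf8Length (cp);

    return total;
}

// Pass 2: allocate once, then write.
std::string toUtf8 (const std::u32string& text)
{
    const std::size_t needed = utf8BytesNeeded (text);

    std::string out;
    out.reserve (needed);   // stand-in for preallocateBytes()

    for (char32_t cp : text)
    {
        switch (utf8Length (cp))
        {
            case 1:  out.push_back (static_cast<char> (cp)); break;
            case 2:  out.push_back (static_cast<char> (0xC0 | (cp >> 6)));
                     out.push_back (static_cast<char> (0x80 | (cp & 0x3F))); break;
            case 3:  out.push_back (static_cast<char> (0xE0 | (cp >> 12)));
                     out.push_back (static_cast<char> (0x80 | ((cp >> 6) & 0x3F)));
                     out.push_back (static_cast<char> (0x80 | (cp & 0x3F))); break;
            default: out.push_back (static_cast<char> (0xF0 | (cp >> 18)));
                     out.push_back (static_cast<char> (0x80 | ((cp >> 12) & 0x3F)));
                     out.push_back (static_cast<char> (0x80 | ((cp >> 6) & 0x3F)));
                     out.push_back (static_cast<char> (0x80 | (cp & 0x3F))); break;
        }
    }

    assert (out.size() == needed);   // the measured size matches what was written
    return out;
}
```

Because the size is computed before anything is written, the caller already knows how much room there is without asking the string.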