Things that are confusing about String

juce::String is one of the best classes of the framework. It has so many high-level methods that save a lot of time when text handling is not the main point of the feature you’re writing. However, there are certain things that are very confusing about juce::String, and ironically they only get exposed in situations where you try to do LESS with a String than what it can do. I wanna show you what I mean. Say you have a String.

String text;

Maybe you wanna put a word into it.

text = "Hello!";

Cool. Now you might want to replace the exclamation mark with a question mark. Maybe a Timer switches between them to display uncertainty to the user or something. You’d expect there to be something like

text[5] = '?';

cause that’s how it works for all array types in c++: index the string at index 5 and replace the ! with a ? char or juce::juce_wchar. unfortunately the only overload of operator[] returns the juce_wchar by value, not by reference, so you can’t assign through it. you might think ok… then maybe String has something like vector::data(), where you get the internal array of characters. Sure it has.

auto ptr = text.getCharPointer();

but even then you can’t do:

ptr[5] = '?';

because indexing this “pointer” is not a thing apparently. it has a write method that asks for a juce_wchar but i don’t see what’s the point of that, because there is no argument for the actual index i wanna change. would it only change the first character then? weird. it seems like juce::String just refuses to be used for very simple things. it can only do hard things. and that’s weird because it does come with a function called

text.preallocateBytes();

where you can reserve a lot of memory like in a vector so that you can append and change stuff as much as you want as long as you don’t reach the predefined limit. but having this lowlevel-control would make way more sense if we could also modify individual characters.

i’m probably missing something entirely here and it’s actually just 1 or 2 lines of code to replace a character, but if that’s true it should be made more obvious for sure, because i’ve been searching for this for some hours now. it would be alright if there were alternatives to juce::String for the first argument of juce::Graphics::drawFittedText(). like a vector of juce_wchars would work perfectly for me if that was a valid input. or just a plain const char*

The use case you describe, replacing the exclamation point with a question mark, can just as easily be achieved with this:

juce::String str { "Hello!" };

str.replaceCharacter ('!', '?');

This has the advantage that the magic number 5 is no longer hard-coded.

You’re trying to use String as a low level container for individual chars, and that is not the intended use case of this class.

juce::String is not an array type. It’s not designed to be a low-level container.

For your use case, I would recommend either using a StringArray where each String is a single character long, or a vector/array of juce_wchar.

ben. you were in the discord when we talked about that before, so you should know non-performant methods like replaceCharacter that automatically scan the whole string to replace a single character are very far from the solution that i need. a hard-coded 5 was just used as an example to simplify this problem, but there are many valid reasons to use numbers to index an array.

also even if String was designed to be a high-level object, it wouldn’t be too much to ask to let it have an operator overload for replacing characters. i mean… why not? it’s useful, it’s easy and it’s probably already there, but only in some lowlevel context of String

I’m just pointing out that the detailed example you laid out for your feature request actually has a much better solution that’s already present in the String API.

If you want to replace a character at a single index without scanning the whole string, there is already a method for that too, replaceSection.

String is not an array of juce_wchar, it is either UTF8, UTF16 or UTF32 controlled by a compile time constant. It is usually UTF8 unless you’ve changed it. Be very careful with the operator[], it is not a constant time operator like a simple array.

for (int i = 0; i < str.length(); i++) printf("%c", (char)str[i]);

Is actually an order n^2 operation!
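
To make the n^2 claim concrete, here is a minimal sketch in plain standard C++. It is not JUCE code: a std::string holding UTF-8 bytes stands in for the String, and the helper names (utf8SequenceLength, byteOffsetOfCharacter, totalStepsForIndexLoop) are made up for illustration. It assumes valid UTF-8 input and only counts how many characters have to be skipped to reach each index.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts with
// this lead byte (assumes the input is valid UTF-8).
inline std::size_t utf8SequenceLength (unsigned char lead)
{
    if (lead < 0x80)           return 1;  // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    return 4;                             // 11110xxx
}

// Finds the byte offset of the character at `index`. There is no way to jump
// there directly, so the walk always starts at the first byte.
// `steps` is incremented once per character that had to be skipped.
inline std::size_t byteOffsetOfCharacter (const std::string& utf8,
                                          std::size_t index,
                                          std::size_t& steps)
{
    std::size_t offset = 0;

    for (std::size_t i = 0; i < index && offset < utf8.size(); ++i)
    {
        offset += utf8SequenceLength (static_cast<unsigned char> (utf8[offset]));
        ++steps;
    }

    return offset;
}

// Mimics the loop above: index every character in turn and return the total
// number of skip steps, which grows with the square of the length.
inline std::size_t totalStepsForIndexLoop (const std::string& utf8,
                                           std::size_t numCharacters)
{
    std::size_t steps = 0;

    for (std::size_t i = 0; i < numCharacters; ++i)
        byteOffsetOfCharacter (utf8, i, steps);

    return steps;
}
```

For a string of n characters the loop skips 0 + 1 + ... + (n - 1) characters in total, which is where the quadratic cost comes from.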

You can’t be sure that you can just replace one character in a string in place. Depending on its character code, the new character may need more (or fewer) bytes than the old one, and then you need to shuffle the rest of the bytes along.
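
A small sketch of that byte shuffling, again in plain standard C++ with a std::string of UTF-8 bytes standing in for the String. The function name replaceUtf8Character is hypothetical, and the caller is assumed to already know the byte offset and byte length of the character being replaced.

```cpp
#include <cstddef>
#include <string>

// Replaces the `oldLength` bytes starting at `byteOffset` with the UTF-8 bytes
// in `replacement`. When the two lengths differ, std::string::replace has to
// move every byte after the replaced character.
inline std::string replaceUtf8Character (std::string utf8,
                                         std::size_t byteOffset,
                                         std::size_t oldLength,
                                         const std::string& replacement)
{
    utf8.replace (byteOffset, oldLength, replacement);
    return utf8;
}
```

Swapping '!' for '?' keeps "Hello!" at 6 bytes, but swapping it for an inverted question mark (U+00BF, encoded as the two bytes 0xC2 0xBF) makes the string 7 bytes long, so anything after that position has moved.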

When dealing with Strings it’s better to just think of them as immutable: keep two strings and swap between them.
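
One way to read the two-strings idea, sketched with std::string as a stand-in for juce::String. The ToggledText type is hypothetical and only illustrates the pattern: both variants are built once, and switching between them never edits any characters.

```cpp
#include <string>
#include <utility>

// Hypothetical holder for two prebuilt variants of the same text.
struct ToggledText
{
    ToggledText (std::string firstText, std::string secondText)
        : first (std::move (firstText)), second (std::move (secondText)) {}

    // The text that should currently be drawn.
    const std::string& current() const { return showFirst ? first : second; }

    // Flip to the other variant, e.g. from a timer callback.
    void toggle() { showFirst = ! showFirst; }

    std::string first, second;
    bool showFirst = true;
};
```

Constructed with "Hello!" and "Hello?", current() returns the first text until toggle() is called.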

replaceSection does not take a juce_wchar or char or any char-type as an argument, but a “StringRef”, which seems like another complicated thing that can hold longer strings and stuff. and then it replaces a whole section instead of just a single character… i mean… yeah it would work, but it wouldn’t be performant

When dealing with UTF8 you have to scan the entire string. You can’t just assume you can change any byte.

ok in that case i should rephrase my question:

is there a juce object that does act like a string somehow, but it does not try to include all the different types of string formats but only you know… the most basic ones, where you can actually manage to make it work like an array? and if yes: can it be used with juce::Graphics’ text methods? and if no: is it impossible to make an object like that? if no: FR

If you want an array of chars, the best object to use is

juce::Array<juce_wchar>

or

std::vector<juce_wchar>

To interface with the graphics’ text drawing methods, you’d need a function that concatenates the container into a single string. Or, you could use StringArray, with each individual String being one character long, and then call its joinIntoString method.
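
A rough sketch of such a concatenating function, written in standard C++ so it runs without JUCE. std::vector<char32_t> stands in for the container of juce_wchar and a UTF-8 std::string stands in for the String handed to the drawing call; appendUtf8 and joinIntoUtf8 are hypothetical names. It assumes every element is a valid Unicode code point.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: appends one Unicode code point to a UTF-8 byte string.
inline void appendUtf8 (std::string& out, char32_t c)
{
    if (c < 0x80)
    {
        out += static_cast<char> (c);                        // 1 byte (ASCII)
    }
    else if (c < 0x800)
    {
        out += static_cast<char> (0xC0 | (c >> 6));           // 2 bytes
        out += static_cast<char> (0x80 | (c & 0x3F));
    }
    else if (c < 0x10000)
    {
        out += static_cast<char> (0xE0 | (c >> 12));          // 3 bytes
        out += static_cast<char> (0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (c & 0x3F));
    }
    else
    {
        out += static_cast<char> (0xF0 | (c >> 18));          // 4 bytes
        out += static_cast<char> (0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char> (0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (c & 0x3F));
    }
}

// Concatenates the fixed-width characters into a single UTF-8 string.
inline std::string joinIntoUtf8 (const std::vector<char32_t>& characters)
{
    std::string out;
    out.reserve (characters.size());  // exact for ASCII, a lower bound otherwise

    for (char32_t c : characters)
        appendUtf8 (out, c);

    return out;
}
```

Changing one element of the vector is a plain indexed write. The cost moves to the join, which touches every character each time the text is drawn.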

yeah that’s what i said in my initial post myself. i mean, that i would like to use vectors of chars. but the problem is that you have to convert them to juce::String in the end anyway to forward them to juce::Graphics. so that kinda defeats the purpose since i wanna gain performance here. as you know it’s for a very long “string” that can wildly change in content, but not in amount of chars

Right, so do this:

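StringArray container;

// ensureStorageAllocated only reserves capacity, so add the elements too:
container.ensureStorageAllocated (NUM_CHARS);
for (int i = 0; i < NUM_CHARS; ++i)
    container.add (" ");

// to change one:
container.getReference (IDX) = "NEW_CHAR";

// to draw it (area is a placeholder for the juce::Rectangle<int> to draw into):
graphics.drawText (container.joinIntoString (""), area, juce::Justification::centredLeft);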

ngl, it does seem slightly overkill since it’s still like storing an array of strings rather than just an array of chars, but i’ll try if that already improves stuff. not today anymore, cause it’s getting late here, but i’m looking forward to it. i hope joinIntoString has a clever implementation and is not just appending them one by one like i currently do

If you want, you can poke around in the String internals. If you are going to do this, make sure every character you are dealing with is plain ASCII, i.e. its value is <= 127. Or, if you have values greater than 127, properly UTF-8 encode them and then move the rest of the string.

    auto str = juce::String("Hello!");
    auto ptr = str.begin().getAddress();
    ptr[5] = '?';
    DBG(str);
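
The ASCII condition can be spelled out as a guard. This is only a sketch in standard C++, with std::string standing in for the raw UTF-8 bytes behind the pointer. isAscii and replaceAsciiInPlace are hypothetical names, not JUCE functions.

```cpp
#include <cstddef>
#include <string>

// True if the byte is a complete single-byte (ASCII) character.
inline bool isAscii (char c)
{
    return static_cast<unsigned char> (c) < 0x80;
}

// Overwrites one byte only if that is safe: the index is in range, the
// replacement is ASCII, and the whole string is ASCII, so that byte index and
// character index are the same thing. Returns false and leaves the string
// untouched otherwise.
inline bool replaceAsciiInPlace (std::string& utf8, std::size_t index, char replacement)
{
    if (index >= utf8.size() || ! isAscii (replacement))
        return false;

    for (char c : utf8)
        if (! isAscii (c))
            return false;

    utf8[index] = replacement;
    return true;
}
```

The full scan costs linear time, so in a hot path it would make more sense to check once when the text is built, or only in debug builds, and then do the raw writes.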

that might actually solve my issue, as i don’t intend to use very unusual characters anyway. i should have mentioned that but i didn’t know the different characters of a string can have different sizes

edit: yes it did solve my issue! thx. let me rephrase my feature request then:

.begin().getAddress()

to become

.data()

and intellisense just tells the programmer that this function is dangerous if any character values are > 127 for the reasons you mentioned

In my opinion the last bit to add a .data() function makes total sense. Working with C++ always means to double check that what you are doing actually makes sense and does not crash your program. And I don’t feel like it would compromise the integrity of the juce::String idea. It stays a powerful high level container that does all the heavy lifting for you and ensures integrity as long as you don’t touch the raw data. It would be kind of like the AudioBuffer. Can do a lot of stuff right out of the box but you may access the raw data and do what you want yourself while giving up on the safety asserts or e.g. the clean-flag optimisation.

yeah, but that’s exactly how i use AudioBuffer, i get the raw float** out of it and then use channel- and sample-loop to iterate safely. the only real argument against modifying the string directly as an array is the one about the different char-types being tricky to handle if special char requirements exist, but the argument for directly accessing it is that it has a better performance, especially if you managed to reserve the max amount of bytes you’ll ever need beforehand with .preallocateBytes(), similar to how you prepare buffers in prepareToPlay, and then it’s really alright and safe to modify the string like that. when you buy a car it often comes with a button for turning off ESP. when starting it up ESP is turned on by default, but if you actively decide to rather manage some safety aspects manually in favor of some performance benefits you turn ESP off. array-ish data structures giving me a .data() method in c++ gives me that exact vibe.

There is also UTF-32, which is a fixed-width encoding. That means each character is four bytes!
Using that you can index in constant time, but I don’t think it’s worth the memory overhead.
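
As an illustration of the trade-off, here is a sketch using the standard std::u32string rather than a JUCE String built with UTF-32. The helper names replaceCharacterAt and storageBytes are hypothetical.

```cpp
#include <cstddef>
#include <string>

// With a fixed-width encoding every character sits at a known position, so
// replacing one is a single indexed write. Out-of-range indices are ignored.
inline void replaceCharacterAt (std::u32string& text, std::size_t index, char32_t replacement)
{
    if (index < text.size())
        text[index] = replacement;
}

// Bytes used by the characters themselves (ignoring any allocator overhead).
inline std::size_t storageBytes (const std::u32string& text)
{
    return text.size() * sizeof (char32_t);
}
```

On common platforms, where char32_t is four bytes, the six characters of "Hello!" take 24 bytes here instead of 6 in UTF-8. In exchange, any character, ASCII or not, can be swapped in without moving its neighbours.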

if juce had different string classes for the different requirements of supported characters people who wanna work with indexes extensively could use that then. i think it’s worth to consider. it would probably not be a big rewrite as it’s essentially just copy pasting existing code around, and it would also not break existing projects, because it would be whole new string types

They could have gone with a templated String, but instead the template is one layer below in the CharPointer. The default CharPointer is set with the method @RolandMR pointed out above.

So you can create any CharPointer at runtime, but the encoding for String is a once-for-all compile time decision.

Templating the String class would have other implications: either you template every function that takes a string, or you use type erasure, which means virtual-call overhead for every string operation. I think that performance impact could be quite severe over the whole code base.
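
To show the two options side by side, here is a toy sketch in standard C++. None of these names exist in JUCE: countTemplated, AnyText, TextOf and countErased are invented for the example, and the type-erased version treats each code unit as one character, which is only right for ASCII text.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Option 1: the string type is a template, so every function that takes a
// string has to become a template as well.
template <typename CharT>
std::size_t countTemplated (const std::basic_string<CharT>& text, CharT c)
{
    return static_cast<std::size_t> (std::count (text.begin(), text.end(), c));
}

// Option 2: type erasure. Functions take one common interface, but every
// access goes through a virtual call.
struct AnyText
{
    virtual ~AnyText() = default;
    virtual std::size_t length() const = 0;
    virtual char32_t unitAt (std::size_t index) const = 0;
};

template <typename CharT>
struct TextOf : AnyText
{
    explicit TextOf (std::basic_string<CharT> s) : data (std::move (s)) {}

    std::size_t length() const override { return data.size(); }

    // Treats one code unit as one character (fine for ASCII only).
    char32_t unitAt (std::size_t index) const override
    {
        return static_cast<char32_t> (data[index]);
    }

    std::basic_string<CharT> data;
};

inline std::size_t countErased (const AnyText& text, char32_t c)
{
    std::size_t n = 0;

    for (std::size_t i = 0; i < text.length(); ++i)  // two virtual calls per iteration
        if (text.unitAt (i) == c)
            ++n;

    return n;
}
```

Both versions give the same count. The difference is where the cost lands: in the first it spreads through the signatures of the whole code base, in the second it is paid at run time on every character access.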

yeah for sure! i would definitely not want to suggest any changes that make existing code slower than it is right now. and rewriting String to work with templates would also be quite an involved project considering how complex it already is. but i could imagine there to be an alternative string type that is just geared more towards accessing characters than all this high-level stuff that you don’t always need at all, and rarely need in its full feature-set