Iterate through a JUCE String?


#1

How do you iterate through a JUCE String?

Can you do this:

String mystring = "something";
for (const auto& c : mystring) {}

or use some kind of “++iter” type syntax?

auto iter = mystring.begin();

I know there’s no begin function; I just need a hint at how to iterate, or whatever the best practice is.


#2

Short & naive answer:

for (auto p = mystring.getCharPointer(); !p.isEmpty(); ++p)
{
    // *p is the current Unicode code point, as a juce_wchar
    // do something with (*p)
}
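To see what `getCharPointer()` and `++p` are doing under the hood with JUCE's default UTF-8 storage, here is a std-only sketch of the same idea. Each step decodes one UTF-8 sequence into one 32-bit code point, which is roughly the kind of value `*p` hands you. `decodeCodePoints` is a hypothetical helper for illustration, not a JUCE function, and it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode a UTF-8 std::string into Unicode code points, one per loop step.
// Assumes valid UTF-8: malformed sequences are not detected.
std::vector<char32_t> decodeCodePoints (const std::string& utf8)
{
    std::vector<char32_t> result;

    for (std::size_t i = 0; i < utf8.size();)
    {
        const auto lead = static_cast<unsigned char> (utf8[i]);

        // The lead byte says how many bytes this sequence occupies
        // and supplies the high bits of the code point.
        std::size_t length = 1;
        char32_t cp = lead;

        if (lead >= 0xF0)      { length = 4; cp = lead & 0x07; }
        else if (lead >= 0xE0) { length = 3; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { length = 2; cp = lead & 0x1F; }

        // Each continuation byte (10xxxxxx) contributes its low six bits.
        for (std::size_t k = 1; k < length && i + k < utf8.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char> (utf8[i + k]) & 0x3F);

        result.push_back (cp);
        i += length;
    }

    return result;
}
```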

The long answer: that depends on what you want to do with the characters:

  • Encode a string: iterate over the Unicode code points, as in the loop above.
  • Clip a string to a given length for your UI: iterate over grapheme clusters (1). There are other hairy things too, like right-to-left text; to be honest, I have no idea how to do that correctly for arbitrary Unicode strings.
  • Convert to upper case, compare, sort, etc.: don’t do it yourself. Use a library like the International Components for Unicode (ICU) for that.

(1) The most common examples for users of the Latin alphabet are the Unicode flags, the skin-tone variants of emoji, and combining diacritics: 🇬🇧, 🙋🏻, 🙋🏿 and e + U+0301 (COMBINING ACUTE ACCENT) are all 2-code-point strings. On systems that support them, they render as, respectively, the UK flag, a light-skinned person raising a hand, a dark-skinned person raising a hand, and the letter e with an acute accent.
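Here is a quick std-only way to check footnote (1) without JUCE. It counts the code points in the UTF-8 bytes by skipping continuation bytes. The flag and the e + U+0301 sequence each come out as 2, even though each renders as a single glyph. `countCodePoints` is a hypothetical helper. Proper grapheme-cluster segmentation follows Unicode's text segmentation rules (UAX #29), which libraries such as ICU implement, and it isn't attempted here.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string by counting every byte that is
// NOT a continuation byte (continuation bytes have the bit pattern 10xxxxxx).
// Assumes valid UTF-8. This counts code points, not grapheme clusters.
std::size_t countCodePoints (const std::string& utf8)
{
    std::size_t count = 0;

    for (const char c : utf8)
        if ((static_cast<unsigned char> (c) & 0xC0) != 0x80)
            ++count;

    return count;
}
```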


#3

Possibly not the most elegant…

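// Find the terminating null once, instead of re-scanning for it on every iteration
for (String::CharPointerType character (mystring.getCharPointer()), end (character.findTerminatingNull());
     character != end; ++character)
{
    DBG (String (character, 1)); // a one-character String for the current code point
}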

To save a few characters, you could swap out ‘String::CharPointerType’ for the auto keyword.
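Outside JUCE, the same "one character at a time as a substring" idea looks like the following std-only sketch. It works directly on UTF-8 bytes and uses the lead byte of each sequence to decide how many bytes belong to one code point. It computes the end position once. `sequenceLength` and `splitIntoCodePoints` are hypothetical helpers, and valid UTF-8 input is assumed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Number of bytes in the UTF-8 sequence that starts with this lead byte.
// Assumes valid UTF-8.
std::size_t sequenceLength (unsigned char lead)
{
    if (lead >= 0xF0) return 4;
    if (lead >= 0xE0) return 3;
    if (lead >= 0xC0) return 2;
    return 1;
}

// Split a UTF-8 string into one substring per code point.
std::vector<std::string> splitIntoCodePoints (const std::string& utf8)
{
    std::vector<std::string> pieces;
    const std::size_t end = utf8.size(); // computed once, not on every step

    for (std::size_t i = 0; i < end;)
    {
        const auto len = sequenceLength (static_cast<unsigned char> (utf8[i]));
        pieces.push_back (utf8.substr (i, len)); // substr clamps at the end of the string
        i += len;
    }

    return pieces;
}
```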


#5

I’d definitely ask “why” before doing this. I’ve used strings in many thousands of situations and very rarely need to go in and hit the characters directly. Maybe you want to tokenise it into a StringArray or something?
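If tokenising is what you're after, here's a std-only sketch of the idea; in JUCE you'd reach for a `StringArray` method instead. `tokenise` is a hypothetical helper. It splits on any of the given break characters, drops empty tokens and ignores quoting, so its behaviour isn't guaranteed to match JUCE's.

```cpp
#include <string>
#include <vector>

// Split text wherever any of the break characters occurs, dropping empty
// tokens. Quoted sections are NOT treated specially.
std::vector<std::string> tokenise (const std::string& text, const std::string& breakChars)
{
    std::vector<std::string> tokens;
    std::string current;

    for (const char c : text)
    {
        if (breakChars.find (c) != std::string::npos)
        {
            if (! current.empty())
                tokens.push_back (current);
            current.clear();
        }
        else
        {
            current += c;
        }
    }

    if (! current.empty())
        tokens.push_back (current); // the last token has no trailing break character

    return tokens;
}
```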


#6

For example, to read MIDI note names and convert them to numbers… Hmm, I just noticed the switch isn’t needed; it was needed before, but I’ve been slowly improving the code. Edit: oops, no, it is needed, otherwise I can’t skip the “get accidental” or “get sign” case.

int MIDI::num(String s)
{
	if (s.isEmpty()) ERROR(e::empty_string);

	int midi_val = -1; // stays -1 unless a complete note name is parsed

	// A plain number such as "60" is accepted as-is
	if (STR::isInt(s))
	{
		midi_val = s.getIntValue();
		if (midi_val < 0 || midi_val > 127) ERROR(e::num_out_of_bounds);
		return midi_val;
	}

	// Reset the parse state (class members) left over from any previous call
	s = s.toUpperCase();
	octave_str.clear();
	accidental_mod = 0;
	accidental = '\0';
	sign = '\0';
	sign_mod = 1; // otherwise a "-" from an earlier call would stick

	enum MODE { get_letter, get_accidental, get_sign, get_octave };

	MODE mode = get_letter;

	// p only advances when a character is consumed; optional parts just switch mode
	for (auto p = s.getCharPointer(); !p.isEmpty();)
	{
		switch (mode)
		{
		case get_letter: // mandatory note letter
			if (!CHAR::isAnyOf(*p, "ABCDEFG")) ERROR(e::not_a_midi_letter);
			letter = *p;
			letter_mod = letter_to_value[letter];
			mode = get_accidental;
			++p;
			continue;
		case get_accidental: // optional flat ("B" after upper-casing) or sharp
			if (!CHAR::isAnyOf(*p, "B#")) { mode = get_sign; continue; }
			accidental = *p;
			accidental_mod = accidental_to_value[accidental];
			mode = get_sign;
			++p;
			continue;
		case get_sign: // optional octave sign
			if (!CHAR::isAnyOf(*p, "-+")) { mode = get_octave; continue; }
			sign = *p;
			if (sign == '-') sign_mod = -1;
			++p;
			mode = get_octave;
			continue;
		case get_octave: // everything left must be the octave number
			for (; !p.isEmpty(); ++p) octave_str += *p;
			if (!STR::isInt(octave_str)) ERROR(e::cannot_read_octave);
			octave = octave_str.getIntValue();
			midi_val = (octave * sign_mod + lowest_octave*-1) * 12 + letter_mod + accidental_mod;
			if (midi_val < 0 || midi_val > 127) ERROR(e::num_out_of_bounds);
			continue;
		}
	}

	// The string ran out before an octave number was read, e.g. "C#"
	if (midi_val < 0) ERROR(e::cannot_read_octave);

	return midi_val;
}
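For comparison, the same parse can be done in a single pass with an index and no mode switch. Each optional part (accidental, sign) is just an `if` that may or may not consume a character. This is a std-only sketch, not the poster's code and not a JUCE API. `noteNameToMidi` is a hypothetical name, and errors are reported as -1 instead of through the poster's `ERROR` macro. It also assumes the convention where C-1 is note 0 (so C4 is 60); the original leaves this to a `lowest_octave` member.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Parse a note name such as "C4", "F#2", "Bb-1" or "c#3" into a MIDI note number.
// Assumption: C-1 is note 0 (so C4 == 60). Returns -1 on any error.
int noteNameToMidi (const std::string& name)
{
    // Semitone offset above C for each letter, indexed by letter - 'A' (A..G).
    static const int letterValues[] = { 9, 11, 0, 2, 4, 5, 7 };

    std::size_t i = 0;

    // Mandatory letter (case-insensitive).
    if (i >= name.size()) return -1;
    const char letter = static_cast<char> (std::toupper (static_cast<unsigned char> (name[i])));
    if (letter < 'A' || letter > 'G') return -1;
    int semitone = letterValues[letter - 'A'];
    ++i;

    // Optional accidental: '#' raises, 'b' or 'B' lowers.
    if (i < name.size() && name[i] == '#')                          { ++semitone; ++i; }
    else if (i < name.size() && (name[i] == 'b' || name[i] == 'B')) { --semitone; ++i; }

    // Optional octave sign.
    int sign = 1;
    if (i < name.size() && (name[i] == '-' || name[i] == '+'))
    {
        sign = (name[i] == '-') ? -1 : 1;
        ++i;
    }

    // At least one digit must follow, and nothing but digits.
    if (i >= name.size()) return -1;

    int octave = 0;
    for (; i < name.size(); ++i)
    {
        if (! std::isdigit (static_cast<unsigned char> (name[i]))) return -1;
        octave = octave * 10 + (name[i] - '0');
        if (octave > 20) return -1; // far outside the MIDI range; also avoids overflow
    }

    const int midi = (sign * octave + 1) * 12 + semitone;
    return (midi >= 0 && midi <= 127) ? midi : -1;
}
```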

#7

Might take a look at: MidiMessage::keyNameToNumber


#8

I like the idea of going through each character once, rather than scanning the string from the beginning at every step to see whether it contains this or that element at some position.

I’m having a heck of a time trying to figure out how to make juce_wchar and String and StringRef and CharPointer_UTF8 work together.

Should I just use String for EVERYTHING, even one-character strings, rather than juce_wchar or CharPointer_UTF8?


#9

Jules, if I want to go through the characters to work out whether something is likely to be a valid UUID, that would be a pretty reasonable reason to iterate through them, wouldn’t it? I can’t think of a more straightforward way to do it… I’m all ears if there’s a better solution.


#10

You could tokenise and manually check if the tokens match a pattern, or use a regex.
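Taking up the regex suggestion, here's a sketch using the C++ standard library's `<regex>` rather than any JUCE class. With a JUCE `String` you'd convert first, e.g. via `toStdString()`. The `looksLikeUuid` name and the exact pattern are illustrative assumptions. The pattern accepts the canonical 8-4-4-4-12 dashed form or 32 bare hex digits.

```cpp
#include <regex>
#include <string>

// Accept either the canonical dashed 8-4-4-4-12 form or 32 hex digits with no
// dashes. regex_match requires the WHOLE string to match the pattern.
bool looksLikeUuid (const std::string& text)
{
    static const std::regex pattern (
        "(?:[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
        "|[0-9a-fA-F]{32})");

    return std::regex_match (text, pattern);
}
```

The pattern is compiled once (it's `static`), which matters if you call this in a loop.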


#11

Regex would have been the obvious idea, actually. I think turning it into tokens makes for a pretty inefficient algorithm, with several memory allocations?


#12

Depends what you’ve got available, I guess!


#13

If you compare the same strings over and over, have a look at Identifier.
It stores a hash together with the string, so you can check for equality quickly.


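#14

#include <cwctype> // iswxdigit

// Cheap structural check: 32 hex digits, optionally with dashes at the
// canonical 8-4-4-4-12 positions. Doesn't check the version/variant bits.
inline bool isPossibleUuid(const String & string)
{
	auto numHexDigits = 0;
	auto dashes = 0;
	auto pos = 0; // index of the current character

	for (auto ptr = string.getCharPointer(); !ptr.isEmpty(); ++ptr)
	{
		auto c = *ptr;

		if (iswxdigit(c))
			numHexDigits++;
		else if (c == '-' && (pos == 8 || pos == 13 || pos == 18 || pos == 23))
			dashes++; // dashes are only allowed at the group boundaries
		else
			return false; // any other character rules it out

		pos++;
	}

	return (dashes == 4 || dashes == 0) && numHexDigits == 32;
}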

That’s what I’m doing… the regex option does sound clearer, though. This tests OK, so unless there’s some peril I’ve not spotted, I’m going with it.
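To poke at possible perils without a JUCE project, here's a std-only port of the function above. It assumes ASCII input: `std::isxdigit` stands in for `iswxdigit`, and `isPossibleUuidStd` is a hypothetical name. Running a few edge cases through it shows, for example, that the braced `{…}` form some tools use (such as Windows registry GUIDs) is rejected, while 32 bare hex digits are accepted.

```cpp
#include <cctype>
#include <string>

// std-only port of isPossibleUuid for experimenting with edge cases.
// Assumes ASCII input; std::isxdigit replaces iswxdigit.
bool isPossibleUuidStd (const std::string& text)
{
    int numHexDigits = 0;
    int dashes = 0;
    int pos = 0;

    for (const char ch : text)
    {
        const auto c = static_cast<unsigned char> (ch);

        if (std::isxdigit (c))
            ++numHexDigits;
        else if (c == '-' && (pos == 8 || pos == 13 || pos == 18 || pos == 23))
            ++dashes;     // dashes only at the 8-4-4-4-12 boundaries
        else
            return false; // braces, spaces, etc. are rejected outright

        ++pos;
    }

    return (dashes == 4 || dashes == 0) && numHexDigits == 32;
}
```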

(I’m not sure an Identifier helps here. And I’m not sure that Identifier stores a hash either? It stores a string, but its equality operator checks whether the two strings point to the same underlying data, rather than comparing their contents. There must be some reason why Identifier can rely on equal strings referencing the same memory… presumably something to do with the internals of String, and restrictions on how String can be used with Identifier… but I’m not sure what the mechanism is…?)
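The mechanism being wondered about here is usually called string interning. If every distinct piece of text lives exactly once in a shared pool, two handles with the same text must point at the same memory, so equality reduces to a pointer comparison. As far as I know, JUCE's `Identifier` gets its text from a `StringPool` in this way, but check the source for your JUCE version. Below is a std-only sketch of the idea; `InternedName` is a hypothetical class, not JUCE's.

```cpp
#include <string>
#include <unordered_set>

// Minimal string interning: each distinct text is stored once in a pool, so
// handles with equal text share storage and equality is a pointer comparison.
// Not thread-safe; pooled entries are never freed.
class InternedName
{
public:
    explicit InternedName (const std::string& text)
        : ptr (&*pool().insert (text).first) {} // insert is a no-op if already pooled

    bool operator== (const InternedName& other) const noexcept { return ptr == other.ptr; }
    bool operator!= (const InternedName& other) const noexcept { return ptr != other.ptr; }

    const std::string& str() const noexcept { return *ptr; }

private:
    // Element addresses in an unordered_set stay valid across rehashing.
    static std::unordered_set<std::string>& pool()
    {
        static std::unordered_set<std::string> instance;
        return instance;
    }

    const std::string* ptr;
};
```

The cost moves to construction (a hash lookup), which pays off when the same names are compared many times.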


#15

Ouch, sorry. That’s very embarrassing. I hesitated for a second before hitting the send button, wondering whether I should look at the source of Identifier first. And of course I have no idea where I thought I’d read the hash thing.
Forget everything I wrote 😉