Iterate through a JUCE String?

Elan_Hickler · June 1, 2016, 9:35pm

How do you iterate through a JUCE String?

can you do:

String mystring = "something";
for (const auto& c : mystring) {}

or use some kind of “++iter” type syntax?

auto iter = mystring.begin();

I know there’s no begin function, I just need a hint at how to iterate, whatever is the best practice.

roeland-2 · June 1, 2016, 10:40pm

Short & naive answer:

for (auto p = mystring.getCharPointer(); !p.isEmpty(); ++p)
{
    // do something with (*p)
}

The long answer: That depends on what you want to do with characters:

Encode a string: iterate over the Unicode characters, as in the loop above
Clip a string to a given length for your UI: iterate over Grapheme Clusters (1). There are also other hairy things like right-to-left text; to be honest I have no idea how to do that correctly for arbitrary Unicode strings.
Convert to upper case, compare, sort, etc.: Don’t do it. Use a library like the International Components for Unicode for that.

(1) The most common examples for users of the Latin alphabet are the new Unicode flags, the skin-tone variants of smileys, and combining diacritics: 🇺🇰, 🙋🏻, 🙋🏿 and é are all 2-character strings (two Unicode code points each). On systems that support them, they render as, respectively, the UK flag, a person with light skin, a person with dark skin, and a letter e with an accent.
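
To make the footnote concrete, here is a minimal sketch in plain standard C++ (not JUCE) that walks a UTF-8 string one code point at a time, which is roughly what incrementing a JUCE char pointer does. The name decodeUtf8 is hypothetical, and the sketch assumes well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a JUCE function): decodes UTF-8 into code points.
// Assumes well-formed input; malformed bytes are not detected or reported.
std::vector<std::uint32_t> decodeUtf8 (const std::string& utf8)
{
    std::vector<std::uint32_t> codePoints;
    std::size_t i = 0;

    while (i < utf8.size())
    {
        const auto lead = static_cast<unsigned char> (utf8[i]);
        std::size_t length = 1;     // number of bytes in this sequence
        std::uint32_t cp = lead;    // single-byte (ASCII) case

        if      (lead >= 0xF0) { length = 4; cp = lead & 0x07u; }
        else if (lead >= 0xE0) { length = 3; cp = lead & 0x0Fu; }
        else if (lead >= 0xC0) { length = 2; cp = lead & 0x1Fu; }

        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < length && i + k < utf8.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char> (utf8[i + k]) & 0x3Fu);

        codePoints.push_back (cp);
        i += length;
    }

    return codePoints;
}
```

Feeding it an "e" followed by a combining acute accent gives two code points for what the user sees as one letter, which is why clipping by code point count can cut a grapheme cluster in half.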

anthony-nicholls · June 1, 2016, 10:42pm

Possibly not the most elegant…

for (String::CharPointerType character (mystring.getCharPointer()); character != character.findTerminatingNull(); ++character)
{
    DBG (String (character, 1));
}

To save a few characters you could swap out ‘String::CharPointerType’ for the auto keyword

jules · June 2, 2016, 7:12am

I’d definitely ask “why” before doing this - I’ve used strings in many thousands of situations and very very rarely need to actually go in and hit the characters directly. Maybe you want to tokenise it into a StringArray or something?

Elan_Hickler · June 2, 2016, 4:11pm

For example, to read MIDI note names and convert them to numbers… hmm, I just noticed the switch case is not needed. It was needed before, but I’ve been slowly improving the code. Edit: Oops, nope, it is needed, or I can’t skip the “get accidental” or “get sign” case.

int MIDI::num(String s)
{
	if (s.isEmpty()) ERROR(e::empty_string);

	int midi_val = -1; // initialised so a name without an octave (e.g. "C") does not return an indeterminate value

	if (STR::isInt(s))
	{
		midi_val = s.getIntValue();
		if (midi_val < 0 || midi_val > 127) ERROR(e::num_out_of_bounds);
		return midi_val;
	}

	s = s.toUpperCase();
	octave_str.clear();
	accidental_mod = 0;
	accidental = '\0';
	sign = '\0';
	sign_mod = 1; // reset too, otherwise a '-' from an earlier call would carry over

	enum MODE { get_letter, get_accidental, get_sign, get_octave};

	MODE mode = get_letter;

	for (auto p = s.getCharPointer(); !p.isEmpty();)
	{
		switch (mode)
		{
		case get_letter:
			if (!CHAR::isAnyOf(*p, "ABCDEFG")) ERROR(e::not_a_midi_letter);
			letter = *p;
			letter_mod = letter_to_value[letter];
			mode = get_accidental;
			++p;
			continue;
		case get_accidental:
			if (!CHAR::isAnyOf(*p, "B#")) { mode = get_sign; continue; }
			accidental = *p;
			accidental_mod = accidental_to_value[accidental];
			mode = get_sign;
			++p;
			continue;
		case get_sign:
			if (!CHAR::isAnyOf(*p, "-+")) { mode = get_octave; continue; }
			sign = *p;
			if (sign == '-') sign_mod = -1;
			++p;
			mode = get_octave;
			continue;
		case get_octave:
			for (; !p.isEmpty(); ++p) octave_str += *p;
			if (!STR::isInt(octave_str)) ERROR(e::cannot_read_octave);
			octave = octave_str.getIntValue();
			midi_val = (octave * sign_mod + lowest_octave*-1) * 12 + letter_mod + accidental_mod;
			if (midi_val < 0 || midi_val > 127) ERROR(e::num_out_of_bounds);
			continue;
		}
	}

	return midi_val;
}
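
For comparison, the same single pass over the characters can be sketched without the state machine, in plain standard C++ rather than JUCE. Everything here is an assumption for illustration: the name noteNameToMidi is hypothetical, it returns -1 instead of calling ERROR, it accepts a lowercase 'b' for flat instead of upper-casing the whole string, and lowestOctave (the octave number given to MIDI note 0) defaults to -1 so that C4 is 60.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: parses names such as "C4", "F#3" or "Bb-1".
// Returns the MIDI note number, or -1 if the name cannot be parsed
// or the result falls outside 0..127.
int noteNameToMidi (const std::string& name, int lowestOctave = -1)
{
    static const int semitones[] = { 9, 11, 0, 2, 4, 5, 7 }; // A B C D E F G

    std::size_t i = 0;
    if (name.empty()) return -1;

    // Letter: required, case-insensitive.
    const int letter = std::toupper (static_cast<unsigned char> (name[i]));
    if (letter < 'A' || letter > 'G') return -1;
    ++i;

    // Accidental: optional '#' (sharp) or 'b' (flat).
    int accidental = 0;
    if (i < name.size() && (name[i] == '#' || name[i] == 'b'))
    {
        accidental = (name[i] == '#') ? 1 : -1;
        ++i;
    }

    // Sign: optional '-' or '+'.
    int sign = 1;
    if (i < name.size() && (name[i] == '-' || name[i] == '+'))
    {
        sign = (name[i] == '-') ? -1 : 1;
        ++i;
    }

    // Octave: at least one digit, and nothing but digits until the end.
    if (i >= name.size()) return -1;

    int octave = 0;
    for (; i < name.size(); ++i)
    {
        if (! std::isdigit (static_cast<unsigned char> (name[i]))) return -1;
        octave = octave * 10 + (name[i] - '0');
        if (octave > 20) return -1; // far out of range; also avoids overflow
    }

    const int value = (sign * octave - lowestOctave) * 12
                      + semitones[letter - 'A'] + accidental;

    return (value < 0 || value > 127) ? -1 : value;
}
```

Because each stage either consumes one character or falls through, the optional accidental and sign are skipped with plain if statements, so no mode variable is needed.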

oxxyyd · June 3, 2016, 1:20am

Might take a look at: MidiMessage::keyNameToNumber

Elan_Hickler · June 3, 2016, 9:06pm

I like the idea of going through each character rather than scanning through string (starting from the beginning) each step of the way to see if it contains this or that element at whatever location.

I’m having a heck of a time trying to figure out how to make juce_wchar and String and StringRef and CharPointer_UTF8 work together.

Should I just use String for EVERYTHING, even one-character strings, rather than juce_wchar or CharPointer_UTF8?

jimc · December 12, 2016, 8:15pm

Jules - if I want to go through the characters to work out if something is likely to be a valid UUID, that’d be a pretty reasonable reason for iterating through the characters? I can’t think of a more straightforward way to do it… 100% ears if there’s a better solution.

jrlanglois · December 12, 2016, 8:30pm

You could tokenise and manually check if the tokens match a pattern, or use a regex.

jimc · December 12, 2016, 8:32pm

Regex would have been the obvious idea actually. I think turning it into tokens makes for a pretty inefficient algorithm, with a few memory allocations?

jrlanglois · December 12, 2016, 9:41pm

Depends what you’ve available, I guess!

daniel · December 13, 2016, 12:07pm

If you compare the same strings over and over, have a look at Identifier.
It stores a hash together with the string, so you can check fast for equality.

jimc · December 13, 2016, 2:52pm

#include <cwctype> // for iswxdigit

inline bool isPossibleUuid(const String & string)
{
	auto numHexDigits = 0;
	auto dashes = 0;
	auto pos = 0;

	for (auto ptr = string.getCharPointer(); !ptr.isEmpty(); ++ptr)
	{
		auto c = *ptr;

		if (iswxdigit(c))
			numHexDigits++;
		else if (c == '-' && (pos == 8 || pos == 13 || pos == 18 || pos == 23))
			dashes++;
		else
			return false;

		pos++;
	}

	return (dashes == 4 || dashes == 0) && numHexDigits == 32;
}

That’s what I’m doing… the regex option sounds clearer. Though this tests OK, so unless there’s some peril I’ve not spotted I’m going with it.

(I’m not sure an Identifier helps here. And I’m not sure that the Identifier stores a hash either? It stores a string but provides an equality operator that checks the strings point to the same actual data, rather than comparing the content of the strings. There must be some reason why using identifier requires the strings to reference the same memory … presumably something to do with the internals of String, and restrictions on the use of String as a result of Identifier … but I’m not sure I know what the mechanism is…?)
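
For reference, the regex option mentioned above could look roughly like this in plain standard C++ with std::regex. This is only a sketch: the name looksLikeUuid is hypothetical, it takes a std::string rather than a JUCE String, and it is meant to accept the same two shapes as the loop above (32 hex digits, either with no dashes or grouped 8-4-4-4-12).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the text is 32 hex digits, either undashed
// or in the usual 8-4-4-4-12 grouping. std::regex_match must match the
// whole string, so no ^ or $ anchors are needed.
bool looksLikeUuid (const std::string& text)
{
    static const std::regex pattern (
        "[0-9A-Fa-f]{32}"
        "|[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}");

    return std::regex_match (text, pattern);
}
```

The pattern is built once as a function-local static because constructing a std::regex is comparatively expensive; whether this beats the hand-written loop for speed is not something this sketch measures.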

daniel · December 13, 2016, 3:31pm

Ouch, sorry. That’s very embarrassing. I hesitated for a second before hitting the send button, wondering whether I should look into the source of Identifier first. And of course I have no idea where I thought I had read the hash thing.
Forget everything I wrote

reuk · April 15, 2020, 11:11am

String begin and end will be supported in JUCE 6
