String.compareIgnoreNewlines


#1

If you write a string to a file using File.replaceWithText and then compare the result from File.loadFileAsString with your original string, they may not be equal because of newline differences. How about adding a compareIgnoreNewlines method to String and CharacterFunctions to get around this?

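[code]// Treats LF (10), VT (11), FF (12) and CR (13) as newline characters.
// The full wide character is compared: casting to char first would make
// non-ASCII characters such as U+010A look like newlines.
bool isNewline (const juce_wchar character)
{
    return character >= 10 && character <= 13;
}

// Compares two null-terminated strings, skipping any newline characters
// entirely. Returns -1, 0 or 1, like a normal strcmp-style comparison.
int compareIgnoreNewlines (const juce_wchar* s1, const juce_wchar* s2)
{
    for (;;)
    {
        while (isNewline (*s1))
            ++s1;

        while (isNewline (*s2))
            ++s2;

        if (*s1 != *s2)
            return *s1 < *s2 ? -1 : 1;

        if (*s1 == 0)
            return 0;

        ++s1;
        ++s2;
    }
}[/code]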

Cheers,
Caleb


#2

In this you’re skipping the newlines altogether, but wouldn’t it be better to fail if there’s a newline in one string that isn’t in the other? That way it’d basically be a normal comparison, but one in which \n, \r, and \r\n are all considered to be equivalent.
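A minimal sketch of the problem, using char32_t in place of juce_wchar so it builds without JUCE (an assumption, not JUCE code): when newlines are skipped entirely, the strings keep matching even where one has a line break and the other has none.

```cpp
// Sketch of the skip-every-newline comparison from post #1, with char32_t
// standing in for juce_wchar so it builds without JUCE.

// Codes 10..13 are LF, VT, FF and CR, matching the range used in post #1.
bool isNewline (char32_t c)
{
    return c >= 10 && c <= 13;
}

// strcmp-style result (-1, 0 or 1); newline characters are skipped entirely.
int compareIgnoreNewlines (const char32_t* s1, const char32_t* s2)
{
    for (;;)
    {
        while (isNewline (*s1))
            ++s1;

        while (isNewline (*s2))
            ++s2;

        if (*s1 != *s2)
            return *s1 < *s2 ? -1 : 1;

        if (*s1 == 0)
            return 0;

        ++s1;
        ++s2;
    }
}
```

With this version, compareIgnoreNewlines (U"line1\r\nline2", U"line1\nline2") returns 0 as intended, but so does compareIgnoreNewlines (U"line1\nline2", U"line1line2"), even though the second pair differs by a whole line break.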


#3

Yeah I suppose that would be better. Here’s the code then:

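[code]// If str points at a newline sequence ("\r\n", "\n" or a lone "\r"),
// advances str past it and returns true; otherwise leaves str alone.
bool skipIfNewline (const juce_wchar*& str)
{
    if (*str == '\n')
    {
        ++str;
        return true;
    }

    if (*str == '\r')
    {
        ++str;

        if (*str == '\n')   // "\r\n" counts as a single newline
            ++str;

        return true;
    }

    return false;
}

// Returns true if the strings are identical, treating "\r\n", "\n" and "\r"
// as equivalent. Newlines are checked before characters are compared, so
// that e.g. "a\r\nb" and "a\r\rb" (one newline vs. two) don't match.
bool equalsIgnoreNewlines (const juce_wchar* s1, const juce_wchar* s2)
{
    for (;;)
    {
        const bool newline1 = skipIfNewline (s1);
        const bool newline2 = skipIfNewline (s2);

        if (newline1 || newline2)
        {
            if (newline1 != newline2)
                return false;   // a newline in only one of the strings

            continue;
        }

        if (*s1 != *s2)
            return false;

        if (*s1 == 0)
            return true;

        ++s1;
        ++s2;
    }
}[/code]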

#4

It’d feel wrong to add something so specific to the library, though. Maybe a more generic method would be a comparison that lets you supply a set of characters that are considered equivalent? Or one where you supply the character comparison function…
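One way the generic idea might look, as a hypothetical sketch (equalsWith and crEqualsLf are made-up names, and char32_t stands in for juce_wchar): the caller supplies a predicate that decides whether two characters are equivalent.

```cpp
// Hypothetical generic comparison: walks both null-terminated strings and
// asks the caller-supplied predicate whether each pair of characters is
// equivalent. The strings are equal only if they also end together.
template <typename CharEquivalence>
bool equalsWith (const char32_t* s1, const char32_t* s2, CharEquivalence charsEqual)
{
    for (;; ++s1, ++s2)
    {
        if (*s1 == 0 || *s2 == 0)
            return *s1 == *s2;   // true only if both strings end here

        if (! charsEqual (*s1, *s2))
            return false;
    }
}

// Example predicate: treat CR and LF as the same character.
inline bool crEqualsLf (char32_t a, char32_t b)
{
    const auto fold = [] (char32_t c) { return c == U'\r' ? U'\n' : c; };
    return fold (a) == fold (b);
}
```

A per-character predicate can make "\r" match "\n", but it can't make "\r\n" match "\n", because that pairs two characters with one: equalsWith (U"a\r\nb", U"a\nb", crEqualsLf) returns false. Handling CRLF needs a comparison that can consume a different number of characters from each side, as the loop in post #3 does.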

Is it for performance reasons that you’re not just writing string1.replace (T("\r\n"), T("\n")).replaceCharacter (T('\r'), T('\n')) == string2.replace (T("\r\n"), T("\n")).replaceCharacter (T('\r'), T('\n'))?
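For comparison, here is a sketch of the normalise-then-compare approach in standard C++, with std::u32string standing in for juce::String (an assumption; normaliseNewlines and equalsIgnoringNewlineStyle are hypothetical names). It follows the same two steps as the expression above.

```cpp
#include <algorithm>
#include <string>

// Step 1: turn every "\r\n" pair into "\n" (like String::replace).
// Step 2: turn every remaining lone '\r' into '\n' (like replaceCharacter).
std::u32string normaliseNewlines (std::u32string s)
{
    for (std::u32string::size_type pos = s.find (U"\r\n");
         pos != std::u32string::npos;
         pos = s.find (U"\r\n", pos + 1))
        s.replace (pos, 2, U"\n");

    std::replace (s.begin(), s.end(), U'\r', U'\n');
    return s;
}

// Compares two strings after normalising both to "\n" line endings.
bool equalsIgnoringNewlineStyle (const std::u32string& a, const std::u32string& b)
{
    return normaliseNewlines (a) == normaliseNewlines (b);
}
```

The trade-off is that every comparison copies and rewrites both strings before comparing them, whereas a loop like the one in post #3 walks the original buffers once without allocating.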


#5

Ok, that makes sense. I just thought it was a bit odd that writing a string to a file modifies it. (Why can’t we all just agree on one newline character?)

It’s not so much a performance issue as a “could be better” one… I don’t like writing code that could be faster :wink:


#6

Personally I like the “\r\n” pair (don’t hit me) because each character has its own meaning: \n moves down a line and \r returns to the beginning of the line, as it should be. I may just want to move down a line without resetting to the beginning… Old printer codes, so sue me…


#7