MD5 output different from common generators

Tsury · September 18, 2011, 9:12pm

Hi.

I know there’s already a thread on this matter, but it’s old and JUCE’s MD5 has changed since then, so I don’t want to create confusion.
For reference, it’s http://www.rawmaterialsoftware.com/viewtopic.php?f=2&t=5079

Today I tried using MD5 on strings as simple as “dog”, “cat”, etc., and the result of MD5("dog").toHexString() is different from what common MD5 generators produce:

MD5("dog").toHexString() = 4dc9a19944bad6ba904b6f5189d6dd0e
http://www.md5.net for “dog” = 06d80eb0c50b49a509b49f2424e8c805

Is there some internal conversion needed other than UTF-8?

It seems there’s a well-defined, standardized algorithm for it: http://en.wikipedia.org/wiki/MD5#Algorithm
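The algorithm on that page works on a sequence of bytes, never on characters, so the real question is which bytes a string gets turned into. As a sketch, here is a compact RFC 1321 MD5 in standard C++ over a raw buffer. `md5Hex` is a hypothetical name for illustration only (it is not JUCE’s `MD5` class), and it is meant for checking digests, not as a production implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Minimal MD5 (RFC 1321) over raw bytes. Hypothetical helper, not JUCE's API.
std::string md5Hex(const void* data, std::size_t numBytes)
{
    static const std::uint32_t K[64] = {
        0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee, 0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
        0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be, 0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,
        0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa, 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
        0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed, 0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,
        0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c, 0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
        0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05, 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
        0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039, 0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,
        0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1, 0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391 };
    static const int S[64] = {
        7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
        5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20,
        4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
        6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21 };

    // Padding: append 0x80, zero-fill to 56 mod 64, then the bit length (little-endian).
    const auto* bytes = static_cast<const std::uint8_t*>(data);
    std::vector<std::uint8_t> msg(bytes, bytes + numBytes);
    msg.push_back(0x80);
    while (msg.size() % 64 != 56)
        msg.push_back(0);
    const std::uint64_t bitLength = static_cast<std::uint64_t>(numBytes) * 8;
    for (int i = 0; i < 8; ++i)
        msg.push_back(static_cast<std::uint8_t>(bitLength >> (8 * i)));

    std::uint32_t h[4] = { 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476 };

    for (std::size_t chunk = 0; chunk < msg.size(); chunk += 64)
    {
        std::uint32_t M[16];  // the 64-byte block as sixteen little-endian words
        for (int i = 0; i < 16; ++i)
            M[i] = static_cast<std::uint32_t>(msg[chunk + 4 * i])
                 | static_cast<std::uint32_t>(msg[chunk + 4 * i + 1]) << 8
                 | static_cast<std::uint32_t>(msg[chunk + 4 * i + 2]) << 16
                 | static_cast<std::uint32_t>(msg[chunk + 4 * i + 3]) << 24;

        std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3];
        for (int i = 0; i < 64; ++i)
        {
            std::uint32_t f;
            int g;
            if (i < 16)      { f = (b & c) | (~b & d); g = i; }
            else if (i < 32) { f = (d & b) | (~d & c); g = (5 * i + 1) % 16; }
            else if (i < 48) { f = b ^ c ^ d;          g = (3 * i + 5) % 16; }
            else             { f = c ^ (b | ~d);       g = (7 * i) % 16; }
            f += a + K[i] + M[g];
            a = d; d = c; c = b;
            b += (f << S[i]) | (f >> (32 - S[i]));  // rotate left by S[i]
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d;
    }

    // Digest: the four state words, each written little-endian, as lowercase hex.
    static const char* hexDigits = "0123456789abcdef";
    std::string out;
    for (std::uint32_t word : h)
        for (int i = 0; i < 4; ++i)
        {
            const auto byte = static_cast<std::uint8_t>(word >> (8 * i));
            out += hexDigits[byte >> 4];
            out += hexDigits[byte & 0x0f];
        }
    return out;
}
```

For example, `md5Hex("dog", 3)` hashes exactly the three ASCII (and therefore UTF-8) bytes of “dog” and yields 06d80eb0c50b49a509b49f2424e8c805, the value the online generator reported. Any other byte sequence for the same word gives a different digest.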

jules · September 19, 2011, 8:11am

Sigh… I knew I should have just removed the constructor that takes a string.

Look, I think the other thread explains the situation perfectly, but here are the key facts again:

There are an infinite number of ways to turn a string into an MD5. There is no ‘right’ or ‘wrong’ way to do it.
The juce MD5 string constructor uses utf-32, not utf-8, so obviously if you compare it to something that used utf-8 it will be different.
Yes, I should probably have used utf-8.
No, I can’t change it now, because that’d break everybody’s existing code.
It’s easy to get the MD5 of a utf-8 string: just give it a pointer to your utf-8 data, rather than using the constructor that takes a string.
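To see why the two results can never agree, compare the bytes each approach feeds into MD5. This is a standard-C++ sketch: the helpers `toUtf8Bytes` and `toUtf32LEBytes` are hypothetical, and little-endian UTF-32 with no terminator is an assumption, not a description of JUCE’s exact internal layout. “dog” becomes 3 bytes as UTF-8 but 12 bytes as UTF-32, so any MD5 implementation will produce two different digests.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encode code points as UTF-8 (the bytes most online MD5 generators hash).
// Assumes valid code points; surrogates and out-of-range values are not checked.
std::vector<std::uint8_t> toUtf8Bytes(const std::u32string& text)
{
    std::vector<std::uint8_t> out;
    for (char32_t cp : text)
    {
        if (cp < 0x80)
            out.push_back(static_cast<std::uint8_t>(cp));
        else if (cp < 0x800)
        {
            out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
        else if (cp < 0x10000)
        {
            out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
        else
        {
            out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// Four bytes per code point, least significant byte first.
// Assumption: little-endian, no terminator; not necessarily JUCE's layout.
std::vector<std::uint8_t> toUtf32LEBytes(const std::u32string& text)
{
    std::vector<std::uint8_t> out;
    for (char32_t cp : text)
        for (int i = 0; i < 4; ++i)
            out.push_back(static_cast<std::uint8_t>(cp >> (8 * i)));
    return out;
}
```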

Tsury · September 19, 2011, 3:04pm

I can understand when you say it’s too late to change it (we don’t want to break other projects), but I disagree about the ‘infinite number of ways’: it might be infinite, but I still see MD5 checksums used between programs and across websites without any problems.
If there are infinitely many options, the logical one to choose is the most widely supported one (and there clearly is such an option)…

The minimum you can do, imo, is remove that String constructor (or maybe change its implementation to use the char* one)…

jules · September 19, 2011, 3:42pm

Ok, I’m exaggerating with ‘infinite’, but there are [number of possible string encodings] * [number of possible byte-orderings] * [extra formatting options, e.g. length, zero-terminator, etc]. That’s a large number.
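Byte order alone is enough to multiply the options. In this standard-C++ sketch, `toUtf16Bytes` and its `bigEndian` flag are hypothetical; it serialises “dog” as UTF-16 in both byte orders, with no byte-order mark and no terminator. The two 6-byte results differ, so they would hash differently, and each extra yes/no choice such as a terminator doubles the count again.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Serialise UTF-16 code units in the chosen byte order (no BOM, no terminator).
std::vector<std::uint8_t> toUtf16Bytes(const std::u16string& text, bool bigEndian)
{
    std::vector<std::uint8_t> out;
    for (char16_t unit : text)
    {
        const auto high = static_cast<std::uint8_t>(unit >> 8);
        const auto low = static_cast<std::uint8_t>(unit & 0xFF);
        if (bigEndian) { out.push_back(high); out.push_back(low); }
        else           { out.push_back(low);  out.push_back(high); }
    }
    return out;
}
```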

I’ve already done that; it’ll be in my next check-in. What I’ve done is move the old constructor to make it more explicit, and add a constructor that takes a CharPointer_UTF8 so it’s clear what’s actually happening.

Tsury · September 19, 2011, 3:48pm

Got it… Thanks!

lelepar · December 7, 2011, 3:25pm

Could someone please provide a code snippet for this?

I’m using this method, but the result differs from common generators when the plain string contains multibyte characters.

String computeMd5(String plain){
	MD5 md5String(plain.toUTF8(), plain.length());
	return md5String.toHexString();
};

Cheers
Emanuele

jules · December 7, 2011, 4:34pm

String::length() returns the number of characters in the string, NOT the number of bytes in its utf-8 encoding.
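The same distinction shows up in plain standard C++: `std::string::size()` counts bytes, while counting code points in UTF-8 means skipping continuation bytes (those of the form `10xxxxxx`). `countCodePoints` is a hypothetical helper that assumes valid UTF-8 input; it is not JUCE’s `String::length()` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count code points in valid UTF-8 by skipping continuation bytes (10xxxxxx).
std::size_t countCodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8)
        if ((byte & 0xC0) != 0x80)
            ++count;
    return count;
}
```

For “café” encoded as UTF-8 the byte size is 5 but the character count is 4, so passing the character count as the number of bytes to hash silently drops the tail of the data whenever non-ASCII characters are present.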

X-Ryl669 · December 8, 2011, 10:26am

WTF?
Since when does it do that?
One of the most important advantages of using a string class is not having to recompute the string length (I mean its memory size) for each string operation.
OMG!! I probably have bad code around that assumes string.length() == stringBuffer.memorySize().

Can you add a getRequiredBytesForUTF8() method to the String class and change the toUTF8() signature to toUTF8(const int requiredBytes = 0), so we avoid a useless strlen() each time we convert to UTF-8?

jules · December 8, 2011, 11:28am

It has always done that! Since the internal format used by the string may change (and has changed in the past, from UTF-32 to UTF-8), it wouldn’t make any sense for length() to return anything other than the number of characters.

Also note that toUTF8() does nothing except return the string’s CharPointer_UTF8 object, which already provides methods for things like the byte size. There would be no point in adding new methods to String for that, since they’re already available in that class.

lelepar · December 14, 2011, 7:45am

So:

String computeMd5(String plain){
   MD5 md5String(plain.toUTF8(), plain.toUTF8().sizeInBytes());
   return md5String.toHexString();
};

should do the trick…

jules · December 14, 2011, 8:49am

Yes, as long as you want your checksum to also include the string’s terminating zero. I don’t know if that’s how MD5s are commonly calculated or not.
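The off-by-one is easy to see with plain C strings: a size that counts the terminator covers one extra zero byte, and MD5 will hash it like any other byte. In this sketch `hashInputBytes` is a hypothetical helper that copies the byte range a hash function would consume; `std::strlen` gives the length without the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy exactly the byte range a hash function would consume.
std::vector<std::uint8_t> hashInputBytes(const char* utf8, std::size_t numBytes)
{
    const auto* p = reinterpret_cast<const std::uint8_t*>(utf8);
    return std::vector<std::uint8_t>(p, p + numBytes);
}
```

With “dog”, a length of `std::strlen(s)` gives the 3 bytes that online generators hash, while `std::strlen(s) + 1` gives 4 bytes ending in a zero, which is a different MD5 input.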

X-Ryl669 · December 14, 2011, 10:53am

No, they’re not (hence the other thread, which had exactly the same issue with SHA256).
Anyway, the code should read:

    String computeMd5(String plain){
       MD5 md5String(plain.toUTF8(), strlen(plain.toUTF8()));
       return md5String.toHexString();
    };

jules · December 14, 2011, 11:05am

…or just

String computeMd5 (const String& plain) { return MD5 (plain.toUTF8()).toHexString(); }

(assuming you’re using the latest modules branch, where there’s an MD5 constructor that takes a CharPointer_UTF8)

Related Topics
Topic	Category	Replies	Views	Activity
MD5 results differ from other MD5 generators	General JUCE discussion	5	1278	February 15, 2010
Struggling with the MD5 class between old and new version of Juce	General JUCE discussion	4	512	April 23, 2014
MD5 String constructor?	General JUCE discussion	3	288	February 23, 2012
MD5 issue	MacOSX and iOS	6	430	January 15, 2009
Latest MD5 broken?	General JUCE discussion	1	258	March 12, 2010
