MD5 results differ from other MD5 generators

friscokid · February 14, 2010, 2:17pm

Hi,
when I generate an MD5 checksum from a String, the value returned by toHexString() differs from the values that other MD5 generators calculate.
I checked these three generators:
http://www.adamek.biz/md5-generator.php
http://www.md5generator.com/index.php
http://files.kniebes.net/php/md5/
Their results are consistent.
juce::MD5 returns a 32-digit hex string that differs from the other results (which are also 32-digit hex strings).

Is this a bug or am I missing something?

jules · February 14, 2010, 10:47pm

It could be a character encoding difference - juce treats the strings as wide-character unicode. These other ones might be doing it as utf8, or ascii, or who knows what… Maybe try passing your string as utf8 to the md5 class rather than as a string.

friscokid · February 15, 2010, 12:21pm

Ok, I compared the result of juce::MD5 with the results of the PHP MD5 and MySQL MD5 functions and got the following:
(Let’s call the two different resulting strings ‘A’ and ‘B’; each is a 32-digit hex string.)

PHP MD5: A
MySQL MD5: A

MD5(myString).toHexString(): B
MD5(myString.toUTF8()).toHexString(): B
MD5(myString).toHexString().toUTF8(): B
MD5(myString.toUTF8()).toHexString().toUTF8(): B
MD5((const char*) myString, myString.length()).toHexString(): A

So the only way to get the standard MD5 result with the juce MD5 class is to call it with:
MD5((const char*) myString, myString.length())

But I expected that
MD5 (const String &text)
and
MD5 (const char *data, const int numBytes)
give the same result, if I call them both using the same String object as source data.
But obviously this isn’t the case, which is really confusing.

What do you think?

jules · February 15, 2010, 12:37pm

[quote]But I expected that
MD5 (const String &text)
and
MD5 (const char *data, const int numBytes)
give the same result, if I call them both using the same String object as source data.[/quote]

Why would you assume that? An MD5 is calculated from raw data, and there are many ways to turn a string into raw data… I decided to do it by treating the string as a series of wide chars, and these others seem to either be using utf8 or ascii or something. It might be the case that they produce different values from each other if you feed them a string containing multi-byte characters, as they might not all be using the same encoding.

Is there a standard for this? If so I’d be happy to change my code to match it!
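
As an illustration of the point about raw data, here is a minimal sketch (plain C++, not JUCE code) that serialises the same ASCII text two ways: one byte per character, and one 32-bit unit per character. The helper names `narrowBytes` and `wideBytes` are hypothetical, and the 32-bit width is an assumption standing in for "a series of wide chars"; the real width and byte order depend on the platform. Because the two byte sequences differ, a digest computed over each would be expected to differ as well.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// One byte per character: what an ASCII or UTF-8 based tool would hash
// for plain ASCII input.
std::vector<std::uint8_t> narrowBytes (const std::string& text)
{
    return std::vector<std::uint8_t> (text.begin(), text.end());
}

// One 32-bit unit per character, in native byte order.
// Assumption: this stands in for hashing the string as wide chars.
std::vector<std::uint8_t> wideBytes (const std::string& text)
{
    std::vector<std::uint8_t> out;

    for (unsigned char c : text)
    {
        assert (c < 0x80); // this sketch only handles ASCII input

        const std::uint32_t unit = c;
        std::uint8_t raw[sizeof (unit)];
        std::memcpy (raw, &unit, sizeof (unit));
        out.insert (out.end(), raw, raw + sizeof (unit));
    }

    return out;
}
```

For the text "abc", `narrowBytes` yields 3 bytes and `wideBytes` yields 12, so the two hash inputs are not the same data even though they came from the same string.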

friscokid · February 15, 2010, 1:55pm

Good point, there is no standard way of doing this.
So the best approach seems to be to find out what kind of data the comparison target uses and then choose the appropriate data to pass to juce::MD5.
Thanks

jules · February 15, 2010, 3:16pm

TBH looking at my code, converting the string to UTF8 would have been a much neater way to do it (though I might actually have written the md5 code before I had a UTF8 converter). I’d change it to work that way, although that risks breaking people’s code that already uses the old method…
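
One detail worth keeping in mind with the UTF-8 route is that the number of bytes to hash is not always the number of characters. The following is a minimal sketch of a UTF-8 encoder in plain C++; `encodeUtf8` is a hypothetical helper for illustration and is not part of JUCE, and it does not reject surrogate code points.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode Unicode code points as UTF-8 bytes (1 to 4 bytes per code point).
// Hypothetical helper; surrogate code points are not rejected.
std::string encodeUtf8 (const std::u32string& text)
{
    std::string out;

    for (char32_t cp : text)
    {
        assert (cp <= 0x10FFFF); // values outside the Unicode range are not handled

        if (cp < 0x80)
        {
            out += static_cast<char> (cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char> (0xC0 | (cp >> 6));
            out += static_cast<char> (0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char> (0xE0 | (cp >> 12));
            out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char> (0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char> (0xF0 | (cp >> 18));
            out += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char> (0x80 | (cp & 0x3F));
        }
    }

    return out;
}
```

For pure ASCII text the encoded size equals the character count, but a single character such as U+00E9 encodes to two bytes. When passing UTF-8 data together with a byte count to a hash function, the count should therefore be the size of the encoded data, not the string's character length.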

