Improving SHA-256's Processing Speed

I’m in need of a fairly speedy SHA-256 file hasher. JUCE’s implementation is rather naive, reading from a file 64 bytes at a time, which is totally reasonable for small data sets.
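For context, the core idea behind the speed-up is simply to pull data from the stream in large chunks rather than 64 bytes per read, and let the hasher consume each chunk internally. Here’s a minimal, std-only sketch of that read loop (hypothetical names, not JUCE’s actual API — the hasher is stood in for by a callback):

```cpp
#include <cstdint>
#include <fstream>
#include <vector>

// Hypothetical sketch (not JUCE's API): instead of pulling 64 bytes per read,
// pull one large block per read and let the hasher consume the whole chunk.
// Far fewer stream reads means far less I/O overhead on big files.
template <typename ProcessChunkFn>
std::uint64_t hashFileChunked (const char* path, ProcessChunkFn processChunk,
                               std::size_t chunkSize = 1 << 16) // 64 KiB per read
{
    std::ifstream in (path, std::ios::binary);
    std::vector<char> chunk (chunkSize);
    std::uint64_t totalBytes = 0;

    while (in)
    {
        in.read (chunk.data(), (std::streamsize) chunk.size());
        const auto got = (std::size_t) in.gcount();

        if (got == 0)
            break;

        processChunk (chunk.data(), got); // hasher consumes the whole chunk
        totalBytes += got;
    }

    return totalBytes;
}
```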

I’m working with a variety of large files so I had a serious crack at improving this situation.

Here are some benchmarks when testing with a 1 GB file, in a 64-bit release build on Windows 10, VS2019 (laptop specs: Intel i7-6700HQ, 24 GB RAM, NVMe drives).

JUCE’s Implementation

1 min 8 secs
1 min 11 secs
1 min 14 secs
1 min 10 secs
1 min 8 secs

My Changes

11 secs
10 secs
9 secs
9 secs
9 secs

Here’s a test site to compare the resulting SHA of a file:

If you don’t have any known large data files handy to test with, you can go here to generate a file of whatever size, filled with random bytes: Online Random file generator

The changes are here. I left JUCE’s code intact if you want to toggle between my changes and the original code.

I’d love for people to give it a run and see if it’s reasonable/totally shit or whatever.

juce_SHA256.cpp (10.8 KB)

Some test code:

void testSHA256 (const juce::File& f)
{
    const juce::File logFile ("YourLogFileLocation");
    if (logFile.existsAsFile())     // delete any previous log before writing
        logFile.deleteRecursively (true);

    juce::FileOutputStream fos (logFile);
    jassert (fos.openedOk());

    jassert (f.existsAsFile());

    juce::FileInputStream fis (f);
    jassert (fis.openedOk());

    const auto ct = juce::Time::getCurrentTime();

    juce::SHA256 hash (fis);
    fos << hash.toHexString() << juce::newLine;

    const auto et = juce::Time::getCurrentTime();
    fos << (et - ct).getDescription() << juce::newLine;
}

I decided to take the concept I applied to SHA-256 into the Whirlpool implementation, where the process reads and stores more data from the stream at once to reduce the overall number of read operations.

Doing this cut the Whirlpool processing time for 1 GiB file streams to about a third (i.e. from ~1.5 minutes to ~30 seconds on my system), which I’d consider significant!

Here are the changes:

I’ve also improved the test so as to compare more easily. It should be noted that the MD5 implementation is rather speedy already, but I added it to the test just for demonstration purposes.

template <typename HashType>
void runHashTest (const String& name, FileOutputStream& logStream, const File& sourceDataFile)
{
    FileInputStream fis (sourceDataFile);
    jassert (fis.openedOk());

    logStream << name << ": ";

    const auto startTime = Time::getCurrentTime();
    HashType hash (fis);
    logStream << hash.toHexString();
    const auto endTime = Time::getCurrentTime();

    logStream << ", Execution Time: " << (endTime - startTime).getDescription() << newLine;
}

void testHashers (const File& sourceDataFile)
{
    jassert (sourceDataFile.existsAsFile());

    const auto logFile = File::getSpecialLocation (File::userDesktopDirectory).getChildFile ("hashTestLog.txt");
    if (logFile.existsAsFile())     // delete any previous log before writing
        logFile.deleteRecursively (true);

    FileOutputStream logStream (logFile);
    jassert (logStream.openedOk());

    logStream << "//==============================================================================" << newLine
              << "Test start: " << Time::getCurrentTime().toISO8601 (true) << newLine
              << "Test file: " << sourceDataFile.getFullPathName() << newLine
              << newLine;

    runHashTest<MD5> ("MD5", logStream, sourceDataFile);
    runHashTest<SHA256> ("SHA256", logStream, sourceDataFile);
    runHashTest<Whirlpool> ("Whirlpool", logStream, sourceDataFile);
}

The issue with your test is that the second time you access the sourceDataFile, it’s probably in the OS cache.

I’m aware! Though testing without that being an impediment has been on my mind for the last couple days.

I’m not sure I have an obvious answer so am open to suggestions!

Use a different file of the same size for each test.

Before launching your test:

On macOS, use sudo purge in the command line.
On Windows, I use an app called RAMMap.

Interesting idea… Initially I was iterating through each large file I had at hand, but it’s hard to inspect if any of it has been cached/prefetched.

I’ll have a look at RAMMap. Thanks!

Comparing JUCE’s implementation vs my changes with 32 separate 1 GiB files to hash, I get the following:

                              SHA-256 (JUCE)   Whirlpool (JUCE)   SHA-256 (My Changes)   Whirlpool (My Changes)
Total Duration (seconds):     2121             2862               333                    995
Average Duration (seconds):   71.09375         96.6875            11.125                 33.25

My data can be found here:

It’s probably worth noting that I reset my computer between tests because it looked like RAMMap didn’t change anything at all. I still ran it to clear everything out before running an x64 Release build… but I’m not convinced it did anything useful.


I’ve had measurably good success with my changes (and I realise there’s a pattern that points to DRYing). Can someone have a look? The whole forking or copy/pasting thing doesn’t tickle my fancy.

Have you compared the relative performance of your changes vs using a BufferedInputStream rather than a plain FileInputStream?

On my machine, your modified version takes around 4.5 seconds to hash a 1GB random file, but using a BufferedInputStream with a 1M buffer size takes 4.9 seconds, and the resulting code is a bit more readable/maintainable. Might using a BufferedInputStream be a viable approach for your use-case?


Not a damn clue why I didn’t think of using BufferedInputStream, but no, I haven’t done such a test. Sounds promising though! I couldn’t say why I overcomplicated the changes.

If you’ve something I can try out, I’ll give it a run here?

I didn’t keep my changes around, but I think it should be something like:

juce::FileInputStream streamToHash { juce::File { "/path/to/file" } };
juce::BufferedInputStream bufferedStream { &streamToHash, 1 << 20, false };
juce::SHA256 hash { bufferedStream };
// We have the hash now, so we can print it, compare it etc.

On my desktop, with the BufferedInputStream approach applied internally to the hashing classes, the results happen amazingly fast still… I see what you mean by simplifying!

Have a gander and let me know if it’s still shite: needForSpeed.patch (2.8 KB)

Is this any faster/slower than doing it externally? I’d prefer not to force buffering in cases where it’s not required. In cases where we know we’re only going to be hashing very small items, it doesn’t make sense to force a 1M allocation.
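One way to avoid forcing a 1 MiB allocation on tiny inputs would be to size the read buffer from the stream’s remaining length. A hypothetical, std-only sketch of that policy (the names and thresholds are assumptions: the 64-byte floor is SHA-256’s message block size, and the 1 MiB cap matches the buffer size discussed above):

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: pick a read-buffer size based on how much data is left
// in the stream, so small inputs never pay for a large allocation.
std::size_t chooseBufferSize (std::int64_t bytesRemaining)
{
    constexpr std::int64_t minBuffer = 64;       // SHA-256 block size
    constexpr std::int64_t maxBuffer = 1 << 20;  // 1 MiB cap for large files

    if (bytesRemaining <= 0)
        return (std::size_t) minBuffer;          // unknown or empty stream: stay small

    return (std::size_t) std::clamp (bytesRemaining, minBuffer, maxBuffer);
}
```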

I did try with various bits of text to be sure the results are consistent/nothing broke, though I can’t say I’ve seen perceptible changes in processing speed.

I’m on the fence; I can still give it a try externally… Is it much more work to manage on the user’s end? No, I suppose not… but this issue will still persist: a user will encounter this slowdown when hashing non-text/large files.

Would you consider the alternative of adding the buffer to the juce::File based constructors instead of slapping it at the root? (eg: SHA256 (const File& file))

I’ll have a think about it, that may be a reasonable compromise. I definitely don’t think we should be doing this internally in the general case, though. Imagine that a user is already using a BufferedInputStream - we definitely wouldn’t want to double-up the buffering in that scenario.

Thanks! The scenario you mention is one that I was trying to imagine after I provided the patch… I don’t have a clear safety-net to suggest otherwise.
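One possible safety net for the double-buffering concern would be to wrap the source only when it isn’t already a buffered stream, e.g. via a dynamic_cast check. A toy, std-only sketch with stand-in classes (these are not JUCE’s types; they just illustrate the dispatch):

```cpp
#include <memory>

// Stand-in stream types for illustration only (not JUCE's classes).
struct InputStream { virtual ~InputStream() = default; };

struct BufferedInputStream : InputStream
{
    explicit BufferedInputStream (InputStream& s) : source (&s) {}
    InputStream* source;
};

// Wrap the source in a buffering layer only when it isn't already buffered,
// so an existing BufferedInputStream is never double-wrapped. The holder
// keeps the wrapper alive for the caller's scope.
InputStream& ensureBuffered (InputStream& in, std::unique_ptr<BufferedInputStream>& holder)
{
    if (dynamic_cast<BufferedInputStream*> (&in) != nullptr)
        return in; // already buffered: use as-is

    holder = std::make_unique<BufferedInputStream> (in);
    return *holder;
}
```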