std::hash specialisation for Identifier / unordered_map

Wouldn’t it be nice if there were a std::hash specialisation for the JUCE type Identifier, so it could be used as a key in std::unordered_map?

My naive implementation:

#include <functional>

namespace std {
    template <> struct hash<juce::Identifier>
    {
        size_t operator() (const juce::Identifier& x) const
        {
            // works, but slow: the characters are rehashed on every call
            return x.toString().hash();

            // wouldn't it be faster to use the char pointer, which should be unique per Identifier string?
            // return somemagic_cast<size_t> (x.getCharPointer());
        }
    };
}

The pointer is not a good idea (because it can change). Instead, the hash could be precalculated in the pooled String array; then we’d have an ultra-fast solution.
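Sketched very roughly, the idea would be something like this (just an illustration of the concept, not how juce::StringPool is actually laid out):

struct PooledEntry
{
    juce::String text;          // the pooled, immutable identifier string
    size_t precomputedHash;     // computed once when the string is added to the pool,
                                // so std::hash<juce::Identifier> could simply return it
};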

Has anyone implemented this the ultra-fast way in the meantime? With std::unordered_map finally available on all targets, it would be very welcome.

Bump: If Identifier could be hashed, large ValueTree structures (and any other structures using Identifiers) could use HashMap/std::unordered_map, and Identifier-based access and lookup would get a lot faster, so entire parts of the JUCE library would benefit. Right now I’m forced to use std::map, which is really slow for small sets.

Bump!

I just hit this issue trying to use a juce::Identifier as a key for a std::unordered_map.

For searchability, the error is “the specified hash does not meet the Hash requirements”.

+1 I wish I could vote for this again. I once tried to implement it myself, but it would need deep changes in the way Identifiers are created, so I backed off.

This would be killer.

Why? It’s just an interface wrapped around String. If you’re already working with C++17, this works (at least on macOS; not sure if there’s weirdness with std::string_view on Windows):

#include <string_view>

namespace std {
    template<>
    class hash<juce::Identifier> {
    public:
        size_t operator()(const juce::Identifier& id) const {
            // toString() gives the pooled String; use its UTF-8 data and byte count
            // (not the character count) so the view covers the whole string
            const auto& str = id.toString();
            auto view = std::string_view (str.toRawUTF8(), str.getNumBytesAsUTF8());

            return std::hash<std::string_view>()(view);
        }
    };
}
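With a specialisation like that visible, an Identifier can then be used as an unordered_map key directly, for example (a trivial sketch with a made-up id):

#include <unordered_map>

float lookupExample()
{
    // relies on the std::hash<juce::Identifier> specialisation above being in scope
    std::unordered_map<juce::Identifier, float> values;
    values[juce::Identifier ("gain")] = 0.5f;             // "gain" is just a made-up id

    auto it = values.find (juce::Identifier ("gain"));
    return it != values.end() ? it->second : 0.0f;
}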

That hash has linear runtime in the length of the string, which is what the OP is trying to avoid.

It might be linear, but presumably it’s still quite fast for most ID strings, which are usually quite short?

Sure, there might be quicker algorithms, but what’s the use case?

The performance question shouldn’t be about complexity, but about whether the hashing function takes longer than looking up a cached hash somewhere, or at least at what string length that starts to matter.

Using such a solution does work, but its performance would be too poor to be useful.

The deep changes I mean would be for a solution that calculates the hash once on app initialization. This would require changes to how Identifiers work and are stored, but the benefit would be a big speed-up for Identifier-keyed trees, which could then use unordered containers.

A templated hash function is IMHO much too slow. Identifier is optimized for fast comparison based on the unique String pointer, which makes it well suited for std::map and similar containers. If the hash has to be recalculated all the time by a templated hash function, the performance of unordered_map is going to be much worse than that of map. If the hash were cached, there would never be any question about which solution is faster.

Recalculating the hash all the time for Strings that are constant by design is just not a good solution.

Recalculating the hash all the time for Strings that are constant by design is just not a good solution.

Elegance aside, recomputing things can be faster than looking them up, depending on how much you need to compute. Memory access is just about the slowest thing you can do on a processor.

But like I said, it depends on the length of the string, and I’m unconvinced that hashing small strings would be slower than looking up their hashes somewhere in cache. It would certainly be more deterministic. I’d like to see a benchmark to find out where the crossover point is: if it’s around 8-byte strings before caching wins, sure, let’s cache stuff; if it’s more like 128, then is it really necessary?
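A rough sketch of such a benchmark could look like the one below; it assumes JUCE is available, that a std::hash<juce::Identifier> specialisation like the one above is visible, and the key names are made up. It only illustrates how one might measure the crossover, it is not a definitive result:

#include <chrono>
#include <iostream>
#include <map>
#include <unordered_map>
#include <vector>

static void benchmarkIdentifierLookups()
{
    std::vector<juce::Identifier> keys;

    for (int i = 0; i < 100; ++i)
        keys.emplace_back ("param" + juce::String (i));   // short, ASCII-only ids

    std::map<juce::Identifier, int> ordered;
    std::unordered_map<juce::Identifier, int> unordered;  // needs std::hash<juce::Identifier>

    for (size_t i = 0; i < keys.size(); ++i)
    {
        ordered[keys[i]]   = (int) i;
        unordered[keys[i]] = (int) i;
    }

    auto time = [&keys] (const char* label, auto& container)
    {
        const auto start = std::chrono::steady_clock::now();
        long long sum = 0;                                 // keeps the loop from being optimised away

        for (int repeat = 0; repeat < 10000; ++repeat)
            for (const auto& k : keys)
                sum += container.find (k)->second;

        const auto end = std::chrono::steady_clock::now();
        std::cout << label << ": sum=" << sum << ", "
                  << std::chrono::duration_cast<std::chrono::microseconds> (end - start).count()
                  << " us\n";
    };

    time ("std::map          ", ordered);
    time ("std::unordered_map", unordered);
}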

that calculates the hash once on app initialization.

This assumes that the strings used as keys are known at initialization, never change, and that none are added later. Having a global cache of hashes introduces a lot of complexity w.r.t. thread synchronization.

Would this help?

…but the Identifier is already using the StringPool, and a comparison for equality is a single operation.
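Roughly speaking (an illustrative sketch, not the actual juce_Identifier.h source), the equality check boils down to a single pointer comparison:

static bool identifiersAreEqual (const juce::Identifier& a, const juce::Identifier& b) noexcept
{
    // Both names live in the app-wide StringPool, so equal Identifiers share the
    // same character data; comparing the addresses is enough.
    return a.getCharPointer().getAddress() == b.getCharPointer().getAddress();
}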

I thought the original issue with comparing the pointers was that the pointer could change?

To recompute the hash you obviously first need to load the string, so there is memory access in addition to the computation anyway.

Identifier Strings cannot change. As far as I remember, they are put into an app-global list optimized for binary search (the StringPool). To make the change, the existing structure could be used; it would just need an additional hash field stored with the Strings, one that doesn’t change even if the string pointers change as the structure grows.

Yes, but we want quick hashes for unordered_map. Pointers as hashes won’t work, because they can change if the StringPool outgrows its allocation.

Why not put the pre-calculated hash directly inside the Identifier class itself? It would increase sizeof(Identifier) by a size_t, but it would avoid the lookup, the threading issues, and the main-memory access.
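For anyone who doesn’t want to wait for a library change, the same idea can be sketched as a wrapper around Identifier (HashedIdentifier is a made-up name, not part of JUCE):

#include <functional>
#include <unordered_map>

// Sketch only: pairs an Identifier with a hash computed once at construction,
// so unordered_map lookups never have to rehash the characters.
struct HashedIdentifier
{
    explicit HashedIdentifier (const juce::Identifier& i)
        : id (i), cachedHash (i.toString().hash()) {}

    bool operator== (const HashedIdentifier& other) const noexcept   { return id == other.id; }

    juce::Identifier id;
    size_t cachedHash;
};

namespace std
{
    template <> struct hash<HashedIdentifier>
    {
        size_t operator() (const HashedIdentifier& h) const noexcept { return h.cachedHash; }
    };
}

A std::unordered_map<HashedIdentifier, juce::var> then gets the cached-hash behaviour without touching the library, at the cost of a slightly bigger key and of constructing the wrapper for each lookup.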
