std::hash specialisation for Identifier / unordered_map

Wouldn’t it be nice if there were a std::hash specialisation for the JUCE type Identifier, so it could be used as a key in std::unordered_map?

My naive implementation:

#include <functional>

namespace std {
    template <> struct hash<juce::Identifier>
    {
        size_t operator() (const juce::Identifier& x) const
        {
            // works, but slow: the characters are rehashed on every call
            return x.toString().hash();

            // wouldn't it be faster to use the char pointer, which should be unique per Identifier string?
            // return somemagic_cast<size_t> (x.getCharPointer());
        }
    };
}

The pointer is not a good idea (because it can change). Instead, the hash could be precalculated in the pooled String array; then we’d have an ultra-fast solution.
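Sketched very roughly, the idea would be something like this (just an illustration of the concept, not how juce::StringPool is actually laid out):

struct PooledEntry
{
    juce::String text;          // the pooled, immutable identifier string
    size_t precomputedHash;     // computed once when the string is added to the pool,
                                // so std::hash<juce::Identifier> could simply return it
};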

Has anyone implemented this the ultra-fast way in the meantime? With std::unordered_map finally available on all targets, it would be very welcome.

Bump: If Identifier could be hashed, large ValueTree structures (and any other structures using Identifiers) could use HashMap/std::unordered_map, and Identifier-based access and lookup would get a lot faster, so entire parts of the JUCE library would benefit. Right now I’m forced to use std::map, which is really slow for small sets.

Bump!

I just hit this issue trying to use a juce::Identifier as a key for a std::unordered_map.

For searchability, the error is “the specified hash does not meet the Hash requirements”.

+1 I wish I could vote for this again. I once tried to implement it myself, but it would need deep changes in the way Identifiers are created, so I backed off.

This would be killer.

Why? It’s just an interface wrapped around String. If you’re already working with C++17, this works (at least on macOS; not sure if there’s weirdness with std::string_view on Windows):

#include <string_view>

namespace std {
    template<>
    class hash<juce::Identifier> {
    public:
        size_t operator()(const juce::Identifier& id) const {
            // toString() gives the pooled String; use its UTF-8 data and byte count
            // (not the character count) so the view covers the whole string
            const auto& str = id.toString();
            auto view = std::string_view (str.toRawUTF8(), str.getNumBytesAsUTF8());

            return std::hash<std::string_view>()(view);
        }
    };
}
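With a specialisation like that visible, an Identifier can then be used as an unordered_map key directly, for example (a trivial sketch with a made-up id):

#include <unordered_map>

float lookupExample()
{
    // relies on the std::hash<juce::Identifier> specialisation above being in scope
    std::unordered_map<juce::Identifier, float> values;
    values[juce::Identifier ("gain")] = 0.5f;             // "gain" is just a made-up id

    auto it = values.find (juce::Identifier ("gain"));
    return it != values.end() ? it->second : 0.0f;
}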

That hash has linear runtime in the length of the string, which is what the OP is trying to avoid.

It might be linear, but presumably it’s still quite fast for most ID strings, which are usually quite short?

Sure, there might be quicker algorithms, but what’s the use case?

The performance question shouldn’t be about complexity, but about whether the hashing function takes longer than looking up a cached hash somewhere, or at least at what string length that starts to matter.

Using such a solution does work, but its performance would be too poor to be useful.

The deep changes I mean would be for a solution that calculates the hash once on app initialization. This would require changes to how Identifiers work and are stored, but the benefit would be a big speed-up for Identifier-keyed trees, which could then use unordered containers.

A templated hash function is IMHO much too slow. Identifier is optimized for fast comparison based on the unique String pointer, which makes it well suited for std::map and similar containers. If the hash has to be recalculated all the time by a templated hash function, the performance of unordered_map is going to be much worse than that of map. If the hash were cached, there would never be any question about which solution is faster.

Recalculating the hash all the time for Strings that are constant by design is just not a good solution.

Recalculating the hash all the time for Strings that are constant by design is just not a good solution.

Elegance aside, recomputing things can be faster than looking them up, depending on how much you need to compute. Memory access is just about the slowest thing you can do on a processor.

But like I said, it depends on the length of the string, and I’m unconvinced that hashing small strings would be slower than looking up their hashes somewhere in cache. It would certainly be more deterministic. I’d like to see a benchmark to find out where the crossover point is: if it’s around 8-byte strings before caching wins, sure, let’s cache stuff; if it’s more like 128, then is it really necessary?
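A rough sketch of such a benchmark could look like the one below; it assumes JUCE is available, that a std::hash<juce::Identifier> specialisation like the one above is visible, and the key names are made up. It only illustrates how one might measure the crossover, it is not a definitive result:

#include <chrono>
#include <iostream>
#include <map>
#include <unordered_map>
#include <vector>

static void benchmarkIdentifierLookups()
{
    std::vector<juce::Identifier> keys;

    for (int i = 0; i < 100; ++i)
        keys.emplace_back ("param" + juce::String (i));   // short, ASCII-only ids

    std::map<juce::Identifier, int> ordered;
    std::unordered_map<juce::Identifier, int> unordered;  // needs std::hash<juce::Identifier>

    for (size_t i = 0; i < keys.size(); ++i)
    {
        ordered[keys[i]]   = (int) i;
        unordered[keys[i]] = (int) i;
    }

    auto time = [&keys] (const char* label, auto& container)
    {
        const auto start = std::chrono::steady_clock::now();
        long long sum = 0;                                 // keeps the loop from being optimised away

        for (int repeat = 0; repeat < 10000; ++repeat)
            for (const auto& k : keys)
                sum += container.find (k)->second;

        const auto end = std::chrono::steady_clock::now();
        std::cout << label << ": sum=" << sum << ", "
                  << std::chrono::duration_cast<std::chrono::microseconds> (end - start).count()
                  << " us\n";
    };

    time ("std::map          ", ordered);
    time ("std::unordered_map", unordered);
}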

that calculates the hash once on app initialization.

This assumes that the strings used as keys are known at initialization, never change, and that none are added later. Having a global cache of hashes introduces a lot of complexity w.r.t. thread synchronization.

Would this help?

…but the Identifier is already using the StringPool, and a comparison for equality is a single operation.
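Roughly speaking (an illustrative sketch, not the actual juce_Identifier.h source), the equality check boils down to a single pointer comparison:

static bool identifiersAreEqual (const juce::Identifier& a, const juce::Identifier& b) noexcept
{
    // Both names live in the app-wide StringPool, so equal Identifiers share the
    // same character data; comparing the addresses is enough.
    return a.getCharPointer().getAddress() == b.getCharPointer().getAddress();
}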

I thought the original issue with comparing the pointers was that the pointer could change?

To recompute the hash you obviously first need to load the string, so there is memory access in addition to the computation anyway.

Identifier Strings cannot change. As far as I remember, they are put into an app-global list optimized for binary search (the StringPool). To make the change, the existing structure could be used; it would just need an additional hash field stored with the Strings, one that doesn’t change even if the string pointers change as the structure grows.

Yes, but we want quick hashes for unordered_map. Pointers as hashes won’t work, because they can change if the StringPool outgrows its allocation.

Why not put the pre-calculated hash directly inside the Identifier class itself? It would increase sizeof(Identifier) by a size_t, but it would avoid the lookup, the threading issues, and the main-memory access.
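For anyone who doesn’t want to wait for a library change, the same idea can be sketched as a wrapper around Identifier (HashedIdentifier is a made-up name, not part of JUCE):

#include <functional>
#include <unordered_map>

// Sketch only: pairs an Identifier with a hash computed once at construction,
// so unordered_map lookups never have to rehash the characters.
struct HashedIdentifier
{
    explicit HashedIdentifier (const juce::Identifier& i)
        : id (i), cachedHash (i.toString().hash()) {}

    bool operator== (const HashedIdentifier& other) const noexcept   { return id == other.id; }

    juce::Identifier id;
    size_t cachedHash;
};

namespace std
{
    template <> struct hash<HashedIdentifier>
    {
        size_t operator() (const HashedIdentifier& h) const noexcept { return h.cachedHash; }
    };
}

A std::unordered_map<HashedIdentifier, juce::var> then gets the cached-hash behaviour without touching the library, at the cost of a slightly bigger key and of constructing the wrapper for each lookup.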
