Wouldn’t it be nice there is a std::hash specialisation for the juce-type Identifier, so it can be used as a key in std::unordered_map?
My naive implementation
namespace std {
template <> struct hash<Identifier>
{
size_t operator()(const Identifier &x) const
{
// works, but slow
return x.toString().hash();
// wouldn't it be faster, to use the char-pointer which should be unique per Identifier-String???
//return somemagic_cast<size_t>(x.getCharPointer());
}
};
}
The pointer is not a good idea (because it can change), instead the hash could be precalculated in the pooled String Array. Then we have a ultra fast solution.
Bump: If identifier could be hashed, large ValueTree structures (and any other structures using Identifiers) could use HashMap/unordered_map. Identifier based access and lookup would get a lot faster. So entire parts of the Juce library would benefit. Right now I’m forced to use std::map which is really slow for small sets.
+1 I wish I could vote for this again. I once tried to implement it myself, but it would need deep changes in the way Identifiers are created so I backed off.
Why? It’s just an interface wrapped around String. If you’re already working with C++17, this works (at least on MacOS, not sure if there’s weirdness with std::string_view on Windows):
#include <string_view>
namespace std {
template<>
class hash<juce::Identifier> {
public:
size_t operator()(const juce::Identifier& id) const {
auto ptr = id.getCharPointer();
auto len = StringRef(id).length();
auto view = std::string_view (ptr, len);
return std::hash<std::string_view>()(view);
}
};
}
The performance question shouldn’t be complexity but whether or not the hashing function takes longer than looking up a cached hash somewhere, or at least for what length string it comes into play.
Using such a solution does work, but its performance would be too bad to be useful.
The deep changes I mean would be for a solution that calculates the hash once on app initialization. This would require changes to how identifiers work and are stored, but the benefit would be a big speed-up for Identifier trees that could become unordered.
A templated hash function IMHO is much too slow. Identifier is optimized for fast comparison based on the unique String pointer. This makes it suited for std::map and similar containers. If the hash needs to be recalculated all the time using a templated hash function, performance of unordered_map is going to be much worse than map. If the hash was cached, there was no question about which solution is faster ever.
Recalculating the hash all the time for Strings that are constant by design is just not a good solution.
Recalculating the hash all the time for Strings that are constant by design is just not a good solution.
Elegance aside, recomputing things is faster than looking them up, depending on how much you need to compute. Memory access is just about the slowest thing you can do on a processor.
But like I said, it depends on the length of the string, and I’m unconvinced that hashing small strings would be slower than looking up their hashes somewhere in cache. It would certainly be more deterministic. I’d like to see a benchmark to find out where the point of overlap is, if it’s like 8 byte strings before you see performance gains… sure let’s cache stuff. If it’s more like 128, then is it really necessary?
that calculates the hash once on app initialization.
This assumes that the strings used as keys are known at initialization and don’t change and none are added. Having a global cache of hashes introduces a lot of complexity w.r.t. thread synchronization.
To recompute you obviously first need to look up the string which means memory access as well in addition to the computing.
Identifier Strings cannot change. As far as I remember they are put into an app-global list optimized for binary search (StringPool). To make the change, the existing structure could be used, it just would need an additional hash field stored with the Strings that doesn’t change even if string pointers change due to the structure growing.
Yes, but we want quick hashes for unordered map. Pointers as hashes won’t work because they can change if the StringPool outgrows its allocation.
Why not put the pre-calculated hash directly inside the Identifier class itself? It would increase sizeof(Identifier) by a size_t. But it would avoid the lookup, threading issues and main memory access.