Google Highway SIMD library

I’m not affiliated with Google or this code, and I haven’t tried it out locally (yet), but the runtime dispatch portion seems interesting:

Efficient and safe runtime dispatch is important. Modules such as image or video codecs are typically embedded into larger applications such as browsers, so they cannot require separate binaries for each CPU. Libraries also cannot predict whether the application already uses AVX2 (and pays the frequency throttling cost), so this decision must be left to the application. Using only the lowest-common denominator instructions sacrifices too much performance. Therefore, we provide code paths for multiple instruction sets and choose the most suitable at runtime. To reduce overhead, dispatch should be hoisted to higher layers instead of checking inside every low-level function. Generating each code path from the same source reduces implementation- and debugging cost.
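To make the "hoist the dispatch" idea concrete, here is a minimal sketch in plain C++. It does not use Highway's API; it only illustrates the general technique. All names (`SumScalar`, `SumAvx2`, `ChooseSum`, `SumAll`, `SKETCH_HAS_AVX2_PATH`) are made up for illustration, and the AVX2 path assumes GCC or Clang on x86, relying on the `__builtin_cpu_supports` builtin and the `target` attribute. On other compilers it simply falls back to the scalar path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Portable baseline path (hypothetical name).
float SumScalar(const float* data, std::size_t n) {
  float total = 0.0f;
  for (std::size_t i = 0; i < n; ++i) total += data[i];
  return total;
}

// Assumption: GCC/Clang on x86 (not clang-cl/MSVC), where the target
// attribute and __builtin_cpu_supports are available.
#if (defined(__GNUC__) || defined(__clang__)) && !defined(_MSC_VER) && \
    (defined(__x86_64__) || defined(__i386__))
#define SKETCH_HAS_AVX2_PATH 1
// Same source loop, but the compiler may emit AVX2 instructions for this
// one function only. The rest of the binary stays at the baseline.
__attribute__((target("avx2")))
float SumAvx2(const float* data, std::size_t n) {
  float total = 0.0f;
  for (std::size_t i = 0; i < n; ++i) total += data[i];
  return total;
}
#else
#define SKETCH_HAS_AVX2_PATH 0
#endif

using SumFn = float (*)(const float*, std::size_t);

// Queries the CPU and returns the best available code path.
SumFn ChooseSum() {
#if SKETCH_HAS_AVX2_PATH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2")) return &SumAvx2;
#endif
  return &SumScalar;
}

// Hoisted dispatch: the choice is made once per batch in the higher layer,
// not inside the low-level function on every call.
float SumAll(const std::vector<std::vector<float>>& rows) {
  const SumFn sum = ChooseSum();
  assert(sum != nullptr);
  float total = 0.0f;
  for (const auto& row : rows) total += sum(row.data(), row.size());
  return total;
}
```

As far as I understand it, a library like Highway generates the per-target functions from a single source for you, so you don't hand-write one function per instruction set as this sketch does.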

So what techniques are used to identify the bottlenecks? Back in the day, VTune would take you there on IA platforms, and you could then optimize away and develop code paths accordingly. This was a ghastly investment of dev resources (I know, been there, have the t-shirt: I schlepped the SIMD/VTune story for Intel for the first decade of my career). IA platforms long ago became “fast enough” that the effort stopped being worth it. That is probably less true for non-desktop targets these days. Hence the question: how are you determining the bottlenecks?

Is anybody using Highway these days? It appears quite mature and up to date, but I’m confused by the statements about using it with Visual Studio. Dynamic dispatch can’t fully work with the MSVC compiler because of VEX vs. non-VEX encoding. clang-cl for MSVC should be able to do the trick, but so far I’ve only managed to get into trouble.
Are there any success stories of people shipping a plugin that uses Highway dynamic dispatch on Windows?

Update: I got things to work with some help from the creator of the library himself. I was just doing some things incorrectly with lambdas and templates. So it is possible to do full dynamic dispatch for various SIMD types using Google Highway on macOS and Windows, provided clang-cl is used to compile the Windows version.