i’m not affiliated with google or this code and i haven’t tried it out locally (yet) but the runtime dispatch portion seems interesting:
Efficient and safe runtime dispatch is important. Modules such as image or video codecs are typically embedded into larger applications such as browsers, so they cannot require separate binaries for each CPU. Libraries also cannot predict whether the application already uses AVX2 (and pays the frequency throttling cost), so this decision must be left to the application. Using only the lowest-common denominator instructions sacrifices too much performance. Therefore, we provide code paths for multiple instruction sets and choose the most suitable at runtime. To reduce overhead, dispatch should be hoisted to higher layers instead of checking inside every low-level function. Generating each code path from the same source reduces implementation- and debugging cost.