I need to compute some lookup tables that hold values for an FFT-based correlation executed on the GPU. For efficiency, the algorithm expects my precomputed values to be stored in bit-reversed order in memory, so I need to shuffle a lot of arrays this way. Until now the values were precomputed in Matlab, which has the `bitrevorder` function for this, and then read from a file; now the shuffling should be done on the fly at application startup.

Now I’m looking for the most efficient way to implement this. Since this task is performed under the hood in some FFT implementations, I thought that maybe someone here knows a nice, efficient way of doing it, e.g. with some fancy bit-shift operations. All my straightforward ideas seem quite clunky.

If you don’t know what I’m talking about, just have a look here: https://de.mathworks.com/help/signal/ref/bitrevorder.html
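
To pin down the ordering I mean, here is a minimal C++ sketch of the permutation. It is only a straightforward loop-based version for illustration, not the efficient solution I'm after. The names `reverse_bits` and `bit_reverse_permute` are hypothetical, and the sketch assumes the array length is a power of two and that indices fit in 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: reverse the lowest `bits` bits of `x`.
inline std::uint32_t reverse_bits(std::uint32_t x, unsigned bits) {
    std::uint32_t r = 0;
    for (unsigned i = 0; i < bits; ++i) {
        r = (r << 1) | (x & 1u);  // move the lowest bit of x onto r
        x >>= 1;
    }
    return r;
}

// Hypothetical helper: permute `data` in place into bit-reversed order.
// Assumption: data.size() is a power of two (or zero).
template <typename T>
void bit_reverse_permute(std::vector<T>& data) {
    const std::size_t n = data.size();
    assert((n & (n - 1)) == 0 && "length must be a power of two");

    // Number of index bits, i.e. log2(n).
    unsigned bits = 0;
    while ((std::size_t{1} << bits) < n) ++bits;

    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t j = reverse_bits(static_cast<std::uint32_t>(i), bits);
        // Swap each pair only once; indices that map to themselves stay put.
        if (i < j) std::swap(data[i], data[j]);
    }
}
```

For an 8-element array holding 0 to 7, this yields 0, 4, 2, 6, 1, 5, 3, 7. Since the permutation depends only on the length, the index mapping could also be computed once and reused for every array of the same size.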