EDIT: I made a small mistake, here’s the updated version:
Okay, this algorithm works in three basic steps, and first I will only list them and after that explain how they work exactly:
- make several copies of the original number (fan out)
- select specific bits
- contract the number
Let’s look at those steps a little bit closer (we will go even closer later on).
First step is done by just multiplying them with a “sparse” number in a way they don’t overlap. So when looking at the bits, there are copies of them.
Second step is also easy. We use the AND operation to only select some of them. The number we use is called magic number for now.
Third step: that’s pure magic! Let’s say we are doing exactly what we want, but for now we don’t know how (wait for it).
The wrong concept
Intuitively, we would think the algorithm works as follows (let’s say we want to reverse n bits)
- Make n copies of the bit pattern
- In the left most copy select the right most bit, to have it at the left most after contracting.
- Second to left copy, second bit, and so on
- Contract them as we have it in mind.
In case of n = 4, this concept (which is wrong) would go like this:
Our binary number is x = 0i4321. I use 0i to tell you that I just write the indices of the bit positions, so we know how they end up.
Let’s make copies:
x * 0b0001 0001 0001 0001 = 0i4321 4321 4321 4321
Mask:
0i4321 4321 4321 4321 AND
0b0001 0010 0100 1000
which gives us
0i0001 0020 0300 4000
magic contraction we wish it exists will make 0i1234
Makes sense? Yes it does! But: we have a problem with the last step: there’s no operation which does that *magic contraction we wish it exists`. But obviously, there is a solution to that.
Now comes the actual explanation, what’s going on!
The right concept
The key to understanding that algorithm is the last step. The modulo operation. Everyone knows what MOD does, so I won’t explain that. However, the algorithm uses a special number for the modulo: 1023 Why is it special? It’s base - 1. So they use a base of 1024. Let’s go to our beloved 10-base system, to explain what’s going on:
base - 1 = 10 - 1 = 9
What happens with a number when we do the MOD 10 operation:
Example from the thread I have linked: 34 % 9 = 7, the individual digits are added together (you can try this with different numbers, as well (there might be some exceptions due to some kind of overflow!))
Let’s switch to another base: 16 and look what we can do with that in binary representation:
16-base or hex uses 4 bits: 0b0000. And the very cool thing by applying % 15 is the following.
Take any binary number and do the % 15 operation, what you will get is that every group of 4-bits will be added together (contracted!).
Example:
0b0100 0000 0000 0010 0000 0001 % 0i1111 which is 4194817 % 15 = 7 = 0b0111
That’s the same as: 0b0100 + 0b0000 + 0b0000 + 0b0010 + 0b0000 + 0b0001 !!! Amazing!
Important: This only works, if not all 4 bits will be set. Obviously, 0b1111 mod 0b1111 won’t result in 0b1111 
And that’s the third step of the algorithm!
We still have to conquer step 1 and 2:
- find the right fan out multiplier
- find the magic number
Let’s say we want to reverse a 3-bit number, and we use % 15 for our third step, to keep it simple. Let’s build the magic number!
We have two conditions for our magic number. The first is: when contracted, it should result in 0b0111 as we want to use three bits. So we need some elements to build the number.
Let’s say we have the following elements: 0b0001, 0b0010, and 0b0100. Shuffle them together as crazy, optionally add some 0b0000 in between, and I promise you, the % 15 will make 0b0111 out of it!
So second condition is, that we select the right bits of our number 0i0321. We can achieve that, by combining our elements.
Let’s look at that and go with our intuitive concept: make three copies of 0i4321 by multiplying it with 0b0100 1001:
0i0003 2132 1321
and find a combination of our elements to select the right positions of our number.
0i0003 2132 1321 AND
0b0001 0100 0010 =
0i0003 0100 0020
Apply % 15 we will get: 0i0123.
Woohoo! We just reversed a 3-bit number!
To write the steps in decimal:
Take a 3-bit number x, multiply it with 73 (fan out), AND it with 322 (select bits), and modulo with 15 (contract). That reverses your 3-bit number.
Try yourself: http://www.wolframalpha.com/input/?i=BitAnd[(011+base+2)+*+73,++322]+mod+15+to+binary
That’s what’s going on there.
In general: If you want to revert a n-bit number, your magic elements have to be at least (n+1) bits long, if it’s the same (like the important note above), the contraction won’t work. Your element’s can have more than one active bit. With that you can reduce the numbers of copies you have to generate (fan-out step).
The algorithm for 8 bits uses magic elements of length 10 (bits)! So the modulo number is 2^10 - 1 = 1023. And some of them have two active bits. That’s why they only have to make 5 copies instead of 8. For 3-bits I used 3 copies. There might be also a solution which uses less copies for 3 bits. It’s some Sherlock-work to find them, it’s fun! I can’t guarantee, that for every n there’s a set of (n+1)-bit magic elements to solve the problem. However, I am quite positive there is.
Hope that helped!