Posted on November 11th, 2008 by Tim.
Categories: Programming, Tim.
Reversing the bytes of an integer is a reasonably common operation when doing network programming, especially when dealing with possibly differing endianness. Reversing the endianness came up at work today, so I was thinking about a slightly different problem.
Given a 64-bit integer, what is the fastest way to reverse the ordering of the bits? The naive approach is to use a loop getting 1 bit into place each iteration. The result is something like this:
private static ulong RevBits1(ulong p)
{
ulong r = 0;
for (int i = 0; i < 64; i++)
{
r = r >> 1;
r = r | (p & 0x8000000000000000);
p = p << 1;
}
return r;
}
If you don’t count the loop operations, this ends up with 256 operations, 4 per iteration. If you unroll this, you can make one of the terms a constant factor, bringing the operation count down to 192. We can improve on this with a divide and conquer approach:
private static ulong RevBits(ulong p)
{
p = (p << 32) | (p >> 32);
p = ((p << 16) & 0xFFFF0000FFFF0000) | ((p >> 16) & 0x0000FFFF0000FFFF);
p = ((p << 8 ) & 0xFF00FF00FF00FF00) | ((p >> 8 ) & 0x00FF00FF00FF00FF);
p = ((p << 4) & 0xF0F0F0F0F0F0F0F0) | ((p >> 4) & 0x0F0F0F0F0F0F0F0F);
p = ((p << 2) & 0xCCCCCCCCCCCCCCCC) | ((p >> 2) & 0x3333333333333333);
p = ((p << 1) & 0xAAAAAAAAAAAAAAAA) | ((p >> 1) & 0x5555555555555555);
return p;
}
Harder to read, but this version is only 28 operations. In asm we could get this down even farther on 64 bit machines, since the first statement can be written as a 32 bit rotation, and the second statement can be written as two 16 bit rotations, bringing our operation count down to 24. Since each of our other operations is nearly equivalent to an opcode (on 64-bit processors at least) this is pretty close to the actual number of opcodes that could be generated by a good optimizing compiler.
It would be interesting to see if this could be optimized further
-Tim
0 comments.
Comments can contain some xhtml. Names and emails are required (emails aren't displayed), url's are optional.