You are looking at posts that were written in the month of November in the year 2008.
Posted on November 24th, 2008 by Tim.
Categories: General/Misc..
The fastest computer in the world in 1998, barely 10 years ago, was around 1 terraflop on almost 10,000 cores.
Today, the Geforce GTX 280 costs around $400 and delivers around 950 Gigaflops.
Posted on November 11th, 2008 by Tim.
Categories: Programming, Tim.
Reversing the bytes of an integer is a reasonably common operation when doing network programming, especially when dealing with possibly differing endianness. Reversing the endianness came up at work today, so I was thinking about a slightly different problem.
Given a 64-bit integer, what is the fastest way to reverse the ordering of the bits? The naive approach is to use a loop getting 1 bit into place each iteration. The result is something like this:
private static ulong RevBits1(ulong p)
{
ulong r = 0;
for (int i = 0; i < 64; i++)
{
r = r >> 1;
r = r | (p & 0x8000000000000000);
p = p << 1;
}
return r;
}
If you don’t count the loop operations, this ends up with 256 operations, 4 per iteration. If you unroll this, you can make one of the terms a constant factor, bringing the operation count down to 192. We can improve on this with a divide and conquer approach:
private static ulong RevBits(ulong p)
{
p = (p << 32) | (p >> 32);
p = ((p << 16) & 0xFFFF0000FFFF0000) | ((p >> 16) & 0x0000FFFF0000FFFF);
p = ((p << 8 ) & 0xFF00FF00FF00FF00) | ((p >> 8 ) & 0x00FF00FF00FF00FF);
p = ((p << 4) & 0xF0F0F0F0F0F0F0F0) | ((p >> 4) & 0x0F0F0F0F0F0F0F0F);
p = ((p << 2) & 0xCCCCCCCCCCCCCCCC) | ((p >> 2) & 0x3333333333333333);
p = ((p << 1) & 0xAAAAAAAAAAAAAAAA) | ((p >> 1) & 0x5555555555555555);
return p;
}
Harder to read, but this version is only 28 operations. In asm we could get this down even farther on 64 bit machines, since the first statement can be written as a 32 bit rotation, and the second statement can be written as two 16 bit rotations, bringing our operation count down to 24. Since each of our other operations is nearly equivalent to an opcode (on 64-bit processors at least) this is pretty close to the actual number of opcodes that could be generated by a good optimizing compiler.
It would be interesting to see if this could be optimized further
-Tim