Every now and then I sneak a little time in to play with code. I don’t get to do as much coding as I wish these days … or … not the type of stuff I really enjoy (e.g. squeezing as much performance out of stuff as I can).

The Ceph team are rewriting some code, and they needed a replacement log function that didn’t require FP, and was faster/smaller than table lookup. I played with it a little, starting with FP code (Abramowitz and Stegun is a good source for the underlying math, by copy is dog-eared and falling apart). Then I wanted to see what could be done with pure integer code. Log functions can be implemented in terms of series in FP, or shifts and logical operations(!) for ints.

So pulling out a number of options, I did some basic coding, and summed the logs of functions of argument 1 to 65535 inclusive. Compared this to simple casting of the DP log2 function. Code is simple/straightforward, and fast.

#include#include $$

Compile and run it, and this is what I get

landman@metal:~/work/development/log$ gcc -o log2.x log2.c -lm ; ./log2.x library function sum: 917506 local function sum: 917506 milestone=0 to 1: time = 0.00284 seconds milestone=1 to 2: time = 0.00091 seconds

Milestone 0 to 1 is the library log function. Milestone 1 to 2 is the local version in the source. Now change out the log_2 function for a different variant

int log_2 (unsigned int v) { int r; // result goes here static const int MultiplyDeBruijnBitPosition[32] = { 0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31 }; v |= v >> 1; // first round down to one less than a power of 2 v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16; r = MultiplyDeBruijnBitPosition[(unsigned int)(v * 0x07C4ACDDU) >> 27]; return r; }

and the results are even more interesting

landman@metal:~/work/development/log$ gcc -o log2a.x log2a.c -lm ; ./log2a.x library function sum: 917506 local function sum: 917506 milestone=0 to 1: time = 0.00288 seconds milestone=1 to 2: time = 0.00065 seconds

And it gets faster. Not my code for the log_2 function, but its fun to play with things like this.

@sijoe int log_2(unsigned int v) { return 31 – __builtin_clz(((v)) | 1); } should be 10x faster

@sijoe that’s flooring the value, as your function does, or if you want to ceiling it, use (v <= 1 ? 0 : 32 – __builtin_clz(v-1)) instead