Friday morning/afternoon code optimization fun

Every now and then I sneak a little time in to play with code. I don’t get to do as much coding as I wish these days … or … not the type of stuff I really enjoy (e.g. squeezing as much performance out of stuff as I can).

The Ceph team are rewriting some code, and they needed a replacement log function that didn’t require FP, and was faster/smaller than table lookup. I played with it a little, starting with FP code (Abramowitz and Stegun is a good source for the underlying math, by copy is dog-eared and falling apart). Then I wanted to see what could be done with pure integer code. Log functions can be implemented in terms of series in FP, or shifts and logical operations(!) for ints.

So pulling out a number of options, I did some basic coding, and summed the logs of functions of argument 1 to 65535 inclusive. Compared this to simple casting of the DP log2 function. Code is simple/straightforward, and fast.

#include <stdio .h>
#include <math .h>
#include <sys /time.h>
 
#define NUMBER_OF_CALIPER_POINTS 10
struct timeval ti,tf,caliper[NUMBER_OF_CALIPER_POINTS];
struct timezone tzi,tzf;
 
 
int log_2 (unsigned int v) 
  {  	
	register unsigned int r; // result of log2(v) will go here
	register unsigned int shift;
 
	r =     (v > 0xFFFF) < < 4; v >>= r;
	shift = (v > 0xFF  ) < < 3; v >>= shift; r |= shift;
	shift = (v > 0xF   ) < < 2; v >>= shift; r |= shift;
	shift = (v > 0x3   ) < < 1; v >>= shift; r |= shift;
    r |= (v >> 1);
    return r;
  }
 
int main (int argc, char **argv)
 {
 	int i, x, milestone;
 
 	float logx;
 	int sx = 0 , sumlogx = 0 ;
 	double delta_t;
 
 	milestone = 0;
    gettimeofday(&caliper[milestone],&tzf);
    for(x=1; x&lt;65536; x++) sx+=(int)log2((double)x);
    milestone++;
   	gettimeofday(&caliper[milestone],&tzf);
 
    for(x=1; x&lt;65536; x++) sumlogx+=log_2((unsigned int)x);
    milestone++;
  	gettimeofday(&caliper[milestone],&tzf);
 
   	printf("library function sum: %i\n",sx);
	printf("local   function sum: %i\n",sumlogx);
  
   /* now report the milestone time differences */
   for (i=0;i< =(milestone-1);i++)
    {
      delta_t = (double)(caliper[i+1].tv_sec-caliper[i].tv_sec);
      delta_t += (double)(caliper[i+1].tv_usec-caliper[i].tv_usec)/1000000.0;
      printf("milestone=%i to %i: time = %.5f seconds\n",i,i+1,delta_t);
    }
 }

Compile and run it, and this is what I get

landman@metal:~/work/development/log$ gcc -o log2.x log2.c -lm ; ./log2.x 
library function sum: 917506
local   function sum: 917506
milestone=0 to 1: time = 0.00284 seconds
milestone=1 to 2: time = 0.00091 seconds

Milestone 0 to 1 is the library log function. Milestone 1 to 2 is the local version in the source. Now change out the log_2 function for a different variant

int log_2 (unsigned int v) 
{

int r; // result goes here

static const int MultiplyDeBruijnBitPosition[32] = 
  {
    0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
    8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
  };

v |= v >> 1; // first round down to one less than a power of 2 
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;

r = MultiplyDeBruijnBitPosition[(unsigned int)(v * 0x07C4ACDDU) >> 27];

return r;
}

and the results are even more interesting

landman@metal:~/work/development/log$ gcc -o log2a.x log2a.c -lm ; ./log2a.x 
library function sum: 917506
local   function sum: 917506
milestone=0 to 1: time = 0.00288 seconds
milestone=1 to 2: time = 0.00065 seconds

And it gets faster.

Not my code for the log_2 function, but its fun to play with things like this.

Viewed 75363 times by 7023 viewers

Facebooktwittergoogle_plusredditpinterestlinkedinmail