# Which CPU is faster, 3.2 GHz Nehalem W5580 or 2.6 GHz Istanbul?

Yes, this is a loaded question. The context is a very simple double-precision floating-point loop, with the interior rewritten to use SSE2. The idea: run the identical program on each machine, on a single core, doing little else but double-precision FP operations (in this case, computing the Riemann zeta function), with little to no memory traffic. Which CPU core wins this very simple sprint?

The larger context is a set of reports we were hired to generate. Unfortunately, it’s looking like the business side of things (for the reports) is falling through, so there is little I can do about that. We can’t/won’t work for free (we have enough people trying to get us to do free consulting for them as it is).

The code is our rzf-sse2.c code. Running it basically computes π: it sums the series for ζ(2) = π²/6 and recovers π from that sum.
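
For readers who have not seen this kind of loop, here is a minimal sketch of the idea. It is not the actual rzf-sse2.c; the function names (`zeta2_scalar`, `zeta2_sse2`, `pi_from_zeta2`) and the two-terms-per-iteration layout are my own assumptions for illustration. It sums 1/k² from the largest index down to 1, once in plain C and once with SSE2 intrinsics, then recovers π from the sum.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Scalar reference: sum 1/k^2 for k = n_terms down to 1.
   Adding the smallest terms first limits accumulated rounding error. */
double zeta2_scalar(long n_terms)
{
    assert(n_terms >= 0);
    double sum = 0.0;
    for (long k = n_terms; k >= 1; k--) {
        double x = (double)k;
        sum += 1.0 / (x * x);
    }
    return sum;
}

/* SSE2 sketch: two series terms per iteration in one 128-bit register.
   Hypothetical layout, not taken from the original source file. */
double zeta2_sse2(long n_terms)
{
    assert(n_terms >= 0);
    const __m128d one = _mm_set1_pd(1.0);
    const __m128d two = _mm_set1_pd(2.0);
    __m128d acc = _mm_setzero_pd();
    long k = n_terms;

    /* low lane holds k, high lane holds k-1 */
    __m128d idx = _mm_set_pd((double)(k - 1), (double)k);
    for (; k >= 2; k -= 2) {
        __m128d sq = _mm_mul_pd(idx, idx);           /* k^2, (k-1)^2   */
        acc = _mm_add_pd(acc, _mm_div_pd(one, sq));  /* += 1/k^2 terms */
        idx = _mm_sub_pd(idx, two);                  /* step both down */
    }

    /* an odd term count leaves the k = 1 term, which is exactly 1.0 */
    double tail = (k == 1) ? 1.0 : 0.0;

    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    return lanes[0] + lanes[1] + tail;
}

/* pi = sqrt(6 * zeta(2)); uses the SSE2 scalar square root.
   sqrt() from math.h would do the same job. */
double pi_from_zeta2(double zeta2)
{
    __m128d v = _mm_set_sd(6.0 * zeta2);
    return _mm_cvtsd_f64(_mm_sqrt_sd(v, v));
}
```

Calling `pi_from_zeta2(zeta2_sse2(n))` with a larger `n` gives a better approximation of π, since the truncated tail of the series shrinks roughly as 1/n. The work is almost entirely multiplies, divides, and adds in registers, which is why the test says so little about the memory system.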

Here is the Nehalem 3.2 GHz machine (running at full clock speed; I turned off the speed/power governor and confirmed from /proc/cpuinfo that it was off). The CPU and motherboard were loaned to us by our friends at Intel for other purposes, but it’s there, so we ran it.

```
[landman@cx1-2 rzftest]$ ./rzf-sse2.exe -l 1000000000 -n 2
D: checking arguments: N_args=5
D: arg[0] = ./rzf-sse2.exe
D: arg[1] = -l
D: infinity found to be = 0
D: should be 1000000000
D: arg[2] = 1000000000
D: arg[3] = -n
D: N found to be = 2
D: should be 2
D: arg[4] = 2
D: running on machine = cx1-2
D: start_index = 1000000000
D: end_index   = 2
D: unroll      = 2
D: inf-1       = 999999999
zeta(2)  = 1.644934065848227
pi = 3.141592652634864
error in pi = 0.000000000954929
relative error in pi = 0.000000000303963
Milestone 0 to 1: time = 0.000s
Milestone 1 to 2: time = 3.440s
```

The core loop is running on this machine in 3.44 seconds.

and

```
[landman@cx1-2 rzftest]$ grep MHz /proc/cpuinfo  | uniq
cpu MHz		: 3200.133
```

Here is the 2.6 GHz Istanbul machine, running the same program with the same arguments.

```
landman@istanbul4p2us:~/rzftest> ./rzf-sse2.exe -l 1000000000 -n 2
D: checking arguments: N_args=5
D: arg[0] = ./rzf-sse2.exe
D: arg[1] = -l
D: infinity found to be = 0
D: should be 1000000000
D: arg[2] = 1000000000
D: arg[3] = -n
D: N found to be = 2
D: should be 2
D: arg[4] = 2
D: running on machine = istanbul4p2us
D: start_index = 1000000000
D: end_index   = 2
D: unroll      = 2
D: inf-1       = 999999999
zeta(2)  = 1.644934065848227
pi = 3.141592652634864
error in pi = 0.000000000954929
relative error in pi = 0.000000000303963
Milestone 0 to 1: time = 0.000s
Milestone 1 to 2: time = 3.273s
```

The core loop is running on this machine in 3.27 seconds.

and

```
landman@istanbul4p2us:~/rzftest> grep MHz /proc/cpuinfo  | uniq
cpu MHz		: 2600.000
```

This code was compiled with GCC 4.3.2. Our makefile is

```
CC	= gcc

# use -g to turn on debugging
DEBUG	=

CFLAGS	= -O3  ${DEBUG}   -I. -Bstatic
FFLAGS	=  ${DEBUG}
LFLAGS  = -O3  ${DEBUG}  -Bstatic

NAME	= Makefile.${PROGRAM}-${CC}

all:	${PROGRAM}-${CC}.exe ${PROGRAM}-${CC}.s

${PROGRAM}-${CC}.exe:	${PROGRAM}.o
	$(CC) ${LFLAGS} -o ${PROGRAM}.exe ${PROGRAM}.o -lm

${PROGRAM}.o: ${PROGRAM}.c
	$(CC) ${CFLAGS} -c ${PROGRAM}.c

${PROGRAM}-${CC}.s: ${PROGRAM}.c
	$(CC) -dA -dp -S ${CFLAGS} -c ${PROGRAM}.c -o ${PROGRAM}-${CC}.s

clean:
	rm -f ${PROGRAM}-${CC}.exe ${PROGRAM}.o ${PROGRAM}-${CC}.s

rebuild:
	${MAKE} -f ${NAME} clean
	${MAKE} -f ${NAME} all

.c.o:
	$(CC) -c $(CFLAGS) $<

.f.o:
	$(F77) -c $(FFLAGS) $<
```

The Istanbul is, from what I can see, a formidable computational competitor to Nehalem. On this sprint it finished about 5% sooner (3.27 s versus 3.44 s) at a clock roughly 19% lower. Dismissing it out of hand (as I have seen many do, based upon Shanghai, Barcelona, and other performance metrics) would not be in anyone’s interests as users and consumers of high performance computing gear.
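
To put the two timings on a per-clock basis, here is a rough back-of-the-envelope helper. It is my own arithmetic, not part of the test code, and it assumes the loop covers about 10^9 terms (as the `-l 1000000000` argument suggests) and that the clock reported by /proc/cpuinfo held for the whole run. The function names are made up for this illustration.

```c
#include <assert.h>

/* Clock cycles spent per series term:
   seconds * (cycles per second) / number of terms. */
double cycles_per_term(double seconds, double mhz, double n_terms)
{
    assert(n_terms > 0.0);
    return seconds * mhz * 1.0e6 / n_terms;
}

/* Ratio of per-term cycle counts between two runs, a over b. */
double cycle_ratio(double cycles_a, double cycles_b)
{
    assert(cycles_b > 0.0);
    return cycles_a / cycles_b;
}
```

Feeding in the measurements above, `cycles_per_term(3.440, 3200.133, 1e9)` comes to about 11.0 cycles per term for the Nehalem and `cycles_per_term(3.273, 2600.000, 1e9)` to about 8.5 for the Istanbul, so on this particular loop the Nehalem core spends roughly 29% more clocks per term. That is one loop on one compiler, so treat it as a data point rather than a ranking.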

It’s unfortunate that it does not look like the reports will be generated, though, as we will have to give up access to the Istanbul soon. I would have liked to have seen what it can do.
