A wake-up call about language choices for certain use cases

I am a somewhat unapologetic Perl user. I write lots of code in Perl, simply because I find it doesn't get in the way of expressing concepts, and it's really quite fast. I've used many languages over the decades, and while none is perfect, Perl has become something of my go-to language (similar to C in years past).

I forget the exact context under which I found this page, but find it I did. On it, the author gives a simple multi-precision arithmetic example in Java, Go, and Python. As a performance test it leaves some things to be desired (I've got another post in the works on a benchmarking failure I found online; give me some time to get that done), but it was interesting to me as a very simple test with which one could explore the expressiveness of a language, along with getting a rough feel for how well integrated GMP or similar packages were.

I pulled the Python source down and ran it. Similar times to what they had. Very compact source. This, honestly, surprised me.

Ok, I wanted to convert this to Perl to see if we could be within an order of magnitude of the other codes. Python isn't particularly fast by itself, but it has a good set of internals that make interacting with and using external libraries trivial. So how well GMP was integrated into Python would be relevant to its performance.

Likewise with Perl.

First, the Python code. Using the 3.4.2 release, I ran the following:

# 157 bit n = pq with p ~= 78 bits
n = 273966616513101251352941655302036077733021013991
i = 496968652506233112158689  # prime minus 10e6
while True:
  if n % i == 0:
    print(i)
    break
  i += 2

landman@lightning:~/work/benchmarking/factor$ time /opt/scalable/bin/python3 factors.py 
496968652506233122158689

real	0m1.894s
user	0m1.880s
sys	0m0.012s

Then I rewrote this in Perl, using Math::BigInt with the GMP library backend. (This is the Scalable Informatics toolchain, BTW, for both Python and Perl.)

#!/opt/scalable/bin/perl

use strict;
use Math::BigInt only => 'GMP';
# 157 bit n = pq with p ~= 78 bits
my $n = Math::BigInt->new('273966616513101251352941655302036077733021013991');
my $i = Math::BigInt->new('496968652506233112158689');  # prime minus 10e6
my $two = Math::BigInt->new('2');

$i+=$two  while ($n % $i);
printf "$i\n";

landman@lightning:~/work/benchmarking/factor$ time ./factors.pl 
496968652506233122158689

real	2m12.028s
user	2m11.817s
sys	0m0.004s

Ok … wow. I wasn’t expecting that.

I tried the Go version as well, again with our 1.4 build in the Scalable Informatics toolchain:

landman@lightning:~/work/benchmarking/factor$ time /opt/scalable/go/bin/go run factors.go
496968652506233122158689

real	0m4.288s
user	0m4.338s
sys	0m0.112s

Ummm … ok … I am having a hard time understanding this.

Ok, let me rewrite this as a simple GMP C program. This should be the fastest of them all.

#include <stdio.h>
#include <gmp.h>

int main(void) {
	mpz_t n, i, two, m;
	mpz_init(m);
	mpz_init_set_str(n,   "273966616513101251352941655302036077733021013991", 10);
	mpz_init_set_str(i,   "496968652506233112158689", 10);
	mpz_init_set_str(two, "2", 10);

	while (1) {
		mpz_mod(m, n, i);
		if (mpz_sgn(m) == 0) {
			break;
		}
		mpz_add(i, i, two);
	}

	gmp_printf("%Zd\n", i);
	mpz_clear(n);
	mpz_clear(i);
	mpz_clear(two);
	mpz_clear(m);
	return 0;
}

Then compile and run:

landman@lightning:~/work/benchmarking/factor$ gcc -O3 -o factor.exe factor.c -lm -lgmp

landman@lightning:~/work/benchmarking/factor$ time ./factor.exe 
496968652506233122158689

real	0m0.334s
user	0m0.330s
sys	0m0.004s

And finally, since I am enjoying Julia so much:

n   = BigInt(273966616513101251352941655302036077733021013991)
i   = BigInt(496968652506233112158689)  # prime minus 10e6
two = BigInt(2)
zero= BigInt(0)

while (mod(n,i) > 0)
  i += two
end
println(i)


landman@lightning:~/work/benchmarking/factor$ time /opt/scalable/bin/julia factors.jl 
496968652506233122158689

real	0m3.433s
user	0m3.467s
sys	0m0.733s

Ok.

So, I profiled the Perl code, and it spent something like 90% of its time initializing new temporaries. That is, the GMP library usage in Perl is horrible. And it's horrible because there does not seem to be an understanding that one should avoid instantiating objects deep in inner loops, and that copies should be strenuously avoided in high-performance code.
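
For comparison, here is a minimal sketch of what an allocation-free inner loop could look like in Perl, assuming the Math::GMPz module from CPAN (a thin wrapper over the GMP mpz_* calls) is available. I haven't timed this variant, so treat it as illustrative only; the point is that the remainder is allocated once up front and every operation mutates existing objects in place, mirroring the C version above.

#!/opt/scalable/bin/perl

use strict;
use warnings;
# Math::GMPz exposes the GMP mpz_* interface nearly one-for-one as Rmpz_* functions.
use Math::GMPz qw(:mpz);

my $n = Math::GMPz->new('273966616513101251352941655302036077733021013991');
my $i = Math::GMPz->new('496968652506233112158689');  # prime minus 10e6
my $m = Math::GMPz->new(0);                           # reusable remainder, allocated once

while (1) {
    Rmpz_mod($m, $n, $i);        # m = n mod i, stored into the existing $m
    last if Rmpz_sgn($m) == 0;   # found the factor
    Rmpz_add_ui($i, $i, 2);      # i += 2, in place, no new object
}
print "$i\n";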

What is blowing me away is how fast Python is relative to C: it's within a factor of 10 (about 5.7x here). I expected Julia to be easily within that range, and am quite surprised that it is slower than Python: a JIT losing to a purely interpreted engine.

Between this and the other bits I've run into (playing games with a Mandelbrot set generator in a few languages) where Perl is maddeningly slow or, even worse, wrong in its answers (and the code is too bloody simple to be wrong; I've stepped through it by hand and I still can't catch the bug) … I think I need to take a long, hard look at where I should be going with my next set of languages.

I think Python is back in (/sigh, it's Fortran all over again with the damn indentation), along with Julia, q, and a few others. This was a wake-up call for me. Perl does great for what it does. It's just not great at hooking into other code, and its optimizer sucks rocks.
