Working on benchmarking ML frameworks

Nice machine we have here …

root@hermes:/data/tests# lspci | egrep -i '(AMD|NVidia)' | grep VGA
3b:00.0 VGA compatible controller: NVIDIA Corporation GP100GL (rev a1)
88:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XTX [Radeon Vega Frontier Edition]

I want to see how TensorFlow and many other frameworks run on each of the cards. The processor is no slouch either:

root@hermes:/data/tests# lscpu | grep "Model name"
Model name:            Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz

Missing a few things for this, as Amazon is a bit late on shipping some of the needed parts, but hopefully soon, I’ll be able to get everything in there.

I’m looking at the integrated TensorFlow benchmarks, which require ImageNet, as well as others. Feel free to point more out to me … I’m happy to run some nice baseline/direct comparisons. I’d prefer open (sharable/distributable) benchmarks (alas, ImageNet isn’t precisely that; I’ve put in my request for the download).

Everything else is fair game though. Planning on publishing what I find.


Oracle finally kills off Solaris and SPARC

This was making the rounds last week. Oracle seems to have a leak in its process: the labels created for the layoff packages triggered event notifications to the affected people. Solaris was decimated. More details at the links and at The Layoff.

Honestly, I had expected them to reach this point. I am guessing that they were contractually obligated for at least 7 years to provide Solaris/SPARC support to US government purchasers. SGI went through a similar thing with IRIX: it had to maintain it for N years (N being something like 7) after EOL.

After that contractual obligation expired, the question was: would the divisions be able to pay for themselves and add positively to Oracle’s bottom line?

Generally, Oracle is in a very high margin software business. Not hardware, which tends to be much lower margin. Yeah, they have Exadata (or is that now gone?), storage, and a few other things. But no one really looks to them any more as a leader in any aspect of the market. They are a very large player, with a set of core products that produce most of their revenue. They are working now to create a cloud, though it will likely be running Linux as the basis for their offerings, and will likely be built by very low cost providers (Quanta et al.).

The calculus for their hardware division has been obvious for a long time. For Solaris, it has been getting clearer over time. The world moved on from Sun hardware in the early 2000s, and from Solaris in the mid 2000s. Linux has largely supplanted the other Unixes in many cases (yeah I know, many aren’t happy with this).

Solaris, in some way, survives through the illumos fork, SmartOS, etc. There is a bit of baggage from those roots, in terms of perception, OEM driver support, user space, etc. Some is admittedly self-inflicted … porting should be trivial, yet I keep running into library implementation differences … that effectively prevent me from moving code bases to it. Some of this is obviated by the LX-brand zone … running what appears to be a Linux kernel within a zone in the OS. Some is obviated by KVM. I remain hopeful that we’ll see these issues solved in illumos, as they are part of what killed Solaris as a viable platform. The world moved on, coalesced around a specific platform and API, for better or worse, and others fell off. Even Microsoft, after decades of battling the upstart, finally gave in and did something similar … implementing Ubuntu Linux within Windows.

This is a good direction if you spend the time/effort to make sure you maintain compatibility with the fast changing kernel/environment, as well as make the underlying system as performant as possible. But you have to run hard to do this.

A great example of what happens when you plant your flag and refuse to move it as the platform changes is OS/2 and Win32 binaries. OS/2 refused to support a new Win32 technology of the time (I forget what it was called), which resulted in code breaking … slowly at first, and then with gathering speed as developers adopted the new tech. I am hoping that illumos can move faster than this.

As for Solaris, and SunOS, I used the latter first at MSU in the late 80s. Around the time I used Vaxen and other systems. Including some Ultrix boxen. Then at Wayne State, I used a buddy’s sparcstation (and many other unixes) to do my simulations in the early 90s. What got me off the Sparc was when the brand new 486 unit I bought was shown to be about 2x the speed. The only advantage the Sparc had was RAM. 64MB as I remember. 486 had 16MB. The Cray machine I ran on at PSC had quite a bit of ram, and was also very fast.

Shortly after this, I started playing with SGIs and generally gave up on the Sparc units due to speed issues.

This said, I’ve always had a soft spot for real unix and workstation systems. At SGI I competed with them. At Scalable I used them and many others to help customers build scale out computing and storage systems.

Under Oracle, I thought they might have a chance if they were invested in. But apparently, this didn’t quite happen. It is sad to see such an ignoble end to Solaris and Sparc. Though, it was not unexpected.


M&A and business things

First up, Tegile was acquired by Western Digital (WDC). This is in part due to WDC’s desire to be a one stop shop vertically integrated supplier for storage parts, systems, etc. This is how all of the storage parts OEMs needed to move, though Seagate failed to execute this correctly, selling off their array business in part to Cray. Toshiba … well … they have some existential challenges right now, and are about to sell off their profitable flash and memory systems business, if they can just get everyone to agree …

This comes from the fact that spinning disk, while a venerable technology, has been effectively completely commoditized. It is an effective replacement for tape systems … and yes, I disagree strongly with a number of people I tremendously respect, on this particular matter. Tape is, apart from specific niches, dead. Tape vendors, tape drive vendors, are not long for this world, and are struggling mightily … though maybe not as intelligently as they could.

Over time, flash (TLC and QLC) will IMO get cost competitive per GB stored with spinning disk. Then the only thing holding back a full-on switch over to flash will be manufacturing capacity issues. Until then, spinning disk will likely remain dominant in the slower and colder tiers of storage.

When that crossover hits, disk will be the new tape.

Many years ago, in grad school, a buddy leaving the program wrote all his directories out to vax-tape, because, you know, vaxes … and tapes … are forever. I gotta ask him how that is working out. I have 30 year old floppies I can still (barely) read. Not so sure on vaxes. Or vax-tapes. I may have an old DLT cartridge from backups of my home Indy from 20+ years ago sitting around. No way to read that tape. Drives don’t exist anymore. I do have some ATA drives from that time period, and the capability to attach them and read them.

Which is more permanent?

Ok, back to the vertical integration. WDC seems to be doing some of the right things. They are investing in building an ecosystem in which a customer can purchase offerings across a wide range of capabilities. This is IMO a sound direction, though it will require careful work to avoid competing with one’s own customers.

Second, Coho Data closed. I’ve been saying this for years, but … storage arrays as a market are slowly drying up. Somewhat of a long tail, but this is part of the reason EMC sold itself to Dell. The storage array market of all flavors and shapes (flash, disk, …) is contracting. More competitors are pursuing larger pieces of a shrinking market. Part of what drove WDC to buy Tegile, and to build out the rest of their plan, appears to be a desire to have a foot in that market, to help control the flux into the other markets where it also has a presence, and to manage the changes in that market by helping to shape how it evolves.

Coho died as the market is crowded, and non-differentiated hardware combined with an effectively random software stack of dubious actual value (which is the approach of a fairly large number of these groups) is not a viable strategy. Though with the right pedigree founders and connections, it appears you can liberate quite a bit of money from VCs in the process of failing.

Curiously, differentiated hardware, stuff that actually adds value to the software layers above it, does not seem to have been in vogue for the past decade or so. But the new hotness is also NVMe-over-fabrics. Basically, let’s take the SAN model … and use PCIe as an interconnect. But … we need to encapsulate the PCIe into InfiniBand or … Ethernet … packets first.

So now we have too many of those folks proudly making benchmark guesses as to where their systems will matter, and what they can deliver. Meanwhile, the actual numbers I see fall far short of what I was doing 2+ years ago. On a shoestring budget. With no venture backers.

I am not being bitter … ok … maybe a little … but I am being blunt. There are too many of these companies out there, without a valid defensible difference.

Yeah. I am not happy Coho died. It won’t be the last though.

We are into the culling phase of the market. Weaker organizations will be shut down. IP sold for a song.

Coho won’t be the last. Just on the leading edge.

Another aspect worth mentioning is the impact of clouds upon this. I’ll touch on this in a future post.


A completed project: mysqldump file to CSV converter

This was part of something else I’d worked on, but it never saw the light of day for a number of (rather silly) reasons. So rather than let these bits go to waste, I created a github repo for posterity. Someone might be able to make effective use of them somewhere.

Repo is located here: https://github.com/joelandman/msd2csv

Pretty simple code; it does most of the work in memory, with multiple regex passes to transform and clean up the output into CSV.


Finally got to use MCE::* in a project

There is a set of modules in the Perl universe that I’ve been looking for an excuse to use for a while: the MCE set of modules, which purportedly enable easy concurrency and parallelism, exploiting many-core CPUs and a number of techniques. Sure enough, I had a task to handle recently that required this.

I looked at many alternatives, and played with a few, including Parallel::Queue. I thought of writing my own with IPC::Run as I was already using it in the project, but I didn’t want to lose focus on the mission, and re-invent a wheel that already existed elsewhere.

Ok … so Parallel::Queue required that I alter my code significantly. I did so. And it didn’t work. Arguments were lost. I was able to launch things, but not what I intended, at least not without further hacking. E.g. the example test code didn’t work.

Remembering MCE, I explored that universe, and found MCE::Loop. Apart from the initialization block, I simply had to change 4 lines of my code. One of my outer foreach loops was converted to an

mce_loop {} @array

construct from the previous

foreach (@array) { ... }

construct. Two additional assignments to pull the data in … and …

Worked perfectly with the IPC::Run based execution. Easily one of the best and fastest experiences I’ve had with concurrentizing/parallelizing a code.

And yes, this was in Perl 5.24. The same code had to run identically across Linux and SmartOS. And it did.

Powerful tools make you more productive. I wish I could get Perl 6 built on SmartOS (I’m running into some bugs with late versions of Rakudo Star). And I am still working on getting Julia 0.6 to compile (dependency issues that I am working on resolving).


Cray “acquires” ClusterStor business unit from Seagate

Information at this link. It is being called a “strategic transaction”, though it likely came about from Seagate doing some profound and deep thinking over what business it was in.

Seagate has been weathering a storm, and has been working on re-orgs to deal with a declining disk market. They acquired ClusterStor as part of their earlier acquisition of Xyratex. Xyratex was the basis for the Cray storage platforms (post-Engenio).

So my guess (and no, I have no inside information at all, I’ve not spoken with friends at either organization) is that Seagate spoke with Cray and said something to the effect “look, we need to cut costs, focus on our core, trim the sails, batten down the hatches, and this business unit is on the chopping block. Do you want to take it off our hands? Along with the people?”

I am sure it wasn’t entirely like this.

Not entirely.

But it seems to approximately fit what I am seeing.

Generally, disk shipments are declining. We’ve likely hit peak-disk. There will be a very long tail, but as with other industries in the past (buggy whips, tape drives and tape tech), I wouldn’t want to be only in that space.

Yeah yeah … some will argue with me that tape isn’t dead, or other things like that. Pull out metrics showing incredible economics, longevity, and so forth. Amazon Glacier they’ll say.

Nope.

Rigor mortis set in long ago in that market. There is a good reason why tape-mostly companies like Quantum are in a world of hurt with no real avenue of escape. This is buggy whip manufacturing all over again.

This isn’t lost on Seagate (I presume … I know lots of smart people there). Not lost on Cray either (again with the smart people).

Likely there are lots of nice things in the deal. Specials on disk pricing, priority support and access. Lots of other goodness.

Let’s see what happens. But in conjunction with the changes at Intel over Lustre (getting out of that market), and the changes at a number of national labs that I am aware of, I think this is probably the right move for both orgs. Cray will have a real storage portfolio that they own. Seagate will have been able to reduce its cost base and head count, while locking in a customer.

Could be a fun time to be at Cray.


More unix command line humor

Waaaay back in grad school in (mumble) late 80s/early 90s (/mumble), I started using Unix in earnest. Back then, my dad shared some funny Unix error messages which were double entendres … often quite entertaining, as the shell was effectively playing the straight man in a comedy duo. Without intentionally doing so (of course).

Nowadays, you can ask Siri about the air speed of an unladen swallow and get something funny back, but that is because Siri has had that capability programmatically added. The old ones are funny because the humor is unintentionally ironic.

See this link for some of them.

With this background, late last week I saw a reference to a BSD library function call making the rounds at work. The call is ffs, and a ‘man 3 ffs’ on my Mac shows something like this.

FFS(3)                   BSD Library Functions Manual                   FFS(3)

NAME
     ffs, ffsl, ffsll, fls, flsl, flsll -- find first or last bit set in a bit
     string

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <strings.h>

     int
     ffs(int value);

Ok. This is part of the background. The other part is the common abbreviation indicating exasperation, which is FFS.

Now that this background is in place, let’s see if we can get some humor.

The man page shows up on my systems (Mac, SmartOS, Linux, etc.) under section 3. But I was given a man page section of 3c, so

landman@lightning:~$ man 3c ffs
No manual entry for ffs in section 3c

[for the unix command line humor impaired, replace the ffs with the urban dictionary version of this and say that “No manual entry” line out loud with that substitution …. not at work, or in front of small children, or on the phone …]

Thank you, thank you, I’ll be here all week.


What reduces risk … a great engineering and support team, or a brand name ?

I’ve written about approved vendors and the “one throat to choke” concept in the past. The short take, from my vantage point as a small, not well known, but highly differentiated builder of high performance storage and computing systems, was that this brand-specific focus was going to remove real differentiated solutions from the market, while simultaneously lowering the quality and support of products in the market. The concept of a brand, and the marketing of a brand, is about erecting barriers to market entry against the smaller folk who might have something of interest, and the larger folk who might come in with a different ecosystem.

Remember “no one gets fired for buying IBM” ? Yeah, this is that.

The implication is that this might not be true for other vendors than IBM.

This post is not about IBM BTW. Not even remotely.

It’s about the concept of risk reduction in vendor selection.

And what real risk reduction means.

Let’s look at this in terms of, say … RAID units. RAID, as a concept, is about distributing failure risk across N units, with a scheme arranged in such a manner as to be able to survive and operate (albeit at a reduced capability) in the event of a failure of a single unit, and in some cases a failure of two units. RAID is not a backup (yeah, it is likely time I repost this warning). RAID is about giving time to operators to replace a system component so that operations can continue.

Erasure coding is a somewhat more intensive version of this, but basically the same thing.

You make the (reasonable) assumption, that you will have a failure. You have an architecture in place, resilient to that failure. When a failure comes, within the design spec of that resiliency, you mitigate the impact of the failure if you follow protocol. Of course, if you have a failure outside of this spec, yes, you can lose data. Which is why we have layered protocols and systems for disaster recovery (replication, on and off-site).
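To put a toy number on that, here is a minimal sketch (my own illustration, with a made-up per-drive failure probability, assuming independent failures) of how tolerating one more failure changes the odds of surviving the window until someone swaps the hardware:

# Probability an N-drive group survives, if it tolerates up to k drive
# failures (k = 1 is single parity, k = 2 is double parity), with each
# drive failing independently with probability p over the window of interest.
function survival(N::Integer, k::Integer, p::Real)
  sum(binomial(N, i) * p^i * (1 - p)^(N - i) for i in 0:k)
end

survival(8, 1, 0.03)   # ≈ 0.978
survival(8, 2, 0.03)   # ≈ 0.999

The exact numbers don’t matter; the point is that the architecture buys you time, not immortality.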

All of this matters. Whether you are building storage systems, large computing systems, clouds, etc.

The attention to detail, the base engineering, the ability to support … find problem root causes, and meaningful remediations/work arounds … all of this matters. And from my own experience, running a company that did these things, it matters far more than brands do.

A brand is meant to be an abstract mental concept … something that somehow represents how well a product should behave, and the support/engineering behind it. However, a brand is rarely that. It is really just a name. There is little empirical evidence showing that slapping a particular label on the outside of a box does anything to make it better or more stable. And if I’m wrong, I’d love to see the peer-reviewed studies of this (a cursory google search yielded a few popular anecdotal articles, with limited real analysis behind them).

My claim is that engineering matters. What you put into your design and implementation matter. What doesn’t matter, I claim, is the brand name on the box.

Sure, you can claim “they have better access to supply chain, OEMs, etc.”. And you may be right. Without revealing anything specific, I can tell you that this access doesn’t necessarily result in better outcomes.

Actually, if your boxes never have issues to begin with … well, you understand.

But more to the point. Architecture matters. Engineering matters. Support matters. Brand? Not so much.

If you or your company are making decisions based upon brands, it might be a good exercise to ask … “why” … is this being done? Is this risk reduction? If so, what risk can you quantifiably and empirically determine has been reduced? I am guessing this isn’t the real reason.

Is it comfort level with a vendor? That is, you know the brand names won’t go away, be sold off, or go into bankruptcy. Like IBM, Apple, Sun … er … oh wait.

What is the real reason that you have to buy vendor X?

And getting back to the RAID analogy above, in order to reduce risk, shouldn’t you have 2 vendors (at minimum) who can produce the same things, with different parts? (It is possible that some parts may be in common … you can’t escape that, but hey, a VW and a Porsche have pistons, and they are very different vehicles, engineered and built to different standards.)

It’s too late for my old company … though I am still getting support requests to my personal email account … but in general, the question you need to ask yourself is: am I really reducing risk by concentrating risk? Will I really get better support from a behemoth who only wants to deal with massive customers, or from a smaller dedicated team of experts who are highly focused upon me, because … hey … I am core to their business, and they are invested in me?

The question is, do you want a brand, or do you want solid engineering and support, invested in your success? Not every smaller company is like that.

Scalable was.

And we lost.

Because people wanted the single brand, single throat to choke.

On the other side of this now, I see that the single throat to choke doesn’t result in better outcomes.

So … you are going to get failures. You should be engineering for this. Planning for this.

Who will support you better?

That is who you should buy from.

Anyone focusing on ease of procurement over quality of engineering needs to be pulled out of the decision and purchasing loop. Really.

My argument is that operational and project risk increases when you do this. I don’t have hard numbers to demonstrate this. Merely observations of this risk being realized in various forms. With the common aspect being the single large preferred vendor taking business that would likely have gone elsewhere.


On hackerrank and Julia

My new day job has me developing considerably less code than my previous endeavor, so I like to work on problems to keep these particular muscles in steady use. Happily, I get to do more analytics than ever before, so this at least is some compensation for the lower amount of coding.

When I work on coding for myself, I’ll play with problems from my research days, or small throw-away ones, like on Hackerrank.

The latter group of problems tends to be somewhat poorly specified, with the preferred solutions being minimum viable ones that pass all tests in the allotted time. Elegance of solution is not directly scored. Quality of implementation is determined entirely by attaining the same results that the authors did for their test cases.

I’ve found the last part of that to be dubious in the past, after finding errors in a number of their tests. In one problem in particular, they made an argument in their discussion section for a simple closed-form solution, reducing the problem to evaluating that closed form. While I followed their logic, I did not agree with it, and coded up a simple brute force test (the problem was small enough to admit this mechanism). Sure enough, I found that their answers (to this one particular problem) were, in fact, wrong.

So, I take their “answers” with some grains of salt. This doesn’t take away from the joy of working through solving problems with code. So I do that, and generally care less about their ranking and scoring. Though I do run my code through their checker to see if it would “pass” their tests.

Ok … that background set.

I’ve been enjoying watching and working with small bits of the Julia language over the years. I include it in my Nlytiq development base as one of the core components. I’ve built automation around its build and module installation to create a useful environment for me and others … scientists, data scientists, engineers, etc. Not just Julia, but many other tools.

With this basis, I play with problems. One of the problems on Hackerrank was what they called a “megaprime” number. This is not the more common use of megaprime (a million-digit prime). Their megaprime number is a prime number whose individual digits are also prime (so no 1s, no 9s, and no even digits other than 2).

Their problem was to take a set of two numbers as input, and find all the megaprimes between them.

There is a simple algorithm to do this … just find the primes between the lower and upper bound, and then test each prime for the property megaprime-ness.

The megaprime checker is also remarkably simple to articulate … and if you have a very powerful and expressive language, fairly simple to code.

using Primes   # on Julia 0.6+, isprime and primes live in the Primes package, not Base

function megaprime(x::Int64)
  D = digits(x);
  SD = size(D)[1];
  P = find(isprime.(D));
  if SD != size(P)[1]
    return false
  end
  true
end

Take a 64 bit Int as input, named x. 1st, break the x up into its digits, and store in D. Second, get the leading dimension of that D array. Third, apply the isprime function to each digit in D, and filter out any that come back as false, storing this into an array P. If the array of digits did not all come back as prime (e.g. the array P had some false tests within it) then return a false, otherwise, fall through and return true.

Ok, I got lazy with the isprime function there. I could have simply tested the digits to be 2, 3, 5, or 7. If they were none of these, then the megaprime test fails. This one is a bit more readable though, and likely nearly as fast (testing single digits for primality shouldn’t take long … no need to optimize this further).
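For completeness, a minimal sketch of that lazier check (megaprime2 is just a hypothetical name; this is not the code I submitted):

# A digit is prime exactly when it is 2, 3, 5 or 7.
function megaprime2(x::Int64)
  all(d -> d in (2, 3, 5, 7), digits(x))
end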

Then the driver code:

# read first/last
x = [parse(Int64,ss) for ss in split(readline())];
first = x[1];
last = x[2];
count = 0::Int64;

if last < first
  first,last = swap(first,last);
end

R = primes(first,last); 

Here I read a line from STDIN, and parse it from a string into an array of Int64 types. I store the two in the first and last variables, and then check to make sure they are correctly ordered, swapping if not.

The swap function …

function swap(x,y)
  y,x
end

… yes … this is the way it should be written.

julia> swap(1,2)
(2,1)

In the driver section, I generated a list of primes between first and last storing that in R.

It is important to note that these are fairly fast functions, even on my 6 year old laptop.

julia> @time primes(1000000000)
  4.099736 seconds (9 allocations: 766.314 MB, 2.72% gc time)
50847534-element Array{Int64,1}:
         2
         3
         5
         7
        11
        13
        17
        19
        23
          ⋮
 999999739
 999999751
 999999757
 999999761
 999999797
 999999883
 999999893
 999999929
 999999937

Primes up to 1 billion in about 4 seconds on a fairly old machine. The machines they run this on in the cloud are somewhat faster, with more RAM.

The primes to test are in the R array.

This is the meat of it. How to iterate over the R array in a meaningful way. Some folks will want to use list comprehensions and other CS constructs. These sometimes occlude intent and remove clarity. What I want to do is to loop over the elements of R.

So why not … I dunno … do that?

# now only have R items to inspect
for num ∈ R
  test = megaprime(num);
  if test
    count = count + 1;
  end
end

Notice the notation. This is exactly how I want code to read.

I could have used a list comprehension with
count = count + megaprime(t) for t ∈ R

if I changed megaprime to return 0 or 1. That is possible, but I thought this explicit loop would be easier to comprehend for the programmer (me).
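As it happens, Bool values sum as 0 and 1 in Julia, so a one-liner along these lines would also work without changing megaprime (a sketch, not what I submitted):

count = sum(megaprime(t) for t in R)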

Yes, some will scream syntactic sugar. Some will complain bitterly over pulling the test out of the if/then construct. That was due to a crash in the language that I observed; storing the test result first (a simple memoization of sorts) worked around the issue.

Ok. So I did this. It passed the run test on my machine.

landman@lightning:~/play/hr/mp$ /usr/bin/time ./mp.jl < inp
8
0.79user 0.70system 0:00.74elapsed 202%CPU (0avgtext+0avgdata 174512maxresident)k
0inputs+0outputs (0major+10105minor)pagefaults 0swaps

Remember, Julia is a compiled language, so this time includes startup/compilation time as well as execution time.

So I pressed the test button.

And it failed.

Well, more precisely, they indicated it failed.

“Timed out” was the error.

So I tried the “Run” button, so I could see if I could get some of the other tests run, download the inputs and outputs, and do a run to check.

landman@lightning:~/play/hr/mp$ cat inp2 out2
230711883 401853350
7362
landman@lightning:~/play/hr/mp$ /usr/bin/time ./mp.jl < inp2
7362
9.67user 0.66system 0:09.66elapsed 106%CPU (0avgtext+0avgdata 301256maxresident)k
0inputs+0outputs (0major+12135minor)pagefaults 0swaps

Works on a much larger problem, again, correctly.

But I get the timeout and … now … “too much output” message.

I am guessing that they have an old version of Julia installed. 0.5.2 is the current stable release, with the 0.6 RC series up. Hopefully they will update soon.

This said, my point about all of this is that Julia is a joy to use. No silly indentation, useful and expressive functions, rich and growing module ecosystem, multiple mechanisms to do things, compiled … did I mention, compiled?


The birthday problem (allocation collisions) for networks and MAC addresses

The birthday problem is a fairly simple situation to state: there is at least a 50% probability (i.e. better than even odds) that at least 2 of 23 randomly chosen people in a room share a birthday. This comes from some elementary applications of statistics, and is documented on Wikipedia. The probability that N people all have distinct birthdays out of M possible days is P(A') = (M/M) × ((M−1)/M) × … × ((M−N+1)/M), and the collision probability is P(A) = 1 − P(A').

While we care less about networks celebrating their annual journey around Sol, we care more about potential address collisions for statically assigned IP addresses. And, curiously, MAC addresses.

Ok, first some Julia code:

# P(A'): probability that N values drawn at random from M equally likely
# slots are all distinct (i.e. no collision)
function P(N::Integer,M::Integer)
  prod = 1.0
  for i in 0:N-1
    prod = prod * float(M-i)/float(M)
  end
  prod
end

Now cut and paste this into your REPL (julia command line, I’ll wait).

Now, we can replicate the P(A’) and P(A) calculations on the wikipedia page trivially.

julia> P(23,365)   # P(A')
0.49270276567601445

julia> 1-P(23,365) # P(A)
0.5072972343239855

And our mechanism is generic enough that we can apply it to classful IPv4 networks, and other things.

The equivalent calculations for an IPv4 class C network (/24), with the network and broadcast addresses excluded, so 254 usable addresses …

julia> P(20,254)    # P(A')
0.4639660131994906

julia> 1-P(20,254)  # P(A)
0.5360339868005094

That is, just 20 defined (random) addresses in the class C are enough to give at least a 53.6% probability of at least one address collision.

You might think, “hey, this is for statically defined addresses, so who cares, we do DHCP everywhere”. Doesn’t matter though, as the DHCP server has to allocate an address, preferably an unused address. It has to test that address to see if it already exists (defensive coding … should really be done) in case another system allocated this address to itself.

Of course, you might say “Darn it, we’ll just use class B’s everywhere.” So let’s look at that again, remembering that a class B is 65536 - 2 = 256*256 - 2 = 65534 addresses. I mean, this is more than enough … right?

julia> P(302,65534)    # P(A')
0.49926691064939116

julia> 1-P(302,65534)  # P(A)
0.5007330893506088

Yeah … you only need 302 allocated addresses out of the pool of 65534 to get at least a 50% probability of a collision on allocation.

What about a class A? I mean really … we shouldn’t get collisions … we have sooo many addresses …

First, number of addresses

julia> 256^3 -2
16777214

so …

julia> P(4823,16777214)     # P(A')
0.49999139514729646

julia> 1-P(4823,16777214)   # P(A)
0.5000086048527035

Erp … just 4823 addresses out of 16M. In fact, if you look at it closely, you’ll realize that the crossover point goes as roughly the square root of the number of slots (IP addresses in this case).
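That rule of thumb is the usual n ≈ √(2 M ln 2) ≈ 1.18 √M approximation for the 50% crossover with M slots. A quick sketch, which lines up with the numbers above:

# Approximate number of random allocations before a collision is more likely than not
crossover(M) = sqrt(2 * M * log(2))   # log is the natural log in Julia

crossover(365)        # ≈ 22.5   (23 people)
crossover(254)        # ≈ 18.8   (20 addresses in a /24)
crossover(65534)      # ≈ 301    (302 in a class B)
crossover(16777214)   # ≈ 4823   (class A)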

This, of course, assumes that the space isn’t sparse, and that all addresses are accessible. If the space is sparse, so that large chunks are never allocated or used for whatever reason, you wind up with an interesting problem. You reduce the size of the space by the size of the holes, which reduces the number of allocations after which you will likely get a collision.

Ok.

So why on earth would I spend time tackling something that is ostensibly a solved problem?

Well, there is another problem related to these, that people building out large data centers have likely seen.

MAC address space collisions.

Technically, MAC48 is obsolete. In practice, it is still very much in use. So we have to live with the consequences of this. There are

julia> 256^6 
281474976710656

addresses in this space. If we use our preceding rough approximation of the square root of that number, we’d get about 16M unique random MAC addresses before a collision becomes more likely than not.

Or

julia> P(19753662,281474976710656)     # P(A')
0.5000000188521087

julia> 1-P(19753662,281474976710656)   # P(A)
0.49999998114789135

Imagine, if you will, a network with 1M machines, each with 32 VMs and unique MAC addresses per VM. You will have greater than a 50% chance of a MAC48 collision on this.
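A quick check of that claim with the P function from earlier (1M machines × 32 VMs = 32M random MAC48 addresses; my numbers, rounded):

1 - P(32_000_000, 281474976710656)   # ≈ 0.84 probability of at least one collision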

This analysis assumes (of course) that the space isn’t sparse. It’s not like there is a vendor field … er … no … wait. There is.

The first 3 octets of the address are the vendor OUI. The second 3 are the device address. So technically, your pool is really out of 16777216 addresses per OUI.

So … if this is a popular unit, say a LOM NIC, and the vendor wraps the MAC addresses, rather than requesting/paying for allocating another OUI segment …

You could get a collision. Similarly a very large number of VMs on many machines in the DC … you could get a collision.

The analysis gone through here is fairly naive, but it shows that such collisions are anything but improbable. Might be worth thinking about in the era of IoT. IPv6 could help, with 3.4 × 10^38 addresses … we’d need on the order of 10^19 addresses to see higher probabilities of collisions … This might seem like a long way off, but … imagine if you are working on an allocator for these addresses, and you have a 50% or more probability of generating a collision. Which suggests that you are going to have to rethink how you allocate addresses, what state you are going to preserve (do you really want to store even a small fraction of what you allocated???), etc.

Someone once said something about N kilobytes being enough for any machine, or maximum of M computers as the total world market. Both scenarios had nothing to do with collisions, but were based upon some (wildly) incorrect assumptions about the future. While IPv6 looks like it will push out the pain for a while, there are other aspects to this that might need some looking at now, and planning for a future where we have the ability to easily extend the range as needed.

I mean, we have 10^19 addresses before we have to worry about higher probability of collisions … what could go wrong?
