Again, hat tip to Alastair, who pointed me at this article. At the most basic level, there are real costs and real consequences when you cannot act nimbly and cannot get the bandwidth you need to do your job. These consequences could have significant implications for legal cases. Or for terror threats.
What if you have a trove of data that you have to act on quickly? What if you can’t move it? Or can’t process it? Or cannot store it?
This is not a hypothetical situation. The US government, among others, has been targeting the MegaUpload founder for prosecution. In order for them to provide support for their assertions, they need to comb through the data on his servers.
This founder isn’t dumb, and his business is not just storage, but bandwidth. He knows how to design systems and move data.
The question I have is whether the US government side of the equation does. And this is where the anecdote comes in.
The money quotes are:
The US government has been ordered by a New Zealand High Court judge to immediately prepare to copy the 150 terabytes worth of data held on Megaupload servers seized by the FBI in order to turn it over to indicted founder Kim Dotcom.
According to an affidavit by FBI agent Michael Postin, copying just 29 terabytes had taken the agency 10 days. Postin said that copying all the data could take two and a half months. Even then, some of the data could not be copied as it is encrypted, Postin stated.
So … several possibilities.
The US government may be playing for time and using very slow technology.
They may be … er … stretching the truth. (no, NO you say)
They may really have a data copy rate of 33 MB/s (29000 GB / (10 * 86400 seconds)) … OH MY GOSH THEY ARE USING USB DRIVES!!!! If so, about the only phrases I could use would be “less than optimal” and “someone really needs to hire a company with a clue about moving/storing data.”
I could go on, but the point is they either need help badly, are using the wrong equipment, or are trying to buy more time.
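For anyone who wants to check the arithmetic, here is a minimal sketch of that copy-rate estimate. It uses only the two figures from the affidavit (29 TB in 10 days) and assumes decimal units (1 TB = 1000 GB = 1,000,000 MB); the function name is just something I made up for illustration.

```python
# Back-of-the-envelope check of the copy rate implied by the affidavit.
# Inputs from the article: 29 TB copied in 10 days.
# Assumption: decimal units (1 TB = 1,000,000 MB).

SECONDS_PER_DAY = 86_400


def copy_rate_mb_per_s(terabytes: float, days: float) -> float:
    """Average sustained rate in MB/s for copying `terabytes` over `days`."""
    megabytes = terabytes * 1_000_000
    return megabytes / (days * SECONDS_PER_DAY)


fbi_rate = copy_rate_mb_per_s(29, 10)
print(f"Implied copy rate: {fbi_rate:.1f} MB/s")  # about 33.6 MB/s
```

That is an average over the whole 10 days, so it says nothing about how many hours a day the copy was actually running. It only tells you what the sustained rate works out to.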
So … how’d they get in this position (assuming they are being honest, and not trying to get more time)?
Could they be using USB disks? On one machine at a time? Yeah.
Do they have the storage space they need for copying and analyzing 150TB of data? Possibly … though possibly not. Will discuss during the anecdote.
Is it possible that they and their infrastructure are overwhelmed by a project of this magnitude? Definitely.
Are there things we (my company) can do to help? Most definitely … though there needs to be something purchased first for them to get the benefit of what we can do and how we can help …
Ok … A few years ago, we were approached by someone from a three letter agency (no, not that one, the same one in the article) about providing storage for forensic data analysis. They were talking about 30-40TB sizes, spread out over the country in various secure offices. The idea was/is to effect a discovery process by capturing digital content from existing storage devices, and making it available to the forensic examiners at the agency working on cases involving that data.
We liked the concept, not merely for the possibility of selling storage, but the idea of helping capture “bad guys”. Our friends, colleagues, competitors, etc. all talk about their successes in helping customers and users achieve their goals. It would be nice to come up with a way to help those who are tasked with protecting us to better achieve their mission.
So we had discussions, and talked about all sorts of issues. I think they were pretty excited by what we could bring to the table. Some of the fastest tightly coupled storage in the market, able to feed consumers of the storage at a very rapid data rate. Able to store new data rapidly. Far more so than their previous attempts.
They told us they worked through a reseller. Fine. No problem.
The reseller called us up. Discussed the quotations we provided. And the margins.
I guess when you are used to selling EMC and NetApp, our margins are slim pickings. Our units cost far less than theirs, and are much faster for this use case.
But the reseller only cares about his margins.
Quick side path to the GSA. US Government purchasing goes through this agency called the GSA. They are supposed to provide an open market interface for the government. As the recent GSA scandals show, it doesn’t quite work out that way.
Additional layers between those purchasing and the provider only increase costs and decrease efficiency. They never save money. They never increase efficiency. They remove competition from the market, and competition is in part how prices are set. Prices in a market economy are set by the market. So if you start gaming the market economy, guess what happens to prices.
The GSA is all this. And much more. And if you want to sell to the US government, you either need a reseller with a contract schedule or you need your own contract schedule.
To call it maddening is … an understatement. But you have no choice. It’s their market. You have to play by their “rules”. Did I mention this was maddening?
So, since we don’t have a Schedule 70 contract, we have to work through a reseller.
And the reseller is used to nice fat margin checks from NetApp and EMC. These smaller shops, that design and build stuff more apropos for (what’s now called big data) tightly coupled computing and storage? Mebbe not so much interest.
We don’t have the full story, just bits and pieces.
Basically, our current information is that, after hearing our reseller margins, the reseller got the three letter agency to reconsider using our gear.
This was more than a year ago. Closer to two years ago.
This morning, we ran a simple 1TB read/write test for one of our customers on new gear we are sending out to them. Currently this gear maxes out at 108TB usable, and in short order it will be about 150TB usable. The next gen, even more, and much faster.
Run status group 0 (all jobs):
  WRITE: io=998.28GB, aggrb=4569.6MB/s, minb=4569.6MB/s, maxb=4569.6MB/s, mint=223707msec, maxt=223707msec
Run status group 0 (all jobs):
  READ: io=998.28GB, aggrb=5191.9MB/s, minb=5191.9MB/s, maxb=5191.9MB/s, mint=196894msec, maxt=196894msec
So … in 224 seconds, I wrote 1 TB of data. In 197 seconds, I read 1 TB of data.
That’s 438.6 TB/day I can read.
That’s 385.7 TB/day I can write.
This is on a single box.
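The per-day numbers are easy to reproduce. A quick sketch, assuming (as the rounding above does) that the 998.28GB test counts as 1 TB and that the box could sustain the measured rate for a full 24 hours, which the test itself does not prove:

```python
# Convert the measured test times into TB/day for a single box.
# Assumptions: the test size is rounded to 1 TB, the times are rounded to
# whole seconds (224 s write, 197 s read), and the rate holds all day.

SECONDS_PER_DAY = 86_400


def tb_per_day(terabytes: float, seconds: float) -> float:
    """Scale an observed transfer of `terabytes` in `seconds` to a full day."""
    return terabytes / seconds * SECONDS_PER_DAY


write_rate = tb_per_day(1, 224)
read_rate = tb_per_day(1, 197)

print(f"write: {write_rate:.1f} TB/day")  # 385.7
print(f"read:  {read_rate:.1f} TB/day")   # 438.6
```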
Now imagine large clusters of these, and simultaneously imagine the words “multiplicative effect” when asking about bandwidth.
Dear three letter agency people … you can have our gear for a reasonable price. Let’s find you a reseller that will work with both of us to do the correct thing.
Imagine you have found a large trove of data you have to comb through quickly (hey, this sounds like BIG DATA FORENSICS). Suppose that there is an implied threat associated with not being able to analyze the data rapidly enough. Oh, many possible scenarios … let’s pretend that someone has discovered a way into a stock exchange, and their nefarious plan to have every purchase of an Apple share also buy a share of RIM is stored on the trove somewhere. Never mind how silly that is, you can plug in your own more plausible scenario as you wish.
But you have to comb through massive amounts of data to find this. Ignoring encryption for the moment, what if all the data were stuck on a bunch of USB drives?
Yeah … uh … not so much on the combing side. More like a furtive glance at the data.
That is, if the system you employ to provide the fundamental infrastructure for your analysis is simply not up to the task, then … exactly why are you using it? Do we have people who have no choice but to work in conditions where the only possible outcome is an epic fail?
In the US we are supposed to have a concept of justice delayed is justice denied. That is, you can’t be held indefinitely on charges while prosecutors and forensic examiners gather evidence. They have to, pardon the crude language, defecate or get off the can.
Imagine a legal team unable to gather and comb through the evidence they need in a reasonable time, because they lack the necessary tools to do so. Oh … you don’t have to imagine this. It’s there in the article.
There are many other scenarios we can talk about, but they all come down to the bandwidth wall. It either enables you to do your job. Or it prevents you from doing your job. One or the other.
I hope the legal folks in the article can get their job done. It seems they will need about 50 days to do what we can do in less than 1/2 a day.
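Here is the rough math behind that comparison, as a sketch. It takes the 150 TB total and the 29 TB in 10 days rate from the article, plus the 385.7 TB/day single-box write rate from our test. The big assumption is that the write rate on the destination is the only bottleneck: it ignores how fast the seized servers can be read, any network hop in between, and the encrypted data the affidavit mentions.

```python
# Compare how long a 150 TB copy takes at the two rates discussed in the post.
# Assumption: the destination write rate is the only limit on the copy.

TOTAL_TB = 150                 # data held on the seized servers (from the article)
FBI_TB_PER_DAY = 29 / 10       # 29 TB copied in 10 days (from the affidavit)
BOX_WRITE_TB_PER_DAY = 385.7   # single-box write rate from the 1 TB test


def days_to_copy(total_tb: float, rate_tb_per_day: float) -> float:
    """Days needed to move `total_tb` at a sustained `rate_tb_per_day`."""
    return total_tb / rate_tb_per_day


fbi_days = days_to_copy(TOTAL_TB, FBI_TB_PER_DAY)
box_days = days_to_copy(TOTAL_TB, BOX_WRITE_TB_PER_DAY)

print(f"At the affidavit rate: {fbi_days:.1f} days")
print(f"At the single-box write rate: {box_days:.2f} days ({box_days * 24:.1f} hours)")
print(f"Ratio: roughly {fbi_days / box_days:.0f}x")
```

Note that the agent's own estimate of two and a half months is longer than what the 29 TB in 10 days rate implies, so the gap may be wider still.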