# back from the UK, and a good reason to drink Guinness

By **joe**

I enjoyed my trip. Well, not the part about being away from my family, but there is much to see and experience in London. A curious difference between the UK (London in particular) and the US is the apparent lack of public restrooms (or WCs, if I have the right localization), especially in a crowded space like Covent Garden.

The customers in the UK (I spent time with two at their sites, was on the phone with ~5 across the world, and worked with ~4 via email) have good problems: not blocking problems, but emergent problems that occur in a variety of use cases. These are the sorts of things our services side handles well. It’s interesting to see how the configuration management space is changing, and we got to see the often interesting and unexpected behavior on our platforms with newer Red Hat loads.

I keep hearing how RHEL 6.2 has improved IO performance, but we just aren’t seeing it. It is worse, actually, and there are some very significant stability issues on our gear. A quick bump to an updated 3.2.x kernel solved the latter. The former comes from a variety of sources. Specifically, what we’ve noticed (initially suggested by a customer who first caught wind of this by looking at performance and average balance) is that there are now some significant issues with load balance and IO write performance on multiple RAIDs using the deadline scheduler in the stock kernels (as well as the 3.2.x kernels). I started writing a parser for fio output to specifically measure this imbalance. Once we get more data (in the lab), we’ll do some analysis and post results.

Another thing I’ve noticed is the preponderance of Puppet. Our finishing scripts perform functions that can be driven by Puppet, so we are looking at doing some level of integration with it, in order to enable customers to reload our units if they use Puppet. Since we are also Perl aficionados, we will probably start using/supporting Rex for our storage clusters when running statefully.
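
For the curious, here is a rough idea of what measuring that write imbalance from fio output could look like. This is a minimal Python sketch and not our actual parser. It assumes fio was run with `--output-format=json` and with one job per RAID (both assumptions of this sketch), and the function name `write_imbalance` and the sample numbers are made up for illustration.

```python
import json
import statistics


def write_imbalance(fio_result):
    """Summarize how evenly write bandwidth is spread across fio jobs.

    Assumes `fio_result` is the dict parsed from `fio --output-format=json`
    and that each job targets one RAID (an assumption of this sketch).
    """
    # fio reports per-job write bandwidth under jobs[i]["write"]["bw"] (KiB/s).
    bw = {job["jobname"]: job["write"]["bw"] for job in fio_result["jobs"]}
    values = list(bw.values())
    mean = statistics.fmean(values)
    return {
        "per_job_kib_s": bw,
        "mean_kib_s": mean,
        # (max - min) relative to the mean: 0.0 means perfectly balanced.
        "spread": (max(values) - min(values)) / mean if mean else 0.0,
        # Coefficient of variation: population standard deviation / mean.
        "cv": statistics.pstdev(values) / mean if mean else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not measured results.
    sample = json.loads("""
    {"jobs": [
        {"jobname": "raid0", "write": {"bw": 900000}},
        {"jobname": "raid1", "write": {"bw": 1100000}}
    ]}
    """)
    print(write_imbalance(sample))
```

A spread or coefficient of variation near zero would mean the RAIDs are being written to evenly. Larger values would point at the kind of imbalance described above.
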
I can’t say enough how much I’ve enjoyed the trip. The people (the ones I worked with and spoke with) were/are wonderful. The environment was pleasant. The historical things to see (science, computing, culture …) were simply too numerous to visit in a very short window, never mind many trips. And then there was the beer.

I have a fondness for numerical analysis, for benchmarking, and for attempting to figure out whether specific knobs available in systems have any impact on things. While working on the poor IO performance on the RHEL 6.2 system, I had a few guesses based upon other experience. When we stopped guessing and proceeded systematically, we didn’t need many tests to determine which knobs had a great effect on performance.

This determination of the important or significant aspects of a set of measurements is related to a design of experiments problem, and to what we call “statistical significance”. This phrase is badly overloaded and, as often as not, badly misused. My wife’s recent bout with cancer (she’s sitting next to me learning Latin online right now, so I’ll call this one a “win” for now) really revealed to me how much some areas of inquiry over-rely upon statistical methods, at the expense of an underlying theory. Without an underlying theory to predict and explain observations, all we have are statistics, which may not actually be meaningful in and of themselves.

Population statistics, for example, are not really useful for discussing disease in individuals, despite their use for this. Population statistics and analysis are deeply decoupled from the underlying disease processes. Their predictive power for an individual is effectively zero, but for a large enough community of people, they might be able to tell what fraction might follow one or more branches of a decision or catastrophe tree. This is very important when someone says “95% of the time we see this.” What’s overtly neglected in this case is that their sample sizes are sufficiently large that they can talk about observations without dealing with the lack of an underlying theory.

Hell, I could use a random collection of balloons filled with equal amounts of (different) gases and correctly posit that some fraction of the balloons will fall to earth, and some will rise. And for a large enough sample, I will in fact be correct. But note that there is no predictive power in this for a single random balloon measurement, and absolutely no connection with the underlying theories of gravity and buoyancy, which are needed to explain the observations.

This separation of analysis from the underlying theory of what produces the results is definitely an issue. If you start asking questions of economic, health, or similar importance, having at least a tenuous connection to the underlying theory is a good idea. Otherwise you are stuck looking at correlation coefficients, knowing full well that there is no implied causal relationship (and no backing theory) you can fall back on to discuss the likelihood of this being a real result.

So how does this have anything … I mean anything … whatsoever to do with beer? The “Student’s t-test”, as it turns out, came from such a set of analyses, by William S. Gosset, aka “Student”. What’s his connection with beer? Funny you should ask. From the abstract of the PDF (a highly recommended, very interesting, and entertaining academic paper).

The paper pointed out that while Gosset didn’t have a great model of the underlying processes, he could tie his tests to specific observables with specific value, such as the “degree of saccharine” in each brew, and perform effective analytical tests to maximize the economic benefit of being able to control this. That is, he figured out how to measure, test for, and to some degree control inputs that resulted in a particular desired output: one that maximized his company’s (Guinness) profit by yielding a repeatable, high-quality product, which he could effect by sampling his inputs.

I am a self-admitted fan of Guinness and other dark brews. I had not known the history of this. In fact, I’ve seen Student’s t-test referred to as “Pearson’s t-test”, “Fisher’s t-test”, and others. I was not aware of the connections between them. I find this stuff fascinating. So, raise a pint to diminishing and controlling our errors, increasing the accuracy in our reporting, and understanding what to sample and how often, as well as how to design experiments.
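
For anyone who wants to see the arithmetic behind a Student’s t-test, here is a minimal Python sketch of the one-sample version: the sample mean’s distance from a target value, divided by the standard error. The function name `one_sample_t`, the readings, and the target are all hypothetical and chosen only for illustration. They are not Gosset’s data or anything from the paper.

```python
import math
import statistics


def one_sample_t(sample, target):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Sample standard deviation (n - 1 in the denominator).
    s = statistics.stdev(sample)
    # Standard error of the mean.
    standard_error = s / math.sqrt(n)
    return (mean - target) / standard_error, n - 1


if __name__ == "__main__":
    # Hypothetical "degree of saccharine" readings from five samples,
    # compared against a hypothetical target of 133.0.
    readings = [132.1, 134.0, 131.5, 135.2, 133.3]
    t, df = one_sample_t(readings, 133.0)
    print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting t is then compared against a t-distribution with n - 1 degrees of freedom. For 4 degrees of freedom, the usual two-sided 5% critical value is about 2.776, so a smaller |t| would not be called significant at that level. Note that this is exactly the kind of small-sample question a brewer has to answer, since nobody can afford to test thousands of batches.
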