Today, I re-read a book called Super Crunchers: How Anything Can Be Predicted
by Ian Ayres.
So what is supercrunching?
Now something is changing. Business and government professionals are …
What is Super Crunching? It is statistical analysis that impacts …
This is best explained by the chess example:
We tend to think that the chess grandmaster Garry Kasparov lost to … The speed of the computer is important, but in
large part it was the computer’s ability to access a database of
700,000 grandmaster chess games that was decisive.
(emphasis mine)
The book starts off with the example of Orley Ashenfelter, a Princeton
economics professor as well as founder and editor of the Journal
of Wine Economics, who wanted to apply supercrunching techniques to
predict whether a wine from a particular vintage would be good or
not. He ended up with the following equation:
Wine quality = 12.145 + 0.00117 winter rainfall + 0.0614 average growing season temperature - 0.00386 harvest rainfall
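To make the equation concrete, here is a minimal sketch that simply evaluates it for one vintage. As quoted in the book, the full equation ends with a third term, - 0.00386 harvest rainfall, so rain at harvest time lowers the prediction. The function name is hypothetical, the units (rainfall in millimetres, temperature in degrees Celsius) are an assumption, and the sample inputs are invented, not real vintage data.

```python
# Minimal sketch of Ashenfelter's linear wine-quality equation.
# Assumption: rainfall in millimetres, temperature in degrees Celsius.
# The function name and the example inputs are hypothetical.

def predict_wine_quality(winter_rainfall: float,
                         avg_growing_season_temp: float,
                         harvest_rainfall: float) -> float:
    """Return the predicted quality score for one vintage."""
    return (12.145
            + 0.00117 * winter_rainfall          # positive coefficient: wetter winters raise the score
            + 0.0614 * avg_growing_season_temp   # positive coefficient: warmer summers raise the score
            - 0.00386 * harvest_rainfall)        # negative coefficient: rain at harvest lowers it


if __name__ == "__main__":
    # Hypothetical vintage: 600 mm winter rain, 17 degrees summer average, 100 mm harvest rain.
    print(round(predict_wine_quality(600, 17, 100), 4))
```

The point is less the arithmetic than the fact that three weather measurements, all known before anyone tastes the wine, are enough to produce a prediction.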
You can imagine the commotion that followed. The wine experts brushed
off the idea that numbers could predict wine quality better than they
could. After all, “Just as it’s more accurate to see the
movie, shouldn’t it be more accurate to actually taste the wine?”
And yet, the equation did make better predictions, most notably its
prediction that the 1989 and 1990 vintages would be outstanding.
Orley was able to make this analysis because he had access to data
about the weather and wine quality. Ian explains that there are
two ways to get the data: either it already exists (like surveys, census
data or simply companies' transaction logs), or you create it by running
randomized experiments.
The latter idea of creating data with the “flip of a coin” is such
a simple yet powerful concept. Techies would already be familiar with
it under a different name: “A/B testing”.
Let’s take the example of JoAnn sewing machines:
So when JoAnn.com was optimizing their website, they decided to take …
The key is that:
Randomization also frees the researcher to take control of the …
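The “flip of a coin” idea behind A/B testing fits in a few lines. This is a minimal sketch, not JoAnn.com's actual setup: the function names, the seed, and the 5% and 6% conversion rates in the demo are all hypothetical. Each visitor is assigned to version A or B by a fair coin flip, and the conversion rate of each group is then compared.

```python
import random


def assign_variant(rng: random.Random) -> str:
    """Flip a fair coin to decide which version of the page a visitor sees."""
    return "A" if rng.random() < 0.5 else "B"


def run_ab_test(visitors, converts, seed=42):
    """Randomly split visitors between two variants and return each variant's conversion rate.

    visitors: iterable of visitor ids
    converts: callable(visitor_id, variant) -> bool, a hypothetical stand-in
              for whether that visitor ends up buying
    """
    rng = random.Random(seed)        # seeded so the split is reproducible
    shown = {"A": 0, "B": 0}
    bought = {"A": 0, "B": 0}
    for visitor in visitors:
        variant = assign_variant(rng)
        shown[variant] += 1
        if converts(visitor, variant):
            bought[variant] += 1
    # Skip a variant that, by chance, was never shown (only possible with tiny samples).
    return {k: bought[k] / shown[k] for k in shown if shown[k]}


if __name__ == "__main__":
    outcome_rng = random.Random(7)
    true_rate = {"A": 0.05, "B": 0.06}   # hypothetical "true" conversion rates
    rates = run_ab_test(range(20_000),
                        lambda v, variant: outcome_rng.random() < true_rate[variant])
    print(rates)
```

Because the coin, not the visitor, decides who sees which version, the two groups are alike on average in every other respect, so a systematic gap in conversion rates can be credited to the variant itself.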
To see how valuable this methodology is, consider the case of
Progresa:
But by far the most important recent randomized social experiment of …
(paraphrased) Zedillo, the Mexican President in 1995, decided that …
Progresa is a conditional cash transfer program.
…
Zedillo’s biggest problem was to try to structure Progresa so that …
So starting in 1997, Mexico began a randomized experiment on more …
…
The Progresa villages almost immediately showed substantial …
…
The improvements in health were even more dramatic. The program … A centimeter of additional growth in such
a short time is a big deal as a measure of increased health.
(emphasis mine)
Best of all, the evidence that Progresa worked was so convincing that
the next government kept it going, albeit under a different
name for political reasons. Zedillo’s idea worked. And beautifully.
Ian goes on to show how Don Berwick’s campaign (the 100,000 Lives
Campaign) prevented an estimated 122,342 hospital deaths in eighteen months.
The campaign consisted of just a few simple suggestions, chosen on the
basis of statistics about how patients were dying, which the
participating hospitals then implemented. The suggestions included
regular hand washing.
Ian cites many real-world examples throughout the book, and the
number of times that data crunching beats human expertise is
staggering. But Ian says this does not mean the end of the need for
human intervention. Supercrunching can validate ideas, but the ideas
and hypotheses themselves still have to be formulated by us
humans.
He goes on to explain the 2SD rule
and Bayes’ theorem in layman’s terms. Just understanding these two
concepts would go a long way toward helping anyone make sense of statistics.
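Both ideas are small enough to check by hand. The sketch below uses the standard textbook formulas, not code from the book. The 2SD rule says that roughly 95% of a normally distributed quantity lies within two standard deviations of its mean, which is why a poll's margin of error is often quoted as about two standard errors. Bayes’ theorem updates a prior probability after seeing evidence such as a positive test. The function names and the example figures (a 1,000-person poll, a 1% prior, 80% sensitivity, a 9.6% false-positive rate) are hypothetical illustrations.

```python
import math
from statistics import NormalDist


def share_within_k_sd(k: float = 2.0) -> float:
    """Probability that a normally distributed value lies within k SDs of its mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


def two_sd_margin(p: float, n: int) -> float:
    """Approximate 95% margin of error (two standard errors) for a sample proportion p of size n."""
    return 2 * math.sqrt(p * (1 - p) / n)


def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """Bayes' theorem: P(condition | positive test)."""
    true_pos = sensitivity * prior                  # P(positive and condition)
    false_pos = false_positive_rate * (1 - prior)   # P(positive and no condition)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    print(f"{share_within_k_sd(2):.4f}")          # prints 0.9545
    print(f"{two_sd_margin(0.5, 1000):.4f}")      # prints 0.0316, i.e. about +/- 3 points
    print(f"{posterior(0.01, 0.80, 0.096):.4f}")  # prints 0.0776
```

The last line shows the counterintuitive lesson Bayes’ theorem teaches: with a rare condition, even a fairly accurate test leaves only about an 8% chance that a positive result is real, because false positives from the large healthy group outnumber the true positives.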
All in all, the book was a good, inspiring read. I would highly
recommend it to anyone (even non-techies) interested in how
computers and databases are changing the way decisions are made. These
decisions are not limited to websites. As we have seen above, supercrunching is
changing everything from how government policy decisions are made to …
The key takeaway for me is that extracting insights from data is hard, and so is
developing good intuition. People who can straddle both will be important in
the future. Learning to read data will mean getting comfortable with
statistics, models and even neural networks (as explained in the
book).
If you’re not patient enough to read the book, you can watch a video of Ian
Ayres. You can also read
more of Ian Ayres’ supercrunching stories on the Freakonomics
blog.
We are drowning in information, while starving for wisdom. The world henceforth will be run by synthesizers, people able to put together the right information at the right time, think critically about it, and make important choices wisely.
— E. O. Wilson (entomologist and biologist)