Kaiser Fung, author of the popular blog Junk Charts, has a new book out called Numbersense: How to Use Big Data to Your Advantage. It's a fascinating and highly accessible read, full of humor and even some graphs derived from JMP. I enjoyed learning about such things as the U.S. News & World Report rankings formula, implications of using different measures of obesity and overweight, how unemployment data is collected and reported, and the challenges of offering personalized deals (and keeping merchants happy) faced by Groupon.
The many stories Kaiser shares in the book all serve a higher goal of helping to give readers Numbersense, a quality he says he always looks for in a data analyst. "Numbersense is that noise in your head when you see bad data or bad analysis. It's the desire and persistence to get close to the truth," Kaiser writes. I highly recommend this new book, as well as his earlier book, Numbers Rule Your World.
We'll be giving away 25 copies of Numbersense soon, here at the JMP Blog. For those who don't win a book, we'll offer a complete sample chapter to download for free. Subscribe to this blog so you don't miss these upcoming posts.
I asked Kaiser a few questions -- some personal, some about the book -- and he graciously answered them, below. If you're headed to Discovery Summit in San Antonio next month, you'll have a chance to hear and meet Kaiser in person. He will be on our statistics panel and will also sign copies of Numbersense for attendees -- all on Sept. 12.
How did your parents choose the name Kaiser for you?
Kaiser: My Chinese name is translated as Kai Sze. It's a beautiful name, meaning "open-minded." They looked for something similar.
How long did it take for you to write this new book?
Kaiser: It's been not quite three years since Numbers Rule Your World came out. Readers of my blogs will notice that I sourced and tested some ideas along the way. The actual writing probably took 12 to 18 months; I can do it quickly since I enjoy it so much.
When and how did you first develop a love of numbers?
Kaiser: I had some inspiring teachers in high school as well as college. But it was after I started working with real-world data that I realized analytics is a fun and engaging job. I like the creativity and the need to use one's judgment.
What would you do if you didn't work with numbers?
Kaiser: There could have been many possibilities. I am one of those people who have wide interests. But I think I'd have found a different path to writing.
The subtitle of your book is “How to Use Big Data to Your Advantage.” Explain your conception of Big Data, which you say in your book is more expansive than most definitions.
Kaiser: I'd like our industry to think about the consumption of Big Data. Up till now, we have been very focused on the production side, on the collection, storage and processing of unprecedented amounts of data. One consequence of this is the creation of lots of data analyses. Anyone who is exposed to analytics knows that it is normal for different people to come up with different analyses, and these analyses can contradict one another without any of them being definitely wrong. So we all have to become smarter at consuming data analyses. That's the key message of the subtitle. When we take the perspective of a consumer of data, we'll realize that there is a limited number of decisions that the data affects. The challenge for analysts is how to turn the massive stores of data into a few well-articulated solutions for easy consumption.
You share many interesting stories from the business world in your new book. What's your favorite story?
Kaiser: I love the Groupon story, and not just because my blog readers may have avoided wasting their money on the IPO when I first raised the issue. I love it because one can tie the story to a couple of fundamental principles in statistics. First, it illustrates nicely the need to be careful when using statistical averages. (I have two more examples of this in Chapter 3 of Numbers Rule Your World.) Second, it highlights the power of counterfactual thinking. This gets to the gist of understanding causal effects. It also explains the title of that chapter: how can sellouts ruin a business.
You are a statistician by training (in addition to having an MBA). What is the next frontier for statistics? What is statistics unable to do but you'd like to see it do?
Kaiser: As powerful as it is, statistics is not good at explaining why something happens, or why someone behaves in a certain way. For example, no amount of online tracking data can explain the reasons why someone clicked through multiple pages and made a purchase while someone else dropped out in the middle of the process.
In addition, statistics cannot offer the certainty that so many people want. For example, another gigantic steroids scandal is hitting Major League Baseball. When I looked at the statistics of testing in my first book, I warned that the anti-doping testing program has a big false-negative problem, while most of the media focused on a nonexistent false-positive problem. One aspect of the current scandal that is missed in media reports is that Alex Rodriguez and Ryan Braun did not get caught by tests -- the evidence against them was from an eyewitness and an investigation. The statistics of testing tell us a lot of dopers are getting away with negative test results, but the statistics can't tell us who they are.
In your experience as an applied statistician, what kinds of statistical mistakes do organizations often make?
Kaiser: I cover many of the mistakes in Chapter 2 of Numbersense in the context of the obesity crisis, but the same things happen in the business world. I call this the perversion of measurement. One of the worst mistakes is to blame the metric, and spend time tweaking the definition of the metric instead of diagnosing problems and figuring out ways to improve the existing metric. The new metric is often found to be as bad as, or even worse than, the old one.