Computer processors have grown steadily more capable since Alan Turing and his contemporaries built the first “modern” computers. One way of quantifying this growth is Moore’s Law, the observation that the number of transistors on an integrated circuit doubles roughly every two years. While this is a bit too technical to mean much to me, to Intel it means a new processor generation every two years. I couldn’t find a direct benchmark comparison, but try to remember the cutting-edge Pentium III you used in 2000 and compare that to the Intel Haswell chip in your ultra-thin MacBook Air (not to mention the high-end quad cores in performance machines).
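To put a rough number on that comparison, here is a minimal sketch of the doubling arithmetic. It assumes a clean two-year doubling period and uses 2013 (the year Haswell launched) as the end point; the function name is my own, and the result describes transistor counts, not benchmark performance.

```python
def moores_law_factor(start_year, end_year, doubling_period_years=2.0):
    """Growth factor implied by one doubling every `doubling_period_years`."""
    doublings = (end_year - start_year) / doubling_period_years
    return 2.0 ** doublings


# Assumed end points: Pentium III era (2000) to Haswell's launch year (2013).
factor = moores_law_factor(2000, 2013)
print(f"Implied growth in transistor count: about {factor:.0f}x")
```

Under those assumptions the rule of thumb implies about six and a half doublings, or roughly a 90-fold increase. Real chips will not follow the curve exactly.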
The ubiquity of advanced analytics
This growth in computing capability has dramatically and positively changed the face (and pace) of analytics. Concepts like machine learning aren’t just hypotheticals or relegated to academia anymore; they are reality, they are powerful, and they are everywhere. The value we get from using advanced analytics is immense, and now, more than ever, modern tools are highly accessible to a wider array of users. Users may not know (or even need to know) how the wheels turn behind the scenes, but with very simple interfaces they’re able to start those complex wheels turning.
First building block: Data
While all this technology has opened up amazing possibilities for easily accessible insight, we would be remiss to forget the lessons that traditional statistical methods can teach. While the notion of stating a “formal” hypothesis may seem limiting (e.g., why test one thing when I can explore a thousand?), taking the time to formulate a research hypothesis makes you think critically about what you’re doing. One of the most important questions you can ask yourself during this process is whether the health data you’re using is even appropriate to answer the questions you want to consider. Lots of data sources may collect similar data elements, but they collect them in different ways and for different reasons.
For instance, medical diagnoses can be captured from billing claims, EMRs, patient histories or public health surveys (e.g., NHANES). Each of these sources could potentially be used to power similar insights, but each comes with its own qualities and caveats. Claims and EMRs come from an “expert” clinical source, so their diagnoses may be more accurate, whereas patient histories may include information outside the view of the treating physician but rely on the patient’s own biased recall. The first three sources are limited to a self-selecting population and lack the coverage that a general population survey can offer, though with a survey you are limited by data use restrictions, questionnaire limitations and the bias of those pesky respondents.
The art of statistics
Perhaps the most confusing part, and what makes statistics more of an art than a science, is that any of the above data sources can be the right choice depending on your needs.
I don’t bring up this issue to deride or lampoon the prevalence and utility of highly accessible analytic tools or those who use them. I’m a strong believer that broader access to these tools will open us up to insights we wouldn’t otherwise uncover. At the same time, we can easily forget that not all insights are created equal. As you look at the results and information you uncover, before you evaluate the impact they may have on your business, first evaluate the underlying quality with which they were created.
An example comes from a former colleague who worked on a study that profiled pilots to predict who would make a good one. In the end, the only significant factor they found was whether you liked strawberry ice cream. I would guess that a fear of heights and motion sickness are better indicators that I wouldn’t be a good pilot, but maybe it’s been the ice cream all along.
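One plausible reading of the ice cream result is simple chance: test enough factors and something will look significant. I don’t know how many factors that study examined or what threshold it used, so the sketch below rests on assumptions of my own: a conventional 0.05 significance level, tests that are independent of one another, and no factor that truly matters. The function name is hypothetical.

```python
def chance_of_false_positive(num_factors, alpha=0.05):
    """Probability that at least one of `num_factors` independent tests
    comes out 'significant' at level `alpha` when no factor has a real effect."""
    return 1.0 - (1.0 - alpha) ** num_factors


# Assumed factor counts, chosen purely for illustration.
for num_factors in (1, 20, 100):
    probability = chance_of_false_positive(num_factors)
    print(f"{num_factors:>3} factors tested: {probability:.1%} chance of a spurious finding")
```

With twenty candidate factors, the chance of at least one spurious “finding” is already well over half. That is one more reason to evaluate how an insight was produced before acting on it.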