Predicting the one percent in health care

Episode analytics is a method of using patient-centric data to define episodes of care. These episodes of care can be used to define standards of care – from both a cost and quality perspective – and then project these standards forward to establish bundled payment budgets and quality targets. This can be considered a global method of controlling costs. But what if episode analytics can be used in a predictive sense to determine the next top spenders?

Health care spending is not equal. For the civilian* population, 20 percent of spend is on behalf of one percent of the population, and five percent of the population is responsible for nearly 50 percent of all spend. These members are easily identifiable through claims analytics, and are often the focus of case management efforts to help control their costs. While these care management efforts are effective, they can’t reverse historical spending – nor can they ameliorate the episodes of care that drove the spending. The question is, can episode analytics be used to identify the episode characteristics that can predict the next one percent in order to practice preventive care?
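To make the concentration figures concrete, here is a minimal Python sketch that computes the share of total spend attributable to the top slice of a member population. The member spend figures are invented for illustration; real analyses would run this over claims-derived, per-member annual totals.

```python
# Sketch: measuring spend concentration in a member population.
def top_share(spends, top_frac):
    """Fraction of total spend attributable to the top `top_frac` of members."""
    ranked = sorted(spends, reverse=True)
    k = max(1, int(round(top_frac * len(ranked))))
    return sum(ranked[:k]) / sum(ranked)

# Toy population of 100 members: one catastrophic case, a few chronic,
# many low-cost members (all dollar amounts are hypothetical).
annual_spend = [500_000] + [90_000] * 4 + [5_000] * 95
print(round(top_share(annual_spend, 0.01), 2))  # share of spend from the top 1% of members
print(round(top_share(annual_spend, 0.05), 2))  # share of spend from the top 5% of members
```

Sorting once and summing a prefix is all that is needed; the same function applied at many values of `top_frac` traces out the full concentration (Lorenz-style) curve.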

Because SAS® Episode Analytics is patient-centric, it provides a full view of the episodes of care the patient has experienced. This view, however, is unique: not only is all care included, but it is categorized in several ways. First, care is associated with all appropriate episodes. If a follow-up visit after surgery includes diagnosis codes indicating chronic care, the chronic care episode(s) are associated with the visit in addition to the surgical episode. This identification is hierarchical in nature as well. If the care initiates an episode of care, it is fully allocated to that episode, but it can also be associated – not allocated – with another episode. Additionally, care can be split equally in the allocation. This hierarchical categorization of care is unique and allows insight into connections – or lack thereof – in care.
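As an illustration of the hierarchy described above – full allocation to a triggering episode, association without allocation, and equal splits when no single episode dominates – here is a hypothetical sketch. The data structures, field names and diagnosis codes are assumptions for illustration, not the actual SAS Episode Analytics implementation.

```python
# Sketch of hierarchical claim-to-episode allocation (illustrative only).
def allocate_claim(claim, episodes):
    """Return ({episode_id: allocated_cost}, [associated_episode_ids])."""
    # Episodes whose defining diagnoses overlap this claim's diagnoses.
    matches = [e for e in episodes if claim["dx"] & e["dx"]]
    triggers = [e for e in matches if e["id"] == claim.get("triggering_episode")]
    if triggers:
        # Full allocation to the triggering episode; other matching
        # episodes get an association only, not a share of the cost.
        alloc = {triggers[0]["id"]: claim["cost"]}
        assoc = [e["id"] for e in matches if e["id"] != triggers[0]["id"]]
    elif matches:
        # No clear trigger: split the cost equally across matching episodes.
        alloc = {e["id"]: claim["cost"] / len(matches) for e in matches}
        assoc = []
    else:
        alloc, assoc = {}, []
    return alloc, assoc

episodes = [
    {"id": "knee_replacement", "dx": {"M17"}},
    {"id": "diabetes", "dx": {"E11"}},
]
# A post-surgical follow-up visit that also carries a chronic-care diagnosis.
followup = {"cost": 200.0, "dx": {"M17", "E11"},
            "triggering_episode": "knee_replacement"}
alloc, assoc = allocate_claim(followup, episodes)
print(alloc)   # full cost allocated to the surgical episode
print(assoc)   # chronic episode associated, not allocated
```

The key property is that every dollar lands in exactly one place (or is split to sum to the claim cost), while associations preserve the cross-episode connections the text describes.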

With SAS® Episode Analytics, the stacked graph at the bottom shows total cost by member and by condition. The upper right breaks out cost by category (T = typical, C = complication, TC = typical with complication), and the upper left shows cost by reason for the potentially avoidable complication (PAC).

Another feature of comprehensive episode analytics is categorizing care as typical care or a potentially avoidable complication (PAC). This not only quantifies quality, but also helps identify undesirable future member health implications. With SAS, these PACs are categorized based on clinical criteria, such as adverse effect of drug or peripheral embolism. There are over 200 PAC categories identifiable today. These complication categories have the full claim history – including not only the procedures but also the diagnoses – behind them.
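A minimal sketch of a typical-vs-PAC cost breakdown might look like the following. The PAC category map here is a tiny hypothetical stand-in for the 200+ clinically maintained categories, and the diagnosis codes and costs are invented.

```python
# Illustrative typical-vs-PAC cost breakdown (hypothetical category map).
PAC_DX = {
    "T88.7": "adverse effect of drug",
    "I74.4": "peripheral embolism",
}

def cost_breakdown(claim_lines):
    """Split claim-line cost into typical care vs potentially avoidable complications."""
    totals = {"typical": 0.0, "pac": 0.0}
    pac_reasons = {}
    for line in claim_lines:
        reasons = [PAC_DX[d] for d in line["dx"] if d in PAC_DX]
        if reasons:
            totals["pac"] += line["cost"]
            for r in reasons:
                pac_reasons[r] = pac_reasons.get(r, 0.0) + line["cost"]
        else:
            totals["typical"] += line["cost"]
    return totals, pac_reasons

lines = [
    {"cost": 1200.0, "dx": ["M17"]},    # routine surgical care
    {"cost": 450.0, "dx": ["T88.7"]},   # drug complication
]
print(cost_breakdown(lines))
```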

The combination of hierarchical associations and complication categorizations provides a valuable tool for analyzing historical claims. This new insight into member claims history enables new analytics and predictive engines. These engines can, in turn, be used to predict the members – and providers – who can benefit from future actions.


*Civilian excludes residents of institutions – such as long-term care facilities and penitentiaries – as well as military and other non-civilian members of the population. “Care” reflects personal health care and does not include medical research, public health spending, school health or wellness programs. From “The Concentration of Health Care Spending,” National Institute for Health Care Management (NIHCM) Foundation.



Will the health data you’re using truly answer your question?

Computer processors have undergone steady, consistent growth since Alan Turing and his contemporaries invented the first “modern” mechanical computers. One way of quantifying this growth is Moore’s Law, which says that the number of transistors on integrated circuits doubles every two years. While this is a bit too technical to mean much to me, to Intel it means a new processor generation every two years. I couldn’t find a direct benchmark comparison, but try to remember the cutting-edge Pentium III you used in 2000 and compare that to the Intel Haswell chip in your ultra-thin MacBook Air (notwithstanding the high-end quad cores in performance machines).
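The doubling rule is easy to sanity-check: seven two-year doublings separate a 2000-era Pentium III from a 2014-era Haswell, implying roughly a 128-fold increase in transistor count under the rule of thumb.

```python
# Back-of-the-envelope: Moore's Law as one doubling every two years.
def moore_factor(years):
    """Transistor-count growth factor over `years`, per the two-year doubling rule."""
    return 2 ** (years / 2)

# Pentium III (2000) to Haswell (2014): seven doublings.
print(round(moore_factor(14)))  # → 128
```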

The ubiquity of advanced analytics
This growth in computing capability has dramatically and positively changed the face (and pace) of analytics. Concepts like machine learning aren’t just hypotheticals or relegated to academia anymore; they are reality, they are powerful, and they are everywhere. The value we get from using advanced analytics is immense, and now, more than ever, modern tools are highly accessible to a wider array of users. Users may not know (or even need to know) how the wheels turn behind the scenes, but with very simple interfaces they’re able to start those complex wheels turning.

First building block: Data
While all this technology has opened up amazing possibilities with respect to easily accessible insight, we would be loath to forget all of the lessons that traditional statistical methods can provide. While the notion of stating a “formal” hypothesis may seem to be limiting (e.g., why test one thing when I can explore a thousand?), taking the time to formulate a research hypothesis makes you think critically about what you’re doing. One of the most important questions you can ask yourself during this process is whether the health data you’re using is even appropriate to answer the questions you want to consider. Lots of data sources may collect similar data elements, but they collect them in different ways and for different reasons.

The myriad of health care data

For instance, medical diagnoses can be captured from billing claims, EMRs, patient histories or public health surveys (e.g., NHANES). Each of these sources could potentially be used to power similar insights – but they do so with differing qualities and caveats. Claims and EMRs come from an “expert” clinical source and diagnoses may be more accurate, while patient histories may include information outside the view of the treating physician but are based on a patient’s own biased recall. All three of these sources are limited to a self-selecting population and lack the coverage of what a general population survey might represent, though here you are limited by data use restrictions, questionnaire limitations and the bias of those pesky respondents.

The art of statistics
Perhaps the most confusing part, and what makes statistics more of an art than a science, is that all of the above scenarios can be right depending on your needs.

I don’t bring up this issue to deride or lampoon the prevalence and utility of highly accessible analytic tools or those who use them. I’m a strong believer that broader access to these tools will open us up to insights we wouldn’t otherwise uncover. At the same time, we can easily forget that not all insights are created equal. As you look at the results and information you uncover, before you evaluate the impact they may have on your business, first evaluate the underlying quality with which they were created.

An example comes from a former colleague who worked on a study profiling pilots and trying to predict who would make a good pilot. In the end, the only significant factor they found was whether you liked strawberry ice cream. Likely, I would guess that a fear of heights and motion sickness are better indicators that I wouldn’t be a good pilot, but maybe it’s been the ice cream all along.


It takes all kinds (of data): Going beyond our comfort zone with clinical models

When I’m working with new customers or on a new project, there are a handful of questions I typically ask. These help me set the stage, understand needs, and most importantly – learn the customer’s expectations. Almost always, I spend some time talking about what an acceptable model looks like to them. Does it need to have certain characteristics or can the data speak for itself?

“Let the data speak” is the gist of the typical answer, but that usually isn’t reality. It’s like telling someone to “help yourself to anything in the fridge”; you really don’t mean for him or her to grab the steaks you were planning on eating for dinner. They can have anything they want, inside of a predefined, unspoken set of boundaries.

We want to explore the data, but often, we want the data to speak to us in terms of what we already know. An endocrinologist isn’t likely to accept a model predicting diabetes trajectory that doesn’t include HbA1c. A cardiology researcher is going to want to see a QT interval. And an epidemiologist specializing in pulmonary diseases is going to want FEV.

We convince ourselves – due to research, expert opinion, or simply habit – that models must include certain concepts or be rendered invalid. I definitely advocate for the consideration of these known factors in model creation. They’re not only elements that will help to define a robust model, but given our current clinical knowledge, they represent mechanisms by which we can effect a change.

However, while creating models with such considerations is necessary to provide value in a certain context, I would also raise three counterpoints to this. I challenge you to consider these the next time you start a modeling process:

  1. Health care and medicine (like most industries) are a science and while that carries with it the scientific method and its inherent rigor, it also brings with it fallibility. Unlike mathematics, the sciences represent our best understandings and not necessarily truth. While I doubt the relationship of HbA1c to diabetes will go the way of “phlogiston,” I don’t doubt that a sufficient span of time will make many of our current scientific truths seem equally preposterous.

    A statistical model built on valid and robust data that defies current clinical knowledge may be a statistician’s contribution to science. I’m not saying to throw out current knowledge and create off-the-wall models. But rather, we have an opportunity through the exploration of data to bring up new ideas or challenge old ones.

  2. Highly predictive but clinically illogical models may still have utility, though perhaps not in the traditional sense. A model derived from magazine subscription history and peanut butter brand-switching habits, completely devoid of any traditional cardiovascular risk indicators, still has value if it can calculate a reliable 30-day risk score for a heart attack.

    It doesn’t give us actionable information we can use to mitigate that risk, but it does alert us to its presence – whatever the cause. Often we may not have the luxury of ideal data to derive a model. A patient who hasn’t had a heart attack may have never seen a cardiologist, had an EKG, or even have a recent cholesterol panel or CBC. And, even if you do have this data, how often is it collected? But if Cat Fancy and a recent purchase of a jar of Peter Pan crunchy can send up a red flag, why not listen to it?

  3. Many tests are biased, most people lie (at least a little), and all systems are imperfect. We cannot assume that a data point which attempts to capture a particular concept does so perfectly, especially in fields like medicine, where our most valuable observations aren’t based on static, easily measured concepts. A cashier can count the rolls of toilet paper you purchase, a bank teller can count the dollars and cents in a transaction, but even the best lab tech can’t count the number of white blood cells in a drop of blood.

    Generally speaking, the data points we use are at best highly correlated to the concepts they represent, and at worst, a set of random values. Perfection cannot be reached and bias is often impossible to mitigate, but if we can have consistent bias, we can still have useful information. We can capture directional trends and consistent results. I may not be willing to believe someone who says they took their prescribed statin 300 of the last 365 days, but if I can assume a consistent trend in bias, then that answer still has value to me (just not necessarily as an accurate measure of adherence).
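The consistent-bias point above can be sketched in a few lines: if everyone over-reports adherence by roughly the same factor, the ordering of patients is preserved, so the measure retains directional value even though the absolute numbers are wrong. The adherence figures and the 25 percent bias factor are invented for illustration.

```python
# Sketch: a consistent (monotone) reporting bias preserves *ordering*,
# even when absolute values are inflated. All numbers are made up.
true_adherence = {"a": 0.55, "b": 0.70, "c": 0.90}  # days taken / days prescribed

# Everyone over-reports by ~25%, capped at a perfect score.
reported = {k: min(1.0, v * 1.25) for k, v in true_adherence.items()}

rank = lambda d: sorted(d, key=d.get)  # patients ordered worst to best
print(rank(true_adherence) == rank(reported))  # → True: same ordering, still useful
```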

Modern computing resources are powerful and in many ways our data is plentiful. There is no reason not to explore every model we can, no matter how ridiculous or counter-intuitive it might seem to be at first. From this we might discover something new (or refute something old), or even create a new early warning system for heart attacks. Just as correlation doesn’t imply causation, we should also remember that an element of causation (especially as a part of a highly complex and not fully understood system) doesn’t necessarily give us high correlation.

Remember, a statistical or predictive model is a tool. We can use it in many ways from the detection of a signal amidst the noise or to help us find areas where we can effect a change for better health. Tools can be constructed in many ways, and two that seem similar may have drastically different uses. Understanding how it was made and what it was made for is how we come to use a tool properly and ultimately derive maximum value.


Do you have holes in your socks? Maybe you can help improve health care!

Sometimes I wear socks with holes in them. More often than I care to admit. Why? Because of a man named Eldon Richardson. Eldon was a Great Depression era electrician – and my maternal grandfather. I hope you have memories like mine, listening to the stories of overcoming hardship with grit and determination. People who lived through the Great Depression thought differently. They were practical – unbelievably practical. They wore socks with holes in them because they focused on more important and more practical things.

Maybe health care can learn a thing or two from holey socks. We could think differently and act practically. What if we, as leaders in health care, took bold measures of practicality: hard-nosed, Great Depression style practicality? Health and health care would advance more quickly than ever before.

Health care is NOT health (according to Lauren Taylor, author of The American Health Care Paradox). Since health is 60 percent socioeconomic/environmental/behavioral and 20 percent genetic, but only 20 percent health care, it will obviously take more than doctors to improve our health. If the average American spends one hour per year with their primary doctor, but 240 hours per year in a store or online retail setting (Vaughn Kauffman, Principal, Health Industries Advisory Services, PwC), we could learn from retailers to help Americans develop behaviors that improve their health. Or consider that Americans check their smartphones over 100 times per day (Dr. Joseph Kvedar, Director, Center for Connected Health, Partners HealthCare) – what a great place to interject ideas to improve their health!

Sure, improving health and health care is a challenge, but an insignificant one compared to what grandpa Ed and millions more faced 80 years ago. If they did it, so can we. But it will take more than a new pair of socks! We’ll have to think differently and act practically!

How do we do that in the modern world? By using what’s available to us now to make progress just as our grandparents did – such as changing our thinking to be analytically driven – a proven approach across most other industries. Using our data to discover insight, predict the future based on the past, deploy insights we gain from data to drive proactive action, and monitor the results for the continual improvement of health care.


EHR systems should enable the triple aim, not prevent it

A recent news headline read, “Bipartisan committee wants government-subsidized electronic records systems scrutinized for ‘information blocking.’” *

The question before the US Senate Appropriations Committee is whether taxpayer-funded EHR software solutions are now preventing the unrestricted exchange of medical records between health care organizations. If this is in fact the case, it undermines the Affordable Care Act and is a considerable waste of taxpayers’ money… and that money is considerable. The US$27B authorized as financial incentives for providers that implement EHR systems has driven broad adoption. However, one of the main goals behind broad EHR adoption was to expose the health care data that historically has been buried in the paper charts in filing cabinets.

Source: CDC/NCHS


The question before the Senate Appropriations Committee should be much broader. It's not just if the EHR systems are “preventing or inhibiting” unrestricted exchange of medical records. It's also whether these systems are enabling providers to readily store, access and mine the breadth and depth of all of the patient data generated within their own system. Think lab values, medical device data, and unstructured data as examples. Additionally, to be clear, we are not talking about securing access to the EHR data through a multi-week or multi-month consulting engagement by which the EHR vendor extracts the data from the provider’s EHR system and then delivers that data to the provider for their exploration. This is certainly feasible and may be a lucrative business for the EHR vendors, but it undermines the intent of the EHR systems. We know what e-Patient Dave would say, “Give me my darn data!”

The US health care system needs unencumbered real-time access to all of the data in the EHR systems – and this includes date/time stamps and facility/clinician signatures on all elements of a patient record. This will enable insights to be mined from the data under the premise that the data is freely accessible, and it enables data to be funneled back into the EHR and appended to individual patient records.

The promise of the triple aim relies in-part on the ability to leverage medical and other data to have a holistic picture of patients to identify what treatment approach is best for which patient at lowest possible cost. However, if data is inaccessible, realizing the goal of the triple aim will be in jeopardy.

If you work in health care, make sure that the data of your patients is easily accessible to you, your colleagues, and other providers in the care continuum of your patients for whatever purposes associated with improving care, decreasing costs and improving your patients’ experience.

If you work outside health care in the US, call the bipartisan appropriations committee. These are your tax dollars at risk of not delivering on the goal of improved care at a lower total cost for all of us.

Hopefully, the Senate Appropriations Committee is coached on the fact that this is not just about data “exchange.” It is about much more, including:

  • Ensuring that the data within the EHRs is of robust quality and can be easily exposed for the purposes of combining with data outside the EHR;
  • Enabling the mining of combined data sets, both structured and unstructured, to capture insights about patient outcomes;
  • Surfacing insights as to how clinical variability impacts patients; and
  • Facilitating the appending of data to patient records such that externally generated data (e.g. patient scores) can be embedded back into the patient workflow.

As the 2009 American Recovery and Reinvestment Act authorizes a net $27 billion in spending to support EHR adoption through 2017, lawmakers must fund the program each year. These are our tax dollars at work – or not at work – due to data being potentially locked in the vaults of EHR software solutions. We need to see EHR systems delivering on their true potential to enable improved care and lower costs, versus being the most sophisticated and expensive filing cabinets in history.

* USF Health, Morsani College of Medicine, University of South Florida, “Senate Committee Calls for EHR Interoperability Investigation, Aug. 5, 2014.


Health analytics - Rapidly gaining ground

Having long ago witnessed the power of analytics to improve performance, efficiency, cost and quality of online banking and investment services, I have been an advocate and evangelist of its power to do the same in health care. That’s one reason why I’m excited about the recent tidal wave of news, articles, blogs, announcements and public dialogue about the value of analytics in health care.

The recent Health Affairs article captured my attention partly because it was authored exclusively by industry clinicians and academicians, not by technology vendors. The article, Big Data In Health Care: Using Analytics To Identify And Manage High-Risk And High-Cost Patients^ acknowledges “unprecedented opportunities” to use big data to reduce the costs of health care. Perhaps most importantly, the authors identify six specific opportunities where analytics could and should be used to reduce cost:

  1. Identify early – and proactively manage – high-cost patients.
  2. Tailor interventions to reduce readmissions.
  3. Estimate risk to improve triage.
  4. Filter out noise to detect valid signals of decompensation.
  5. Predict severity of illness to prevent adverse events.
  6. Optimize treatment for diseases affecting multiple organ systems. 
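Opportunity 3 (estimating risk to improve triage), for instance, is often approached with a simple logistic risk score over a handful of patient features. The sketch below is purely illustrative – the features and weights are made up, not a validated clinical model.

```python
import math

# Toy logistic risk score for triage (illustrative only; weights are
# hypothetical, not derived from any clinical study).
def risk_score(age, prior_admissions, chronic_conditions):
    """Probability-like score in (0, 1); higher suggests triaging sooner."""
    z = -5.0 + 0.03 * age + 0.6 * prior_admissions + 0.4 * chronic_conditions
    return 1 / (1 + math.exp(-z))

low = risk_score(age=35, prior_admissions=0, chronic_conditions=0)
high = risk_score(age=78, prior_admissions=3, chronic_conditions=4)
print(low < high)  # → True: the older patient with more history scores higher
```

In practice the weights would be fit to historical outcomes data and the score validated before it ever influenced triage decisions.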

It's encouraging to see health care executives acknowledging the need for analytical competency in their organizations; to see the US Congress acknowledging the need for data transparency and interoperability; to hear clinicians asking for analytically-derived decision support tools; to watch prestigious academic organizations expanding advanced degree programs in health informatics and biostatistics; and to hear health IT organizations demanding interoperability of data between EMR (electronic medical records) and their other systems.

I'm delighted by the article above, and by the early wins and impressive results generated in these six areas by friends and colleagues who are using advanced analytics to surface insights in their organizations across the globe. For example, our friends at the UNC School of Medicine are exploring the utility of big data for predicting exacerbation in diabetic patients, an innovation with the potential to simultaneously tackle the items in the list above: 1 (high-cost patients), 2 (tailor interventions) and 5 (predict to prevent).

Another example is the work being done at the Department of Orthopedic Surgery at Denmark's Lillebælt Hospital to use text analytics in automated clinical audits to detect and correct errors. The Lillebælt innovation demonstrates the efficiency gains made possible only through automation and the power to prevent patient injury at a scale which would otherwise be cost-prohibitive.

Perhaps the most exciting news of late is the announcement that Dignity Health is partnering with SAS to build a cloud-based big data analytics platform to enable value-based healthcare. In my opinion, this announcement represents a systemwide commitment to adopting health analytics as a core competency and puts Dignity Health on the road to realizing value in all six of the areas mentioned by Bates, and many more too numerous to list.

These are leading indicators that health care is modernizing and, I’m confident, will ultimately showcase the power of analytics to improve health care. The bottom line: Advanced health analytics is gaining ground in the industry and is picking up speed as more and more providers realize The Power to Know®.


^ Bates, D.W., Saria, S., Ohno-Machado, L., Shah, A., and Escobar, G. (July 2014). "Big Data In Health Care: Using Analytics To Identify And Manage High-Risk And High-Cost Patients," Health Affairs, 33, no.7.


The value of big data – Part 3: “Big something”

I seem to write quite a few blogs downplaying the idea of big data; but to be quite honest, buzzwords like that tend to annoy me. They take attention away from the underlying problems and the more we use these terms, the less real meaning they seem to have. Saying “I have big data” seems to be a reflex, much like that embarrassing moment when the person behind the ticket counter tells you to enjoy the movie and you say “Thanks, you too”. You aren’t quite thinking about what you are saying but you mean well when you do it.

While the term big data is traditionally defined in a relative manner (allowing everyone to share in the joy of having it), I think it should be reserved for specific things. If, for instance, you have big data because you bought a lot more data than you can consume because you had money allocated for it, you have a “big spender.” If you have big data but you still don’t know how to analyze and utilize the data you’ve had for the last five years, then you have a “big dreamer.” And finally, if you have big data and you don’t have any idea how you got there, well, then you simply have a “big problem.”

Big data should be reserved for when otherwise carefully curated and managed sources of data and analytics have some kind of fundamental paradigm shift which changes their volume, variety, velocity and/or value. When your previously well-managed EMR analytics group is now able to use text mining and can look at 10 years of unstructured data, you have big data. When your data provider is now able to give you daily feeds of data instead of quarterly, then you have big data. When your genetic assay test drops from $400 to $4 and you can collect 100x as much information as you could before, you have big data.

This doesn’t mean that the people with a “big spender,” “big dreamer,” or “big problem” don’t have real issues that advances in analytics and technology may be able to resolve. But rather, the problems they need to solve are of a different nature. While it may seem like we all are swimming in the same big data pool, it’s important to keep in mind where you dove in from before you start trying to swim.


Desiderata for enterprise health analytics in the 21st century

With apologies and acknowledgments to Dr. James Cimino, whose landmark paper on controlled medical terminologies still sets a challenging bar for vocabulary developers, standards organizations and vendors, I humbly propose a set of new desiderata for analytic systems in health care. These desiderata are, by definition, a list of highly desirable attributes that organizations should consider as a whole as they lay out their health analytics strategy – rather than adopting a piecemeal approach. They form the foundation for the collaboration that we at SAS have underway with Dignity Health.

The problem with today’s business intelligence infrastructure is that it was never conceived of as a true enterprise analytics platform, and it definitely wasn’t architected for the big data needs of today or tomorrow. Many – probably most – health care delivery organizations have allowed their analytic infrastructure to evolve in what a charitable person might describe as controlled anarchy. There has always been some level of demand for executive dashboards, which led to IT investment in homegrown, centralized, monolithic and relational database-centric enterprise data warehouses (EDWs) with one or more online analytical processing-type systems (such as Crystal Reports, Cognos or BusinessObjects) grafted on top to create the end-user-facing reports. Over time, departmental reporting systems have continued to grow up like weeds; data integration and data quality have become a mini-village that can never keep up with end-user demands. Something has to change. We’re working with Dignity Health to showcase what an advanced enterprise analytics architecture looks like and the transformations it can enable.

Here are the desiderata that you should consider as you develop your analytic strategy:

  1. Define your analytic core platform and standardize. As organizations mature, they begin to standardize on the suite of enterprise applications they will use. This helps to control processes and reduces the complexity and ambiguity associated with having multiple systems of record. As with other enterprise applications such as electronic health record (EHR), you need to define those processes that require high levels of centralized control and those that can be configured locally. For EHR it’s important to have a single architecture for enterprise orders management, rules, results reporting and documentation engines, with support for local adaptability. Similarly with enterprise analytics, it’s important to have a single architecture for data integration, data quality, data storage, enterprise dashboards and report generation – as well as forecasting, predictive modelling, machine learning and optimization.
  2. Wrap your EDW with Hadoop. We’re entering an era where it’s easier to store everything than decide which data to throw away. Hadoop is an example of a technology that anticipates and enables this new era of data abundance. Use it as a staging area and ensure that your data quality and data transformation strategy incorporates and leverages Hadoop as a highly cost-effective storage and massively scalable query environment.
  3. Assume mobile and web as primary interaction. Although a small number of folks enjoy being glued to their computer, most don’t. Plan for this by making sure that your enterprise analytic tools are web-based and can be used from anywhere on any device that supports a web browser.
  4. Develop purpose-specific analytic marts. You don’t need all the data all the time. Pick the data you need for specific use cases and pull it into optimized analytic marts. Refresh the marts automatically based on rules, and apply any remaining transformation, cleansing and data augmentation routines on the way inbound to the mart.
  5. Leverage cloud for storage and Analytics as a Service (AaaS). Cloud-based analytic platforms will become more and more pervasive due to the price/performance advantage. There’s a reason that other industries are flocking to cloud-based enterprise storage and computing capacity, and the same dynamics hold true in health care. If your strategy doesn’t include a cloud-based component, you’re going to pay too much and be forced to innovate at a very slow pace.
  6. Adopt emerging standards for data integration. Analytic insights are moving away from purely retrospective dashboards and moving to real-time notification and alerting. Getting data to your analytic engine in a timely fashion becomes essential; therefore, look to emerging standards like FHIR, SPARQL and SMART as ways to provide two-way integration of your analytic engine with workflow-based applications.
  7. Establish a knowledge management architecture. Over time, your enterprise analytic architecture will become full of rules, reports, simulations and predictive models. These all need to be curated in a managed fashion to allow you to inventory and track the lifecycle of your knowledge assets. Ideally, you should be able to include other knowledge assets (such as order sets, rules and documentation templates), as well as your analytic assets.
  8. Support decentralization and democratization. Although you’ll want to control certain aspects of enterprise analytics through some form of Center of Excellence, it will be important for you to provide controlled access by regional and point-of-service teams to innovate at the periphery without having to provide change requests to a centralized team. Centralized models can never scale to meet demand, and local teams need to be given some guardrails within which to operate. Make sure to have this defined and managed tightly.
  9. Create a social layer. Analytics aren’t static reports anymore. Your users expect to interact with, comment on and share the insights they develop and that are provided to them. They expect two-way communication with report and predictive model creators, and they don’t want to wait for a scheduled meeting to discuss it. Overlay a portal layer that encourages and anticipates a community of learning.
  10. Make it easily actionable. If analytics are just static reports, drill-down reports or static risk scores, users will start to ignore them. Analytic insights should be thought of as decision support, and the well-learned rules from EHRs apply to analytics too: provide the insights in the context of the user’s workflow, make it easy to understand what is being communicated, and make it easily actionable – allow users to take recommended actions rather than guessing what they might need to do next.
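To make item 6 concrete, here is a minimal Python sketch of the kind of two-way FHIR integration described there. The server base URL and patient ID are hypothetical placeholders; the LOINC code 4548-4 (hemoglobin A1c) and the FHIR R4 Observation search parameters are standard. The sketch builds a search URL and pulls quantitative values out of a returned searchset Bundle:

```python
import json
from urllib.parse import urlencode

# Hypothetical FHIR R4 server endpoint -- substitute your own.
FHIR_BASE = "https://fhir.example.org/r4"

def observation_search_url(patient_id: str, loinc_code: str) -> str:
    """Build a FHIR R4 search URL for a patient's observations by LOINC code."""
    params = urlencode({
        "patient": patient_id,
        "code": f"http://loinc.org|{loinc_code}",
        "_sort": "-date",   # most recent first
        "_count": 10,
    })
    return f"{FHIR_BASE}/Observation?{params}"

def extract_values(bundle: dict) -> list:
    """Pull (value, unit) pairs out of a FHIR searchset Bundle."""
    results = []
    for entry in bundle.get("entry", []):
        qty = entry["resource"].get("valueQuantity", {})
        if "value" in qty:
            results.append((qty["value"], qty.get("unit")))
    return results

# A tiny sample Bundle, shaped as a FHIR server would return it.
sample = json.loads("""
{"resourceType": "Bundle", "type": "searchset",
 "entry": [{"resource": {"resourceType": "Observation",
            "valueQuantity": {"value": 6.3, "unit": "%"}}}]}
""")

print(observation_search_url("12345", "4548-4"))
print(extract_values(sample))
```

In a real deployment the URL would be fetched with an authenticated HTTP client (often via SMART on FHIR authorization), and the extracted values would feed the analytic engine or alerting workflow rather than a print statement.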

Thanks for reading, and please let me know what you think. Do these desiderata resonate with you? Are we missing anything essential? Or is this a reasonable baseline for organizations to get started?

We’ll be sure to update you as our collaboration with Dignity Health progresses.


The value of big data – Part 2: The perceived ultimatum of big data

There almost seems to be a perception that the value of big data is only truly realized when we process all of it. The way we talk about big data, while generally optimistic, has an almost ominous feel to it. It’s as though, if we fail to tackle all of our big data in the next 12 months, Rumpelstiltskin will come to take away our firstborn.

I would counter anyone who says they can “tackle and resolve” their big data within 12 months: they do not have true big data. They may have a lot of data, and they may not have enough resources to consume it, but they don’t face a challenge on the scale of big data. At the same time, I would say that even those with the biggest of big data can start deriving meaningful value from that data quickly.

Successful delivery against big data requires meeting two general objectives. The first (and usually the most heavily focused upon) is the enhanced/improved software and hardware infrastructure that can be used to churn through the entire universe of data. This is often an expensive and potentially time-consuming task. If it weren’t, you probably wouldn’t give the process a second thought – you’d just go along your merry way.

The second aspect required (and one that can be potentially more readily tackled) is a focus on learning and understanding the constituent data. Regardless of the source of your big data, it can all be thought of as the amalgamation of many pieces of “small data.” These individual sources may be nuanced and complex, but when taken on one at a time, the challenge they pose is greatly diminished.

My first job out of school involved the analysis of a large national survey. The volume of data would almost be laughably small compared to sources we look at now, and yet the team dedicated to the analysis of the data was larger than the entire analytics groups for many companies I’ve talked to in the health care industry. Even with this pool of dedicated, highly trained and focused individuals, I don’t think any of us would say that we had mined the data to exhaustion. There was constantly more value we could extract, and likely there always will be.

Certainly we hit points of diminishing returns, where truly insightful new discoveries in the data were few and far between. And I know for a fact that my colleagues and I would have loved the opportunity to bring this data together with other sources and see what we could find out – but we didn’t have that luxury. That said, I can’t remember a week going by when someone didn’t have a new idea, method or approach for tackling the data that brought out a little something new.

The point is that there is still value remaining in the data you can access and evaluate now. I don’t mean to downplay the value of what a big data-enabled platform can deliver, but rather to remind us all that even small, focused, incremental growth in understanding and utilizing the smaller building blocks of data will not only prepare us to reap the long-term value of the big data conglomeration, but will likely provide meaningful insight along the way.

In Part 3 of this series, I’ll narrow down how to define big data, which is critical to understanding how best to utilize it.


The value of big data – Part 1: Big board games

I collect board games – probably too many of them. Each game is different and has its own charm and value. Some are fun for large groups; others work best when you play them one-on-one. Sometimes what draws me to a game is a great theme, and sometimes it’s a novel mechanic. Regardless, there is something about all of them that makes me want them.
Sadly, I only have so much time in a day, and only so much of that can be devoted to playing games. My shelves are getting full of great games that I’ve never played; some still have the shrink-wrap on them, never opened and likely never to be. To be honest, I have a problem. I have a wealth of entertainment value but no real way to appreciate it. When I buy games, I ignore the fact that I already have 20 games sitting ready to be played that haven’t been touched. And among the games I have played, rarely have I truly mined their depths; instead I end up playing them once or twice and moving on before getting all their worth.

Yes, I have a problem, and I call that problem “Big Board Games.” I have games of great variety and volume, acquired at high velocity and all having (at least to me) great value. Most of you reading this probably think my games are a waste of money – or, at best, an interesting quirk for a statistician. But had I told the same story in the context of data, we would think of it as a source of universal pride and envy.

It is a rare case that we stumble upon big data by accident or surprise. It is a deliberate process by which we desire and acquire new data sources. Some sources come from changes in technology allowing us to capture more detailed or more frequent data – perhaps a new type of genetic assay. Some sources come from vendors who sell EMR, claims or other rich sources of data. Sometimes we get lucky and find new ways to analyze previously inaccessible data; for instance, text mining of EMR notes from a previously purchased source. But all of it we knowingly capture, and all of it comes at a cost to us.

Granted, I’m a bit long-winded in getting to the point, but the point is a valid one. The acquisition of big data in and of itself isn’t a point of pride. Rather, the presence of big data indicates a data acquisition strategy that may be out of pace with the corresponding analytic support capability. We will all face times where we step into the realm of big data; this is a necessary product of growth and exploration. But those periods should be short-lived and carefully evaluated. We should be asking ourselves what was the factor that pushed us into a state of big data, and is the corresponding value worth the cost of acquisition – and more importantly, is it worth the cost of utilization?

When I store a new batch of board games on my shelves at home, I find I often have an introspective moment. As I rearrange and push aside old games I’m forced to ask myself whether these new games truly add value to my collection and enjoyment. Through acquisition of games, even great games, my ability to play isn’t increased, yet the potential pool for demand is growing constantly. Some games must be diminished, others forgotten almost completely; and yet, while I’m aware of all of this potential waste and lost value, there’s always one other great game that I just have to get next.

And so it goes with big data, as we’ll see in Part 2 and Part 3 of this series.

  • About this blog

    Welcome to the SAS Health and Life Sciences blog. We explore how the health care ecosystem – providers, payers, pharmaceutical firms, regulators and consumers – can collaboratively use information and analytics to transform health quality, cost and outcomes.