I seem to write quite a few blogs downplaying the idea of big data; but to be quite honest, buzzwords like that tend to annoy me. They take attention away from the underlying problems and the more we use these terms, the less real meaning they seem to have. Saying “I have big data” seems to be a reflex, much like that embarrassing moment when the person behind the ticket counter tells you to enjoy the movie and you say “Thanks, you too”. You aren’t quite thinking about what you are saying but you mean well when you do it.
While the term big data is traditionally defined in a relative manner (allowing everyone to share in the joy of having it), I think it should be reserved for specific things. If, for instance, you have big data because you bought a lot more data than you can consume because you had money allocated for it, you have a “big spender.” If you have big data but you still don’t know how to analyze and utilize the data you’ve had for the last five years, then you have a “big dreamer.” And finally, if you have big data and you don’t have any idea how you got there, well, then you simply have a “big problem.”
Big data should be reserved for when otherwise carefully curated and managed sources of data and analytics have some kind of fundamental paradigm shift which changes their volume, variety, velocity and/or value. When your previously well-managed EMR analytics group is now able to use text mining and can look at 10 years of unstructured data, you have big data. When your data provider is now able to give you daily feeds of data instead of quarterly, then you have big data. When your genetic assay test drops from $400 to $4 and you can collect 100x as much information as you could before, you have big data.
This doesn’t mean that the people with a “big spender,” “big dreamer,” or “big problem” don’t have real issues that advances in analytics and technology may be able to resolve. But rather, the problems they need to solve are of a different nature. While it may seem like we all are swimming in the same big data pool, it’s important to keep in mind where you dove in from before you start trying to swim.