Big data seems like a daunting challenge because, as data management professionals, we have been taught by experts and learned from experience that we always have to dive deep into data in order to discover meaningful business insights, solve business problems, and support daily business operations.
However, it’s possible to avoid diving into data’s apparently bottomless depths. In fact, sometimes it’s okay to be shallow. There will be times when a deep data dive and detailed analysis will not be needed to solve a problem.
Let’s use a simple example. I had never watched the television show Lost. On Netflix I noticed that I could watch all 6 seasons (121 episodes), so I asked two of my closest friends, who had both watched the show while it was on the air, whether or not I should spend some of my free time getting lost in Lost.
One of my friends loved the show. My other friend hated the show. I know a lot about my two friends. I know what types of shows they typically like and dislike. I could have performed a detailed analysis by comparing their opinions about Lost with their opinions about other shows that we had all seen.
Instead, I checked the available data on Netflix. Over 8,000,000 people rated the show, giving Lost an average rating of 3.8 stars on a five-star scale where 1 = Hated It, 2 = Didn’t Like It, 3 = Liked It, 4 = Really Liked It, and 5 = Loved It. Over 2,000 people also provided a written review to explain their rating. I knew nothing about any of the people who provided ratings and reviews on Netflix. Furthermore, I could not have performed a deep data dive and detailed analysis even if I had wanted to, since Netflix only provides the aggregated, general sentiment of a large group of unknown, unqualified strangers.
I decided it was okay to be shallow. Without seeking more data from other sources, or attempting a detailed text analysis of those 2,000 reviews, I decided to watch all 6 seasons of Lost and ended up giving it 4 stars.
Obviously, solving business problems with big data is more important than using it to choose what show to watch on Netflix. When leveraging big data analytics, however, it’s easy to find yourself lost in a deep ocean of data, as if you were stranded on a not-so-deserted island being chased by the Smoke Monster of poor data quality while being attacked by Mysterious Others, who are performing an identity resolution project, verifying your master data against a single version of the truth maintained by Jacob, who also checks your transaction data for any criminal or unethical activity, and if he finds any he has Richard tell Ben to have you sent to Room 23 on Hydra Island for a while and then leave you wallowing in your own personal purgatory within the big data abyss.
Then again, perhaps it’s no coincidence that I wrote a blog post about it sometimes being okay to be shallow with big data analytics after spending 121 episodes trying to figure out the plot of Lost.