In the movie Big, a 12-year-old boy is embarrassed in front of an older girl he was trying to impress when he's told he is too short for a carnival ride. He puts a coin into an antique arcade fortune teller machine called Zoltar Speaks, wishes to be big, and wakes the next morning transformed into a 30-year-old man.
Traditional data management is making a wish to integrate big data efforts into existing processes and programs in order to transform the organization into a 21st century data-driven enterprise. Since Zoltar Speaks doesn’t really grant wishes, especially big data wishes, let’s briefly look at a few aspects of big data integration.
Master Data Management (MDM)
The most obvious big data integration is what’s referred to as Social MDM, which is the integration of social media data into MDM implementations. Its primary business case is to enable the organization to perform social media marketing, such as analyzing customer sentiment and contacting customers with promotional offers.
There are already many things that complicate MDM, and while Social MDM may be on many big data wish lists, I foretell two big challenges:
- Identity – How do you identify that someone on Twitter or Facebook is your customer? Most social media profiles are sparsely populated with identifying attributes, many times limited to only a name. Data quality professionals know that matching on name only creates a lot of false positives. For example, @JimHarris on Twitter is not me, and neither are most of the many Twitter users named Jim Harris.
- Relevancy – Assuming you resolved the identity challenge (and asking your customers to provide you with their social media profiles might be the only reliable way), the next challenge is just how relevant is your customers’ social media data? The vast majority of your customers’ activity on Twitter and Facebook has absolutely nothing to do with their relationship to your company or your products and services.
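To make the identity challenge concrete, here is a minimal sketch (in Python, with invented names and records) of why matching social profiles to customer records on name alone produces false positives: every "Jim Harris" collides with the one customer named Jim Harris.

```python
# Hypothetical sketch of name-only matching. The customer records and
# social handles below are invented for illustration.

def normalize(name):
    """Crude normalization: lowercase, keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

customers = [
    {"id": 101, "name": "Jim Harris", "email": "jim@example.com"},
]

social_profiles = [
    {"handle": "@JimHarris", "name": "Jim Harris"},     # a different Jim Harris
    {"handle": "@jimharris22", "name": "Jim  Harris"},  # yet another one
]

# Name-only matching: both profiles "match" customer 101, even though
# neither may actually belong to that customer -- false positives.
matches = [
    (c["id"], p["handle"])
    for c in customers
    for p in social_profiles
    if normalize(c["name"]) == normalize(p["name"])
]

print(matches)
```

With only a name to go on, the matcher cannot distinguish these profiles; in practice you would need additional corroborating attributes (location, email, a customer-supplied handle) before accepting a match.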
Data Quality

Perspectives about data quality have always been relative and variable: data quality is relative to a particular business use, and it can vary by user, with one user's good data not being good enough for another. With external data, bad data is often as good as you can get, and big data has triggered an explosion in the volume and variety of external data. The quality of much of this data is highly suspect, but crowdsourcing can help assess and improve it.
Big data is also forcing organizations to realize that not all data quality issues should be corrected, meaning data quality improvement efforts must be properly prioritized. Big data has also created new use cases, especially aggregate analytics, where sometimes bigger, lower-quality data is better. (To learn more about how companies are dealing with big data quality challenges, get this TDWI e-book.)
Data Governance

On its own, data governance is disruptive. However, big data has the potential to disrupt data governance. Established principles, policies, and procedures that have proven effective for governing other areas of data management might not be applicable, or fully enforceable, as big data is integrated into more applications.
This doesn’t mean that big data will force you to start over with data governance. Just don’t force big data to be governed by the same rules by default. Take it on a case-by-case basis and amend existing, or create new, data governance policies as necessary.
You got any big ideas?
If you have an experience or perspective to share about how to integrate big data efforts into existing data quality, MDM, and data governance processes and programs, then please post a comment below.