At lunch with a nurse friend of mine recently, I happened to drop the term big data. No sooner had the words left my mouth than I asked her the question, "Have you ever heard of big data?"
As I expected, she responded in the negative. I proceeded to give her a short, jargon-free definition of the term, rife with examples from the social media sites she frequents. Of course there's more to it, but the collection of photos, tweets, blog posts, Facebook likes, LinkedIn articles, YouTube videos and the like adds up to a boatload of information.
I was thinking about this in the context of this month's theme: master data management (MDM). In particular, I stewed over the question "What – if any – is the relationship between big data and MDM?"
Although several of my books touch upon MDM, I don't consider myself a true expert on the matter. (David Loshin could dance circles around me here.) Still, I know a fair amount.
A brief history of MDM
At a high level, organizations have historically purchased and deployed MDM technologies around critical data sets. (Some have called this MDM 1.0, although I'm not in love with the term.) Employees, products, vendors and customers quickly come to mind. Sure, data quality is important across the board, but duplicate, erroneous or missing data in these categories can quickly cause massive problems. Consider the following:
- Employees and vendors who are issued multiple checks.
- Different types of security breaches.
- Customers who are billed twice – or not at all.
- Product descriptions and codes that vary wildly in dependent systems.
I could go on, but you get my point. As such, the need to tightly control and manage this highly structured data is particularly pronounced. And this is especially true within large, mature organizations that cannot simply start from scratch. It's for this reason that it's called master data management.
Where are we now?
This raises two questions: Can an organization manage big data? And is this possible via an MDM application?
Before answering these questions, it's essential to discuss two key characteristics of big data. As I write in Too Big to Ignore, big data is unlike its small counterpart in many ways. For starters, it typically emanates from, and is stored in, places outside the purview of an organization's IT department. Scrape tweets, blog posts and photos all you like, but an organization doesn't control that content per se. Generally speaking, it's tough to manage what you don't control in the first place.
Moreover, big data is largely unstructured. It doesn't lend itself to unique identifiers (read: employee, customer, product and vendor ID numbers). Yes, each tweet contains its own unique number and metadata, but the content of different tweets can be identical. A system administrator can prevent clerks from adding new employee, vendor and customer codes. He or she cannot, however, prevent a disgruntled current or ex-employee from venting on social networks.
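The contrast between controlled master records and uncontrolled social content can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the MasterRegistry class, the group_by_content helper, the customer code and the sample posts are all hypothetical, not features of any MDM product or social network API.

```python
# Hypothetical illustration: every name and value here is invented for this sketch.

class MasterRegistry:
    """A toy master-data store that refuses to add a record ID twice."""

    def __init__(self):
        self._records = {}

    def add(self, record_id, attributes):
        # The organization owns this store, so it can enforce uniqueness.
        if record_id in self._records:
            raise ValueError(f"duplicate master record: {record_id}")
        self._records[record_id] = attributes

    def __len__(self):
        return len(self._records)


def group_by_content(posts):
    """Group post IDs by normalized text; IDs are unique, content need not be."""
    groups = {}
    for post_id, text in posts:
        # Normalize case and whitespace so near-identical posts share a key.
        key = " ".join(text.lower().split())
        groups.setdefault(key, []).append(post_id)
    return groups


customers = MasterRegistry()
customers.add("C-1001", {"name": "Acme Corp"})
try:
    customers.add("C-1001", {"name": "ACME Corporation"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# Scraped posts each carry a unique ID, yet nothing stops repeated content.
posts = [
    (1, "Worst employer ever"),
    (2, "worst  employer ever"),
    (3, "Great lunch today"),
]
repeated = {text: ids for text, ids in group_by_content(posts).items() if len(ids) > 1}
```

The registry can reject the second customer code because the organization controls the system of record. With the posts, the best anyone can do is notice the repetition after the fact.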
Simon Says
Returning to the initial query, what's the relationship between big data and MDM? In short, there really isn't one yet – but that may change. In part two of this post, I'll look at potential integration between structured and unstructured data – an idea that some have called MDM 2.0.
Feedback
What say you?