How many companies are using Hadoop as part of their master data management (MDM) initiatives? Come on, raise your hands! Well, maybe a better question is this: How many companies are using Hadoop for enterprise data at all?
From what I have seen, Hadoop is coming along quite nicely. However, it may not be the technological “silver bullet” of the moment. I continually urge my clients to define exactly how they intend to use Hadoop, and I ask questions like these:
- Is this project an enterprise report that must be published to external customers?
  - If so, you may want to stay with structured, protected and guaranteed data.
- Is this project analytical in nature, with all of the data it needs available in Hadoop?
  - If so, Hadoop may be a good fit.
- Should Hadoop be my data store for master data management?
  - That depends.
I've seen some companies use Hadoop as a staging area for ingesting unstructured master data from across the enterprise. I like this idea because it can be very fast and possibly a bit cheaper. However, that master data will still require:
- Matching.
- Merging.
- De-duping.
- Cleanup prior to enterprise consumption.
- Enterprise identifiers for the master data when multiple sources are brought together.
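The post doesn't say how these steps are carried out, so here is a minimal, in-memory sketch of what they involve. It assumes customer-like records with hypothetical `name`, `email` and `phone` fields, and a deliberately naive matching rule (exact match on normalized email). Real MDM tools typically rely on fuzzy or probabilistic matching and richer survivorship rules, and on Hadoop the same steps would run as distributed jobs rather than in a single Python process. All class and function names below are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    source: str      # originating system, e.g. "crm" (hypothetical)
    source_id: str   # the record's key in that system
    name: str
    email: str
    phone: str = ""


@dataclass
class MasterRecord:
    enterprise_id: str
    name: str
    email: str
    phone: str
    source_keys: list = field(default_factory=list)  # (source, source_id) cross-references


def clean(rec: SourceRecord) -> SourceRecord:
    """Cleanup: collapse whitespace, normalize case, keep only digits in phone numbers."""
    return SourceRecord(
        source=rec.source,
        source_id=rec.source_id,
        name=" ".join(rec.name.split()).title(),
        email=rec.email.strip().lower(),
        phone="".join(ch for ch in rec.phone if ch.isdigit()),
    )


def match_key(rec: SourceRecord) -> str:
    """Matching (naive assumption): same normalized email means same entity.
    Records without an email fall back to a source-specific key so they never merge."""
    return rec.email or f"{rec.source}:{rec.source_id}"


def first_non_empty(group, attr: str) -> str:
    """Survivorship rule (assumed): keep the first non-empty value seen for an attribute."""
    return next((getattr(r, attr) for r in group if getattr(r, attr)), "")


def build_master(records):
    # Group cleaned records by match key; each group collapses into one master record.
    groups = {}
    for rec in map(clean, records):
        groups.setdefault(match_key(rec), []).append(rec)

    masters = []
    # Sorting keys makes the assigned enterprise identifiers deterministic.
    for n, key in enumerate(sorted(groups), start=1):
        group = groups[key]
        masters.append(MasterRecord(
            enterprise_id=f"ENT-{n:06d}",  # hypothetical enterprise identifier format
            name=first_non_empty(group, "name"),
            email=first_non_empty(group, "email"),
            phone=first_non_empty(group, "phone"),
            source_keys=[(r.source, r.source_id) for r in group],
        ))
    return masters


if __name__ == "__main__":
    records = [
        SourceRecord("crm", "C-17", "  jane   doe ", "Jane.Doe@Example.com", "(555) 010-2000"),
        SourceRecord("billing", "B-903", "Jane Doe", "jane.doe@example.com "),
        SourceRecord("erp", "E-4", "John Smith", "john@example.com"),
    ]
    for m in build_master(records):
        print(m.enterprise_id, m.name, m.email, m.phone, m.source_keys)
```

Here the two "Jane Doe" records from the CRM and billing systems are cleaned, matched, merged into a single master record and given one enterprise identifier. Keeping the `(source, source_id)` cross-references on each master record is what lets downstream consumers trace that identifier back to the originating systems.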
I truly believe that the Hadoop of today will not be the Hadoop of tomorrow, because the technology is moving fast. That leads me to think it will mature much as our other database technologies have. Some of the questions that remain for me are:
- Will we saturate our big data environments, too?
- Will governance and security become less important as we fish in the big lake?
The bottom line as I see it: Hadoop and big data can be part of our MDM initiatives, but we need to use them wisely.