MDM and Hadoop – Part 1

How many companies are using Hadoop as part of their master data management initiative? Come on, raise your hands! Well, maybe a better question is this: How many companies are using Hadoop for enterprise data?

From what I have seen, Hadoop is coming along quite nicely. However, it may not be the current technological “silver bullet.” I continually urge my clients to define the uses for Hadoop. I ask questions like this:

  • Is this project considered an enterprise-specific report that requires publication to external customers?
    • If it is, you may want to stay with structured, protected and guaranteed data.
  • Is this project analytical in nature, and is all the data available in Hadoop?
    • If so, Hadoop may be appropriate.
  • Should Hadoop be my store of data used for master data management?
    • That depends.

I've seen some companies use Hadoop as the staging area for consumption of unstructured master data from across the enterprise. I like this idea because it can be very fast and maybe a bit cheaper. However, our master data will still require:

  • Matching.
  • Merging.
  • De-duping.
  • Cleanup prior to enterprise consumption.
  • Enterprise identifiers for the master data in situations when multiple sources are brought together.
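To make the list above a little more concrete, here is a minimal Python sketch of those steps applied to records pulled from a staging area. Everything in it is an assumption for illustration: the field names (`source`, `name`, `email`, `phone`), the exact-match rule on a normalized email or name, the "first non-empty value wins" survivorship rule, and the use of a deterministic UUID as the enterprise identifier. Real MDM tools typically use fuzzy matching and richer survivorship rules, so treat this as a sketch rather than a recommended implementation.

```python
import re
import uuid
from collections import defaultdict


def normalize(record):
    """Cleanup: lowercase, drop punctuation, and collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]", "", record.get("name", "").lower())
    name = " ".join(name.split())
    email = record.get("email", "").strip().lower()
    return name, email


def match_key(record):
    """Matching: hypothetical rule that prefers email and falls back to name."""
    name, email = normalize(record)
    return email or name


def merge(records):
    """Merging: keep the first non-empty value seen for each field."""
    merged = {}
    for rec in records:
        for field, value in rec.items():
            if field == "source":
                continue
            if value and not merged.get(field):
                merged[field] = value
    # Remember which source systems contributed to this master record.
    merged["sources"] = sorted({rec["source"] for rec in records})
    return merged


def build_master(records):
    """De-duping: group records by match key, then merge each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)

    master = []
    for key, recs in groups.items():
        golden = merge(recs)
        # Enterprise identifier: deterministic, so reruns yield the same ID.
        golden["enterprise_id"] = uuid.uuid5(uuid.NAMESPACE_DNS, key).hex
        master.append(golden)
    return master


# Hypothetical staged records from three source systems.
staged = [
    {"source": "crm", "name": "Acme Corp.", "email": "info@acme.example", "phone": ""},
    {"source": "billing", "name": "ACME Corp", "email": "INFO@acme.example ", "phone": "555-0100"},
    {"source": "web", "name": "Globex", "email": "", "phone": ""},
]

master = build_master(staged)
for row in master:
    print(row)
```

In this sketch the two Acme records collapse into one master record that carries the phone number from billing and lists both contributing sources, while Globex stays separate. The point is only that these steps still have to happen somewhere, whether the staged data lives in Hadoop or anywhere else.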

I truly believe that the Hadoop of today will not be the Hadoop of tomorrow, because this technology is moving fast. That leads me to think it will evolve much as our other database technologies have matured. Some of my remaining questions are:

  • Will we saturate our big data environments, too?
  • Will governance and security become less important as we fish in the big lake?

The bottom line as I see it: Hadoop/big data can be part of our MDM initiatives. We need to use it wisely.



About Author

Joyce Norris-Montanari

President of DBTech Solutions, Inc

Joyce Norris-Montanari, CBIP-CDMP, is president of DBTech Solutions, Inc. Joyce advises clients on all aspects of architectural integration, business intelligence and data management, and on technology, including tools for ETL, profiling, database, quality and metadata. She speaks frequently at data warehouse conferences and is a contributor to several trade publications. She co-authored Data Warehousing and E-Business (Wiley & Sons) with William H. Inmon and others. Joyce has managed and implemented data integrations, data warehouses and operational data stores in industries like education, pharmaceutical, restaurants, telecommunications, government, health care, financial, oil and gas, insurance, research and development and retail. She can be reached at jmontanari@earthlink.net.
