Big data, over the last few years, has evolved very quickly to meet our enterprise needs. We started with a place to store near-real-time data for consumption by the enterprise or other applications. This data was in its rawest form, and conforming took place in a front-end type tool that worked within our big data platform.
We thought, gosh, what a great (and cheaper) place to persist our data over time! But very few governance and data management principles were applied to this data store, and certainly not the kind that we were used to dealing with in the enterprise. That, in itself, made some of us very leery about using this new technology.
That said, we included Hadoop/big data in our solutions where it made sense: as a data store to land and stage data for the data warehouse, or for pre-analytics. We also considered it for the data consumption layer, knowing that the business rules had to be recreated for each consumer via the query/reporting tools. We also understood that some very involved management and use of data would need to take place. We used Hadoop as a platform to profile our data prior to loading it into a data warehouse or operational system, and it worked very nicely.
Then our implementation teams found a few other things that made us nervous. For example, you can’t update a record. My immediate reaction was “what?” After settling down from my initial shock, I decided I was good with that, for now. Our team can just change the design to add an effective date to each record, so that the most current record can be found by date (i.e., MAX Effective_Date). At this point, we may have been asking for a characteristic that was difficult to achieve. (But with the way the technology is maturing, it may be available before this blog ever gets posted!)
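The MAX Effective_Date workaround can be illustrated with a minimal Python sketch. The record layout, the field names (`customer_id`, `effective_date`), and the helper `current_records` are assumptions made up for illustration; they are not taken from any particular Hadoop tool. The idea is only that every change is appended as a new row, and readers pick the row with the latest effective date per key.

```python
from datetime import date

# Hypothetical append-only data: a change is written as a new row,
# never as an in-place update of an existing row.
records = [
    {"customer_id": 1, "city": "Austin", "effective_date": date(2014, 1, 5)},
    {"customer_id": 2, "city": "Denver", "effective_date": date(2014, 2, 1)},
    {"customer_id": 1, "city": "Dallas", "effective_date": date(2014, 3, 9)},
]


def current_records(rows, key="customer_id", date_field="effective_date"):
    """Return, for each key, the row with the latest effective date."""
    latest = {}
    for row in rows:
        k = row[key]
        # Keep the row only if it is newer than what we have seen so far.
        if k not in latest or row[date_field] > latest[k][date_field]:
            latest[k] = row
    return latest


current = current_records(records)
```

In a SQL-on-Hadoop layer the same idea would typically be a query that groups by the key and selects the maximum effective date; the trade-off is that every consumer has to apply this rule when reading.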
As we think about whether Hadoop/big data is ready for master data management (MDM), consider that MDM has some of these characteristics:
- The ability to understand and use the authoring systems of master data.
- If you have three customer systems, you may want different data elements from each system.
- Merging and de-duplication are pretty standard. So we need a place to do the deduping and a place to land the golden records.
- The ability to validate and monitor the master data (ongoing and at intervals).
- The ability to store the data or the metadata to create a complete master data record.
- Ease of use (Ask: Does it meet your business needs?).
- Flexibility of change (Ask: Does it meet your future business needs?).
- An easy way to call the master data or receive the master data for enterprise processes/systems.
- Security (Ask: Is there enough security to protect our data?). This requirement is coming along nicely, but still makes me nervous with some corporate data (e.g., financials).
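To make the merging and de-duplication characteristics in the list above more concrete, here is a small Python sketch of building a golden record from three customer systems, taking different data elements from each. Everything in it is an assumption for illustration: the source names (`crm`, `billing`, `support`), the field preferences, and the choice to match records on a normalized email address. Real MDM matching is usually far more involved.

```python
from collections import defaultdict

# Hypothetical rule set: for each field, which source system is trusted first.
FIELD_SOURCES = {
    "name": ["crm", "billing", "support"],
    "phone": ["support", "crm", "billing"],
    "address": ["billing", "crm", "support"],
}


def match_key(record):
    """Assumed matching rule: same normalized email means same customer."""
    return record["email"].strip().lower()


def build_golden_records(records):
    """Group duplicate records, then pick each field from its preferred source."""
    groups = defaultdict(dict)
    for rec in records:
        # If one source sends the same customer twice, the later row wins.
        groups[match_key(rec)][rec["source"]] = rec

    golden = {}
    for key, by_source in groups.items():
        merged = {"email": key}
        for field, sources in FIELD_SOURCES.items():
            # Take the first non-empty value in source-priority order.
            merged[field] = next(
                (by_source[s][field] for s in sources
                 if s in by_source and by_source[s].get(field)),
                None,
            )
        golden[key] = merged
    return golden


records = [
    {"source": "crm", "email": "Ann@Example.com", "name": "Ann Lee",
     "phone": "", "address": "1 Old Rd"},
    {"source": "billing", "email": "ann@example.com ", "name": "A. Lee",
     "phone": "555-0100", "address": "9 New St"},
    {"source": "support", "email": "ann@example.com", "name": "",
     "phone": "555-0199", "address": ""},
]

golden = build_golden_records(records)
```

The three input rows collapse into one golden record, with the name taken from the CRM system, the phone from support, and the address from billing. The point of the sketch is only that the platform needs somewhere to run this step and somewhere to land its output.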
As the technology changes and incorporates more of those data governance and other data disciplines that have made our organizations successful, my opinion will change too. I believe that Hadoop/big data is ready to start taking on MDM and other enterprise-critical applications.
I guess we have to wait and see how things unfold from here.