Data quality to "DI" for

There is a time and a place for everything, but the time and place for data quality (DQ) in data integration (DI) efforts always seems to be something no one is quite sure about. I have previously blogged about the dangers of waiting until the middle of DI to consider, or become forced [...]

Data integration: Comparing traditional sources and big data

While not on the same level as Rush, I do fancy myself a fan of The Who. I'm particularly fond of the band's 1973 epic, Quadrophenia. From the track "5:15": "Inside outside, leave me alone / Inside outside, nowhere is home / Inside outside, where have I been?" The inside-outside distinction is rather apropos [...]

Data integration considerations for the data lake: Standardization and transformation

In my last post, I noted that the flexibility of the schema-on-read paradigm typical of a data lake had to be tempered with the use of a metadata repository, so that anyone wanting to use that data could figure out what was really in [...]

Big data integration: The case against an "all-in" approach

I've spent a great deal of time in my consulting career railing against multiple systems of record, data silos and disparate versions of the truth. In the mid-1990s, I realized that Excel could only do so much. To quickly identify and ultimately ameliorate thorny data issues, I had to up [...]

Data integration considerations for the data lake: The need for metadata

A few of our clients are exploring the use of a data lake as both a landing pad and a repository for collections of enterprise data sets. However, after probing a little bit about what they expected to do with this data lake, I found that the simple use of [...]

Data preparation: Managing data for analytics

What data do you prepare for analysis? Where does that data come from in the enterprise? Hopefully, by answering these questions, we can understand what is required to supply data for an analytics process. Data preparation is the act of cleansing (or not) the data required to meet the business [...]

SAS Data Loader for Hadoop helps your data heroes navigate the fire swamp of big data

In The Princess Bride, one of my favorite movies, our hero Westley – in an attempt to save his love, Buttercup – has to navigate the Fire Swamp. There, Westley and Buttercup encounter fire spouts, quicksand and the dreaded rodents of unusual size (ROUSs). Each time he has a response to the [...]

Non-geeks want to know: will Hadoop mess up my data warehouse ecosystem?

Hadoop recently turned eight years old, but it was only 3-4 years ago that Hadoop really started gaining traction. It had many of us “older” BI/DW folks scratching our heads wondering what Hadoop was up to and if our tried-and-true enterprise data warehouse (EDW) ecosystems were in jeopardy. You didn't [...]

What is reference data harmonization?

A few weeks back I noted that one of the objectives of an inventory process for reference data was data harmonization, which meant determining when two reference sets refer to the same conceptual domain and harmonizing the contents into a conformed standard domain. Conceptually it sounds relatively straightforward, but as [...]
