Tag: data quality

There is a time and a place for everything, but the right time and place for data quality (DQ) in data integration (DI) efforts is something no one ever seems quite sure about. I have previously blogged about the dangers of waiting until the middle of a DI effort to consider, or become forced

“Garbage in, garbage out” is more than a catchphrase – it’s the unfortunate reality in many analytics initiatives. For most analytical applications, the biggest problem lies not in the predictive modeling, but in gathering and preparing data for analysis. When the analytics seems to be underperforming, the problem almost invariably

Bigger doesn’t always mean better. And that’s often the case with big data. Your data quality (DQ) problem – no denial, please – often only magnifies when you get bigger data sets. Having more unstructured data adds another level of complexity. The need for data quality on Hadoop is shown by user
.@philsimon on whether companies should apply some radical tactics to DG.
If your organization is large enough, it probably has multiple data-related initiatives going on at any given time. Perhaps a new data warehouse is planned, an ERP upgrade is imminent or a data quality project is underway. Whatever the initiative, it may raise questions around data governance – closely followed by discussions about the
In recent years, we practitioners in the data management world have been pretty quick to conflate “data governance” with “data quality” and “metadata.” Many tools marketed under "data governance" have emerged – yet when you inspect their capabilities, you find they largely cover data validation and data standardization. Unfortunately, we

After doing some recent research with IDC®, I got to thinking again about the reasons that organizations of all sizes in all industries are so slow at adopting analytics as part of their ‘business as usual’ operations. While I have no hard statistics on who is and who isn’t adopting
As consumers, the quality of our day is all too often governed by the outcome of computed events. My recent online shopping experience was a great example of how computed events can conspire to make (or break) a relaxing event. We had ordered grocery delivery with a new service provider. Our existing provider
(Otherwise known as Truncate – Load – Analyze – Repeat!) After you’ve prepared data for analysis and then analyzed it, how do you complete this process again? And again? And again? Most analytical applications are created to truncate the prior data, load new data for analysis, analyze it and repeat
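The truncate – load – analyze – repeat cycle described above can be sketched in a few lines. This is a minimal illustration, not the post's actual implementation; the table and column names are hypothetical:

```python
import sqlite3

def truncate_load_analyze(conn, new_rows):
    """One pass of the truncate-load-analyze-repeat cycle."""
    cur = conn.cursor()
    # Truncate: discard the prior batch entirely.
    cur.execute("DELETE FROM staging_sales")
    # Load: insert the fresh batch for analysis.
    cur.executemany("INSERT INTO staging_sales (amount) VALUES (?)",
                    [(r,) for r in new_rows])
    conn.commit()
    # Analyze: run the analysis over the new batch only.
    cur.execute("SELECT COUNT(*), SUM(amount) FROM staging_sales")
    return cur.fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_sales (amount REAL)")
print(truncate_load_analyze(conn, [10.0, 20.0, 5.5]))  # first batch
print(truncate_load_analyze(conn, [7.0, 3.0]))         # repeat: prior data is gone
```

Each pass starts from an empty table, which is exactly why this pattern re-does the same preparation work on every run.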

The adoption of data analytics in organisations is widespread these days. Due to the lower costs of ownership and increased ease of deployment, there are realistically no barriers for any organisation wishing to get more value from its data. This of course presents a challenge because the rate of data analytics adoption

In my last blog I detailed the four primary steps within the analytical lifecycle. The first and most time-consuming step is data preparation. Many consider the term “Big Data” overhyped, and certainly overused. But there is no doubt that the explosion of new data is turning the insurance business
The other day, I was looking at an enterprise architecture diagram, and it actually showed a connection between the marketing database, the Hadoop server and the data warehouse. My response can be summed up in two ways. First, I was amazed! Second, I was very interested in how this customer uses
I've been in many bands over the years, from rock to jazz to orchestra, and each brings with it a different maturity, skill level, attitude, and challenge. Rock is arguably the easiest (and the most fun!) to play, as it involves the fewest members, the lowest skill level, a goodly amount of drama, and the
One thing that always puzzled me when starting out with data quality management was just how difficult it was to obtain management buy-in. I've spoken before on this blog of the times I've witnessed considerable financial losses attributed to poor quality met with a shrug of management shoulders in terms
.@philsimon looks under the hood of 'analytics.'
The data lake is a great place to take a swim, but is the water clean? My colleague, Matthew Magne, compared big data to the Fire Swamp from The Princess Bride, and it can seem that foreboding. The questions we need to ask are: How was the data transformed and
One of the common traps I see data quality analysts falling into is measuring data quality in a uniform way across the entire data landscape. For example, you may have a transactional dataset that has hundreds of records with missing values or badly entered formats. In contrast, you may have
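One way to avoid that uniform-measurement trap is to compute the same metric per dataset and judge it against a threshold appropriate to that dataset. A minimal sketch, assuming a simple completeness metric; the dataset names and thresholds below are illustrative, not from the post:

```python
def completeness(records, field):
    """Fraction of records with a non-empty value for `field`."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

# A high-volume transactional feed may tolerate some gaps...
transactions = [{"amount": 10}, {"amount": None}, {"amount": 7}, {"amount": 3}]
# ...while a small master-data table should be near-perfect.
customers = [{"email": "a@example.com"}, {"email": ""}]

print(completeness(transactions, "amount"))  # 0.75 - might pass a 0.70 threshold
print(completeness(customers, "email"))      # 0.5  - fails a 0.99 threshold
```

The point is that the score alone means little; the acceptable level differs per dataset, so the threshold has to travel with the data, not with the metric.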
In The Princess Bride, one of my favorite movies, our hero Westley – in an attempt to save his love, Buttercup – has to navigate the Fire Swamp. There, Westley and Buttercup encounter fire spouts, quicksand and the dreaded rodents of unusual size (RUS's). Each time he has a response to the
Financial institutions are mired with large pools of historic data across multiple lines of business and systems. However, much of the recent data is being produced externally and is isolated from the decision-making and operational banking processes. The limitations of existing banking systems combined with inward-looking and confined data practices
Small data is akin to algebra; big data is like calculus.
In the movie Big, a 12-year-old boy, after being embarrassed in front of an older girl he was trying to impress by being told he was too short for a carnival ride, puts a coin into an antique arcade fortune teller machine called Zoltar Speaks, makes a wish to be big,
If you are looking for a way to fund your data quality objectives, consider looking in the closets of the organization. For example, look for issues that cost the company money that could have been avoided by better availability of data, better quality of the data or reliability of the

Data Management has been the foundational building block supporting major business analytics initiatives from day one. Not only is it highly relevant, it is absolutely critical to the success of all business analytics projects. Emerging big data platforms such as Hadoop and in-memory databases are disrupting traditional data architecture in
In this blog series, I am exploring if it’s wise to crowdsource data improvement, and if the power of the crowd can enable organizations to incorporate better enterprise data quality practices. In Part 1, I provided a high-level definition of crowdsourcing and explained that while it can be applied to a wide range of projects
.@philsimon on the reliability of social numbers.
Once in a while, people run into an issue with the data that doesn't really need to be fixed right away to ensure the success of a specific project. So, the data issues are put into production and forgotten. Everyone always says, “We will go back and correct this later.” But that
Regulatory compliance is a principal driver for data quality and data governance initiatives in many organisations right now, particularly in the banking sector. It is interesting to observe how many financial institutions immediately demand longer timeframes to help get their 'house in order' in preparation for each directive. To the
There are companies that have no data quality initiative, and truly believe that they have no data problem. In effect, they say that if it does not interfere with day-to-day business, then there is no data quality problem. From what I have seen in my consulting experience, it usually
Over my last two posts, I suggested that our expectations for data quality morph over the duration of business processes, and it is only at the point that the process has completed that we can demand that all statically-applied data quality rules be observed. However, over the duration of the