One of my biggest problems with social media is its emphasis on simple numbers.
That might seem like an odd statement coming from a guy who bleeds data and maintains an active social media presence.
Let me explain this apparent contradiction.
Once in a while, people run into an issue with the data that doesn't really need to be fixed right away to ensure the success of a specific project. So the data issues are put into production and forgotten. Everyone always says, “We will go back and correct this later.” But that never happens. At least not with anyone I know. If you have had the luxury of going back and making corrections after something is in production, please let me know so I can change my attitude on this issue!
The assumption is that if it is in production, and nothing broke, then all is OK! In the first blog of this three-part series, I stated that a data quality initiative should check for the completeness, accuracy, and integrity of the data. These, to me, are the most important data quality metrics to collect and monitor. Read More
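The three metrics above can be sketched as simple record-level checks. This is a minimal illustration, not the series author's implementation; the field names, validity rules, and reference sets are hypothetical examples.

```python
# Toy sketch of three data quality metrics: completeness, accuracy, integrity.
# All field names and rules below are hypothetical.
import re

records = [
    {"id": 1, "email": "ann@example.com", "country": "US", "manager_id": None},
    {"id": 2, "email": "", "country": "XX", "manager_id": 1},
    {"id": 3, "email": "bob@example", "country": "DE", "manager_id": 99},
]

valid_countries = {"US", "DE", "FR"}           # reference data for validity checks
known_ids = {r["id"] for r in records}          # ids that foreign keys may point to

def completeness(recs, field):
    """Share of records where the field is populated at all."""
    return sum(1 for r in recs if r[field]) / len(recs)

def accuracy(recs):
    """Share of records whose values pass simple validity rules."""
    ok = sum(
        1 for r in recs
        if re.fullmatch(r"[^@]+@[^@]+\.[^@]+", r["email"] or "")
        and r["country"] in valid_countries
    )
    return ok / len(recs)

def integrity(recs):
    """Share of records whose foreign key resolves (or is legitimately null)."""
    ok = sum(1 for r in recs
             if r["manager_id"] is None or r["manager_id"] in known_ids)
    return ok / len(recs)

print(completeness(records, "email"))  # 2 of 3 emails are populated
print(accuracy(records))               # only record 1 passes both rules
print(integrity(records))              # record 3 points at a missing id
```

Monitoring means tracking these ratios over time, not computing them once: a sudden drop in any one of them is the signal that a "fix it later" issue has made it into production.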
Regulatory compliance is a principal driver for data quality and data governance initiatives in many organisations right now, particularly in the banking sector.
It is interesting to observe how many financial institutions immediately demand longer timeframes to get their 'house in order' in preparation for each directive.
To an outsider, it may seem puzzling that modern, data-driven organisations would struggle to supply accurate data or a robust working knowledge of their internal processes.
The reality, of course, is that many financial institutions are fighting a battle with data, every day. They are trying to cope with internal and external demands for faster, smarter services, while still maintaining an often outdated, silo-based, IT infrastructure. Read More
In this blog series, I am exploring whether it’s wise to crowdsource data improvement, and whether the power of the crowd can enable organizations to incorporate better enterprise data quality practices.
In Part 1, I provided a high-level definition of crowdsourcing and explained that while it can be applied to a wide range of projects and activities, applying crowdsourcing to data improvement involves three aspects: type of data, kind of crowd, and form of improvement. Part 2 focuses on type of data and kind of crowd. Read More
There are companies that have no data quality initiative and truly believe that they have no data problem. In effect, they say that if it does not interfere with day-to-day business, then there is no data quality problem. From what I have seen in my consulting experience, it usually doesn't take long for the data quality issues to rise to the surface and cause undue chaos and rework.
Over my last two posts, I suggested that our expectations for data quality morph over the duration of business processes, and that it is only at the point that the process has completed that we can demand that all statically applied data quality rules be observed. However, over the duration of the process, there are situations in which the rules might be violated, yet the dynamic nature of the data allows records to temporarily remain in a state that ultimately might be deemed invalid. Read More
One of the significant problems data quality leaders face is changing people's perception of data quality. For example, one common misconception is that data quality represents just another data processing activity.
If you have a data warehouse, you will almost certainly have some form of data processing in the form of 'data cleanup' activity. This cleanup work may involve transforming rogue values and preparing the data for upload into the live warehouse and reporting environment.
In a CRM migration, your data processing may include a stage to deduplicate customer records prior to go-live.
It's easy to see how people view these activities as data quality because they are improving the quality of the final information product. Read More
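The deduplication step mentioned above can be sketched as an exact match on a normalized key. This is a toy illustration with hypothetical field names; real matching engines in CRM migrations use fuzzy comparison and survivorship rules, which this deliberately omits.

```python
# Toy deduplication sketch: collapse customer records that share a
# normalized match key. Only catches exact matches after normalization.
def normalize_key(record):
    """Build a match key from name and email (hypothetical fields)."""
    name = "".join(record["name"].lower().split())   # drop case and whitespace
    email = record["email"].strip().lower()
    return (name, email)

def deduplicate(records):
    """Keep the first record seen for each key; drop later duplicates."""
    seen = {}
    for r in records:
        key = normalize_key(r)
        if key not in seen:
            seen[key] = r
    return list(seen.values())

customers = [
    {"name": "Ann Smith", "email": "ann@example.com"},
    {"name": "ann  smith", "email": "ANN@example.com "},
    {"name": "Bob Jones", "email": "bob@example.com"},
]

print(len(deduplicate(customers)))  # 2 -- the first two collapse to one
```

The point of the excerpt stands either way: this is data *processing* that improves quality, but it is not the same thing as an ongoing data quality practice.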
In my last post, I pointed out that we data quality practitioners want to apply data quality assertions to data instances to validate data in process, but the dynamic nature of data must be contrasted with our assumptions about how quality measures are applied to static records. In practice, the data used in conjunction with a business process may not be “fully-formed” until the business process fully completes. This means that records may exist within the system that would be designated as invalid after the fact, but from a practical standpoint remain valid at different points in time until the process completes. Read More
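The idea that a record can be valid in flight but invalid after the fact can be sketched as validation that is conditional on process state. A minimal sketch, assuming a hypothetical order process with a made-up rule:

```python
# Sketch of state-dependent validation: a rule that only becomes mandatory
# once the business process completes. States and fields are hypothetical.
def validate(order):
    """Apply 'shipping_address required' only to completed orders."""
    errors = []
    if order["status"] == "completed" and not order.get("shipping_address"):
        errors.append("shipping_address missing on completed order")
    return errors

draft = {"status": "draft"}      # in flight: no address yet, still acceptable
done = {"status": "completed"}   # finished: the static rule now applies

print(validate(draft))  # []
print(validate(done))   # ['shipping_address missing on completed order']
```

Applying the static rule unconditionally would flag every in-flight record as a defect; gating it on process state is one way to reconcile dynamic data with statically defined quality rules.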
Utilizing big data analytics is currently one of the most promising strategies for businesses to gain competitive advantage and ensure future growth. But as we saw with “small data analytics,” the success of “big data analytics” relies heavily on the quality of its source data. In fact, when combining “small” and “big” data for analysis, neither should lack quality. That raises this question: how can companies assure the quality of big data? Read More