Tag: data quality

In my last post, I pointed out that we data quality practitioners want to apply data quality assertions to data instances to validate data in process, but the dynamic nature of data stands in contrast to our assumptions about how quality measures are applied to static records. In practice, the

Utilizing big data analytics is currently one of the most promising strategies for businesses to gain competitive advantage and ensure future growth. But as we saw with “small data analytics,” the success of “big data analytics” relies heavily on the quality of its source data. In fact, when combining “small” and “big” data
.@philsimon on the need to recognize DQ differences.
It’s common at the start of a new year to create a long list of resolutions that we hope to achieve. The reality, of course, is that by February those resolutions will likely be a distant memory. The key to making any resolution stick is to start small. Create one small
After working in the data quality industry for a number of years, I have realized that most practitioners tend to have a rather rigid view of data quality assertions. Either a data set conforms to the set of data quality criteria and is deemed to be acceptable
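To make that contrast concrete, here is a minimal Python sketch (the rules, field names and sample records are all hypothetical) that reports a conformance rate per assertion rather than a single accept/reject verdict:

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Each assertion returns True when a record conforms (rules are illustrative).
assertions: Dict[str, Callable[[Record], bool]] = {
    "customer_id is populated": lambda r: bool(r.get("customer_id")),
    "postal_code is 5 digits": lambda r: r.get("postal_code", "").isdigit()
    and len(r.get("postal_code", "")) == 5,
}

def conformance_report(records: List[Record]) -> Dict[str, float]:
    """Return the share of records passing each assertion, not a yes/no verdict."""
    total = len(records) or 1
    return {name: sum(check(r) for r in records) / total
            for name, check in assertions.items()}

sample = [
    {"customer_id": "C001", "postal_code": "27513"},
    {"customer_id": "", "postal_code": "2751"},
]
print(conformance_report(sample))
# {'customer_id is populated': 0.5, 'postal_code is 5 digits': 0.5}
```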
This isn't Kansas anymore. Oz has become a sprawling, smart metropolis filled with sensor data. How do we make sense of, clean, govern and glean value from this big data so we can get Dorothy home? The answer is SAS Data Management. With the latest portfolio updates, customers will be
James Surowiecki wrote a book about The Wisdom of Crowds. Jeff Howe, who co-coined the term crowdsourcing, wrote a book about Why the Power of the Crowd Is Driving the Future of Business. In this blog series, I explore whether it’s wise to crowdsource data improvement, and whether the power of the crowd can
.@philsimon on the different folks you'll encounter in many large organizations.
Since now is the time when we reflect on the past year and make resolutions for next year, in this post I reflect on my Data Roundtable posts from the past year and use them to offer a few New Year’s data resolutions for you and your organization to consider in

I have participated in many discussions about master data management (MDM) being “just” about improving the quality of master data. Although master data management includes the discipline of data quality, it has a much broader scope. MDM introduces a new approach for managing data that isn't within the scope of traditional data quality
As this is the week of Christmas, many, myself included, have Christmas songs stuck in their head. One of these jolly jingles is Santa Claus Is Coming To Town, which includes the line: “He knows if you’ve been bad or good, so be good for goodness sake!” The lyric is a
I have a rule: any conversion or upgrade will require the creation of a decommission plan. A decommission plan should include the following: a list and definition of each database, table and column (source and target); a list and definition of each of the current programs in use (you
The physical data model should represent exactly the way the tables and columns are designed in the database management system. I recommend keeping storage, partitioning, indexing and other physical characteristics in the data model if at all possible. This will make upkeep and comparison with the development, test
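As an illustration, here is a minimal Python sketch (the table, tablespace and index names are hypothetical) of recording those physical characteristics alongside the model so that environments can be compared programmatically:

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple

@dataclass(frozen=True)
class PhysicalTable:
    """Physical characteristics worth keeping with the data model."""
    name: str
    tablespace: str
    partition_key: Optional[str]
    indexes: Tuple[str, ...]

dev = PhysicalTable("CUSTOMER", "DATA01", "REGION_ID", ("IX_CUST_NAME",))
test = PhysicalTable("CUSTOMER", "DATA01", None, ("IX_CUST_NAME",))

# Report attributes that have drifted between the two environments.
drift = {key: (value, asdict(test)[key])
         for key, value in asdict(dev).items()
         if value != asdict(test)[key]}
print(drift)  # {'partition_key': ('REGION_ID', None)}
```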
We've explored data provenance and the importance of data lineage before on the Data Roundtable. If you are working in a regulated sector such as banking, insurance or healthcare, lineage is especially important right now, and it is one of the essential elements of data quality that regulators look for
I have a question: do we need a logical data model for a conversion? Here are my thoughts. I believe the answer is yes if the conversion has any of the following characteristics: The target application is created in-house. This application will more than likely be enhanced in the

The bigness of your data is probably not its most important characteristic. In fact, it may not even rank among the relevant issues you should be worrying about. Quality, the integration of silos, and handling and extracting value from unstructured data remain
In my previous post I explained that even if your organization does not have anyone with data steward as their official job title, data stewardship plays a crucial role in data governance and data quality. Let’s assume that this has inspired you to formally make data steward an official job title. How
To perform a successful data conversion, you have to know a number of things. In this series, we have uncovered the following about our conversion: the scope of the conversion, the infrastructure for the conversion, the source of the conversion, the target for the conversion, the management for the conversion, and the testing and quality assurance for
Here on the Data Roundtable we've discussed many topics such as root-cause analysis, continual improvement and defect prevention. Every organization must focus on these disciplines to create long-term value from data quality improvement instead of some fleeting benefit. Nowhere is this more apparent than in the need for an appropriate education strategy, both in

The bigness of your data is likely not its most important characteristic. In fact, it probably doesn’t even rank among the Top 3 most important data issues you have to deal with. Data quality, the integration of data silos, and handling and extracting value from unstructured data are still the most
There are multiple types of data models, and some companies choose NOT to data model purchased software applications. I view this a bit differently. I think that any purchased application is part of our enterprise, and thus it is part of our enterprise data model (or that concept is part of the
When you examine where most data quality defects arise, you soon realise that your source applications are a prime culprit. You can argue that the sales team always enters incomplete address details, or that the surgeons can't remember the correct patient type codes, but in my experience the majority of
Data. Our industry really loves that word, making it seem like the whole world revolves around it. We certainly enjoy revolving a lot of words around it. We put words like master, big, and meta before it, and words like management, quality, and governance after it. This spins out disciplines
Don't be shy! Interviewing people BEFORE or AFTER a facilitated session just takes a bit of confidence and good preparation. Building your confidence gets easier the more you participate in interviews. The objective is to be prepared and not waste anyone’s valuable time. I like to prepare notes based on
Many managers still perceive data quality projects to be a technical endeavour: data is the domain of IT, and therefore the initiative can be mapped out on a traditional project plan with well-defined exit criteria and a clear statement of requirements. I used to believe this myth too. Coming
.@philsimon on the proliferation of "as a service" terms.
A few weeks back I noted that one of the objectives of an inventory process for reference data was data harmonization, which means determining when two reference sets refer to the same conceptual domain and harmonizing their contents into a conformed standard domain. Conceptually it sounds relatively straightforward, but as
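For a sense of what that harmonization step looks like, here is a minimal Python sketch (the code values, crosswalks and conformed domain are all assumed for illustration) that maps two source reference sets onto one conformed standard domain and flags any gaps:

```python
# Two source systems describe the same conceptual domain with different codes
# (all code values here are hypothetical).
system_a = {"M": "Male", "F": "Female", "U": "Unknown"}
system_b = {"1": "MALE", "2": "FEMALE", "9": "NOT SPECIFIED"}

# Conformed standard domain agreed during the inventory process (assumed).
conformed_domain = {"MALE", "FEMALE", "UNKNOWN"}

# Crosswalks from each source domain to the conformed domain.
a_to_conformed = {"M": "MALE", "F": "FEMALE", "U": "UNKNOWN"}
b_to_conformed = {"1": "MALE", "2": "FEMALE", "9": "UNKNOWN"}

def harmonize(code: str, crosswalk: dict) -> str:
    """Map a source reference code onto the conformed domain, flagging gaps."""
    value = crosswalk.get(code)
    if value not in conformed_domain:
        raise ValueError(f"no conformed value for source code {code!r}")
    return value

print(harmonize("F", a_to_conformed))  # FEMALE
print(harmonize("9", b_to_conformed))  # UNKNOWN
```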
My previous post pondered the term disestimation, coined by Charles Seife in his book Proofiness: How You’re Being Fooled by the Numbers to warn us about understating or ignoring the uncertainties surrounding a number, mistaking it for a fact instead of the error-prone estimate that it really is. Sometimes this fact appears to
In my previous post, Sisyphus didn’t need a fitness tracker, I recommended that you only collect, measure and analyze big data if it helps you make a better decision or change your actions. Unfortunately, it’s difficult to know ahead of time which data will meet that criterion. We often, therefore, collect, measure and analyze
In his pithy style, Seth Godin’s recent blog post Analytics without action said more in 32 words than most posts say in 320 words or most white papers say in 3200 words. (For those counting along, my opening sentence alone used 32 words). Godin’s blog post, in its entirety, stated: “Don’t measure