The specifics of analytics in data quality

We just published Gerhard Svolba’s Data Quality for Analytics Using SAS. When I first heard about it, I thought we’d have a tome covering such topics as standardizing data, cleaning it up, removing duplicates, and so on.

However, as Gerhard says in his Introduction, “There are many aspects of data that are specific to analytics.” I realized data quality for analytics is a completely different animal than data quality alone. So I asked him, “What are we talking about when we talk about data quality for analytics?”

Gerhard Svolba celebrates his new book

Here’s what he had to say:

  • Analytics has additional requirements on data quality. Analytics places high demands on data availability: historical data are needed to train a model. Also important are the ability to provide and update the data consistently over time, whether to refresh the analysis or to score new observations, and the ability to provide the data at the appropriate aggregation level. Data “quantity” matters as well, because many methods need a minimum amount of data to produce meaningful and significant results. Otherwise, true effects may remain undetected.
  • Analytics contributes methods for better data quality. Analytical methods (and thus many SAS Analytics tools) offer many ways to profile, assess, and improve data quality, and to run simulations around it. Detecting missing values is often simple. However, analyzing the pattern of missing values across variables shows whether or not the gaps occur systematically. In data correction, analytics provides individual validation limits for each analysis subject, allowing a more effective search for outliers.
  • Simulation studies show the consequences of poor data quality on model quality. Simulation studies can help to determine the worth of additional observations and events and allow analysts to assess the consequences of missing values and biases in the data. Thus the analyst can decide whether or not to use certain data for analysis or “walk back” and perform additional data cleaning and quality improvement actions before the analysis begins.
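To make the point about missing-value patterns concrete, here is a minimal sketch in Python rather than SAS. It is not taken from the book. The toy records, the variable names, and the helper functions `missing_pattern` and `pattern_counts` are all hypothetical and serve only as illustration. The idea is to count how often each combination of missing variables occurs, instead of counting missing values one variable at a time.

```python
from collections import Counter

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "region": "N"},
    {"age": None, "income": None, "region": "S"},
    {"age": 51, "income": None, "region": "N"},
    {"age": None, "income": None, "region": "E"},
    {"age": 29, "income": 41000, "region": None},
    {"age": None, "income": None, "region": "S"},
]


def missing_pattern(record, variables):
    """Encode one record as a string: 'X' = missing, '.' = present."""
    return "".join("X" if record[v] is None else "." for v in variables)


def pattern_counts(rows, variables):
    """Count how many rows share each missing-value pattern."""
    return Counter(missing_pattern(r, variables) for r in rows)


variables = ["age", "income", "region"]
counts = pattern_counts(records, variables)

# List patterns from most to least frequent (column order: age, income, region).
for pattern, n in counts.most_common():
    print(pattern, n)
```

In this toy data, age and income are missing together in three of six records, while the other patterns each occur once. A dominant joint pattern like that hints that the gaps may be systematic rather than random, which is the kind of insight a per-variable missing count would hide.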

Do you need to learn how you can use SAS to perform advanced profiling of data quality status? Gerhard Svolba’s book can help. Order your copy or read a free chapter today.
