Machine-to-Machine Data and the Expectation of Data Quality


The anticipation of massive volumes of data streaming from automated sources has the big data community drooling at the opportunities for analysis. For example, as the energy utilities industry continues to deploy home-based smart meters in concert with additional sensors peppered across the grid, there will be a transition from the expected once-a-month manual monitoring to massive data sets automatically generated and communicated across the system. This will help improve energy distribution, manage the component lifecycle across the network, reduce costs and, ultimately, predict and react to grid events such as surges and outages.

Clearly, the predictive capabilities associated with these big data analytics applications are dependent on trustworthy data. In fact, you might say that the expectations nicely map to the traditional dimensions of data quality: data values are expected to be complete, accurate, timely, current and consistent, among others. So one should take some comfort in knowing that this machine-to-machine data is not only automatically generated and transmitted; its systemic isolation also allows it to remain unsullied by human hands, those same hands that are so often the source of data issues.

That last statement raises an interesting question, though. If we expect that the data is always going to be correct, then we don't need to monitor the data streams for validity. Of course, you say, one or more of the sensors might malfunction and begin to generate bad data, so we will need data quality measures after all. At the same time, the data streams will need to be monitored for behaviors that fall outside expectations.

But what’s the difference? How do we know when a data value represents aberrant behavior that needs to be addressed vs. incorrect data values generated by a failing device?
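One way to frame the distinction is that device failure tends to show up as *invalid* values (outside any physically plausible range, or a sensor "stuck" on one reading), while a genuine grid event tends to show up as *valid but anomalous* values that deviate sharply from what neighboring meters report. The sketch below illustrates that idea in Python; the threshold values, function names and failure signatures are all hypothetical, chosen only to make the logic concrete, not drawn from any real metering system.

```python
from statistics import mean, stdev

# Hypothetical plausibility limits for a household smart meter (kW).
VALID_RANGE = (0.0, 50.0)

def classify_reading(value, recent, neighborhood):
    """Illustrative triage of one meter reading.

    value        -- the new reading from this meter
    recent       -- this meter's last few readings
    neighborhood -- current readings from nearby meters
    """
    # A value outside any physically plausible range points to a
    # failing device, not real grid behavior.
    lo, hi = VALID_RANGE
    if not (lo <= value <= hi):
        return "suspect device"

    # A sensor repeating exactly the same value is another
    # device-failure signature (a "stuck" reading).
    if len(recent) >= 5 and all(r == value for r in recent[-5:]):
        return "suspect device"

    # A valid reading that deviates sharply from nearby meters may
    # reflect genuinely aberrant grid behavior, such as a surge.
    if len(neighborhood) >= 2:
        mu, sigma = mean(neighborhood), stdev(neighborhood)
        if sigma > 0 and abs(value - mu) > 3 * sigma:
            return "possible grid event"

    return "normal"
```

The point of the separation is operational: a "suspect device" outcome routes to maintenance and data quality remediation, while a "possible grid event" routes to the analytics and response side, even though both begin as the same unexpected number in a stream.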


About the Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author on data management best practices, through his expert channel articles and numerous books, white papers and web seminars. His book Business Intelligence: The Savvy Manager's Guide (June 2003) has been hailed as a resource allowing readers to "gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together." His book Master Data Management has been endorsed by data management industry leaders, and he is also the author of The Practitioner's Guide to Data Quality Improvement.
