The onus of data consolidation and aggregation

In last week’s post, I shared a (somewhat sanitized) scenario in which (stay with me now) data sets collected by a coordinating organization from a community of companies, each providing a consolidated extract, needed to be merged to deliver a newly created data asset representing the aggregate functions applied across those consolidated extracts. And while the example was framed in the context of a government agency collecting data for reporting purposes, the scenario is not uncommon. In any environment that involves a single coordinator reporting detailed statistics based on data collected from numerous agents or franchisees, we see the same model, which can be summarized in these stages:

  1. Each organization extracts data from a variety of sources.
  2. Transactions are organized by individual.
  3. Sets of individual transactions are aggregated (e.g., transaction amounts are summed for each individual).
  4. The coordinator collects the interim results from the community of organizations.
  5. The coordinator then sorts the collected interim results by individual.
  6. The coordinator generates the final result by completing the aggregation across the collected interim results (sketched in code below).
  7. The final result is packaged for publication.
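
To make these stages concrete, here is a minimal sketch in Python. It assumes (purely for illustration) that each organization’s extract boils down to a list of (individual key, amount) pairs, and that every provider already uses the same standardized key for an individual (an assumption we will return to in a moment):

```python
from collections import defaultdict

# Hypothetical per-organization transaction extracts: (individual_key, amount).
# The individual keys are assumed to already be standardized across providers.
org_a_transactions = [("person-001", 120.00), ("person-002", 45.50), ("person-001", 30.00)]
org_b_transactions = [("person-002", 80.00), ("person-003", 15.25)]

def aggregate_by_individual(transactions):
    """Steps 2-3: organize transactions by individual and sum the amounts."""
    interim = defaultdict(float)
    for individual_key, amount in transactions:
        interim[individual_key] += amount
    return dict(interim)

def coordinate(interim_extracts):
    """Steps 4-6: collect the interim results and complete the aggregation,
    so there is one and only one entry per individual."""
    final = defaultdict(float)
    for extract in interim_extracts:
        for individual_key, subtotal in sorted(extract.items()):
            final[individual_key] += subtotal
    return dict(final)

# Each organization produces its consolidated extract...
extracts = [aggregate_by_individual(org_a_transactions),
            aggregate_by_individual(org_b_transactions)]

# ...and the coordinator merges them into the final data asset.
print(coordinate(extracts))
# {'person-001': 150.0, 'person-002': 125.5, 'person-003': 15.25}
```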

The successful completion of this process results in a trustworthy index of totaled transactions in which there is one and only one entry for any individual. In essence, the representation of the individual becomes the key of the index. However, that puts the onus of consolidation and aggregation on the coordinator, who in some ways is left holding the bag: the coordinator either must demand that each of the data providers standardize their representation of the individual, or must have intimate knowledge of the various representations provided by each organization.
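
Why does that matter? Here is a minimal sketch of the matching problem, assuming (purely for illustration) that each provider represents an individual by a name plus a date of birth; the key-building functions are hypothetical, and real identity resolution requires far more discipline than this:

```python
import re

def naive_key(name, birth_date):
    # A naive key: strip punctuation and case, but keep the word order.
    return re.sub(r"[^a-z]", "", name.lower()) + "|" + birth_date

def coordinated_key(name, birth_date):
    # A slightly smarter key: sort the name tokens so that "Smith, John Q."
    # and "John Q. Smith" collapse to the same representation.
    tokens = sorted(re.findall(r"[a-z]+", name.lower()))
    return " ".join(tokens) + "|" + birth_date

# The same individual, as represented by two different data providers:
a = ("Smith, John Q.", "1970-01-01")
b = ("John Q. SMITH", "1970-01-01")

print(naive_key(*a) == naive_key(*b))             # False: the keys diverge
print(coordinated_key(*a) == coordinated_key(*b)) # True: the keys agree
```

Unless every provider standardizes its representation up front, the coordinator ends up writing (and maintaining) this kind of reconciliation logic for every format it receives. While this is possible, it seems unlikely that any coordinator would be ready to turn these data products around without some additional disciplines and tools in place. More to follow…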

About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management, and business intelligence. David is a prolific author on data management best practices, writing via the expert channel at b-eye-network.com as well as in numerous books, white papers, and web seminars. His book Business Intelligence: The Savvy Manager’s Guide (June 2003) has been hailed as a resource that allows readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book Master Data Management has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at mdmbook.com. David is also the author of The Practitioner’s Guide to Data Quality Improvement. He can be reached at loshin@knowledge-integrity.com.
