In last week’s post, I shared a (somewhat sanitized) scenario in which (stay with me now) data sets collected by a coordinating organization from a community of companies, each providing a consolidated extract, needed to be merged to deliver a newly created data asset representing the aggregate functions applied across those extracts. And while the example was provided in the context of a government agency collecting data for reporting purposes, this scenario is not uncommon. In any environment in which a single coordinator reports detailed statistics based on data collected from numerous agents or franchisees, we see the same model, which can be summarized in these stages (a brief code sketch follows the list):
- Each organization extracts data from a variety of sources.
- Transactions are organized by individual.
- Sets of individual transactions are aggregated (e.g., transaction amounts are summed for each individual).
- The coordinator collects interim results from the community of organizations.
- The coordinator then sorts the collected interim results by individual.
- The coordinator generates the final result by completing the aggregation across the collected interim results.
- The final result is packaged for publication.
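To make the stages concrete, here is a minimal sketch in Python of the two-phase aggregation. It assumes each organization’s extract boils down to (individual identifier, amount) pairs, that the aggregate function of interest is a simple sum, and that identifiers are already standardized; the function names and sample data are purely illustrative.

```python
from collections import defaultdict

def partial_aggregate(transactions):
    """Stage run by each organization: sum transaction amounts per individual."""
    totals = defaultdict(float)
    for individual_id, amount in transactions:
        totals[individual_id] += amount
    return dict(totals)

def coordinator_merge(partial_results):
    """Stage run by the coordinator: fold the interim results into one index."""
    final = defaultdict(float)
    for partial in partial_results:
        for individual_id, subtotal in partial.items():
            final[individual_id] += subtotal
    return dict(final)

# One consolidated extract per organization (hypothetical data).
org_a = [("A123", 100.0), ("B456", 25.0), ("A123", 40.0)]
org_b = [("A123", 10.0), ("C789", 5.0)]

interim = [partial_aggregate(t) for t in (org_a, org_b)]
print(coordinator_merge(interim))  # {'A123': 150.0, 'B456': 25.0, 'C789': 5.0}
```

Because summation distributes over the partial results, each organization can compute its own interim totals and the coordinator only has to fold them together, which is exactly the division of labor the stages above describe.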
The successful completion of this process would result in a trustworthy index of totaled transactions in which there is one and only one entry for any individual. In essence, the representation of the individual becomes the key of the index. However, that puts the onus of consolidation and aggregation on the coordinator, who in some ways is left holding the bag. That is because the coordinator either must demand that each of the data providers standardize their representation of the individual, or must have intimate knowledge of the various representations provided by each organization. While this is possible, it seems unlikely that any coordinator would be ready to turn these data products around without some additional disciplines and tools in place. More to follow…
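A small illustration of that last point, again in Python with made-up names and values: if two providers represent the same person differently, the merged index ends up with two entries for one individual unless the coordinator can map both representations onto a single canonical identifier.

```python
from collections import defaultdict

def merge(partials):
    """Coordinator-side fold of interim results keyed by the individual's representation."""
    final = defaultdict(float)
    for partial in partials:
        for key, subtotal in partial.items():
            final[key] += subtotal
    return dict(final)

# The same person, represented differently by two providers (hypothetical data).
org_a = {"John Q. Public": 100.0}
org_b = {"PUBLIC, JOHN": 25.0}

print(merge([org_a, org_b]))
# {'John Q. Public': 100.0, 'PUBLIC, JOHN': 25.0}  -- two index entries, one individual

# A cross-reference to a canonical identifier (entirely illustrative here)
# restores the one-entry-per-individual property the coordinator needs.
canonical = {"John Q. Public": "P-001", "PUBLIC, JOHN": "P-001"}
standardized = [{canonical[k]: v for k, v in p.items()} for p in (org_a, org_b)]
print(merge(standardized))  # {'P-001': 125.0}
```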