In my post a few weeks back, I shared a sequence of steps for hierarchically integrating aggregate transaction data from across a community of organizations, with the result published by a single coordinator. Those steps were:
1. Each organization extracts data from a variety of sources.
2. Transactions are organized by individual.
3. Each individual's set of transactions is aggregated (e.g., the transaction amounts are summed for each individual).
4. The coordinator collects interim results from the community of organizations.
5. The coordinator then sorts the collected interim results by individual.
6. The coordinator then generates the final result by finalizing the aggregation across the collected interim results.
7. The final result is packaged for publication.
As I was reflecting on this sequence, it occurred to me that the process was somewhat familiar, though not from the perspective of data integration. Rather, what stood out was the way the same steps repeat at different levels of an aggregation hierarchy, which reminded me of some of the Hadoop MapReduce algorithms that I have recently been thinking about.
Let’s look at it a little more carefully:
- Step 1 loads data into an analytical environment
- Steps 2 and 3 aggregate data by individual at each “processing node”
- Step 4 communicates the interim results to a single coordinator
- Step 5 “flips” the data for collection across interim results
- Step 6 calculates the final totals
In other words, the calculations done at each organization are the Map phase (with the local aggregation in steps 2 and 3 acting much like a combiner), and the coordinator's collection, sorting and final aggregation are the shuffle and Reduce phase. It is a really big example of a MapReduce approach, even if it is not actually deployed in a MapReduce environment.
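To make the correspondence concrete, here is a minimal Python sketch of the sequence as a map-then-reduce computation. It is an illustration under stated assumptions, not a description of any real deployment: the function names `local_aggregate` and `coordinate`, the `(individual, amount)` record layout, and the sample data are all hypothetical, and the aggregation is assumed to be a simple sum.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def local_aggregate(transactions):
    """Steps 2 and 3, run at each organization: group by individual and sum."""
    totals = defaultdict(int)
    for individual, amount in transactions:
        totals[individual] += amount
    return list(totals.items())


def coordinate(interim_results):
    """Steps 4 to 6, run at the coordinator."""
    # Step 4: collect the interim (individual, subtotal) pairs from every organization.
    collected = [pair for result in interim_results for pair in result]
    # Step 5: sort by individual so each individual's subtotals are adjacent.
    collected.sort(key=itemgetter(0))
    # Step 6: finalize the aggregation across the collected subtotals.
    return {
        individual: sum(subtotal for _, subtotal in group)
        for individual, group in groupby(collected, key=itemgetter(0))
    }


# Hypothetical sample data: step 1 (extraction) is assumed to have produced these.
org_a = [("alice", 10), ("bob", 5), ("alice", 2)]
org_b = [("bob", 7), ("carol", 1)]

final = coordinate([local_aggregate(org_a), local_aggregate(org_b)])
print(final)  # {'alice': 12, 'bob': 12, 'carol': 1}
```

Because a sum can be computed from partial sums, each organization only needs to ship its subtotals rather than its raw transactions. An aggregate without that property (a median, say) would not fit this sketch as written.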
So that makes me think: are there some general coordinated operational scenarios that can be modeled using a similar abstract programming model? If so, are there ways to embed certain services and governance to effect some degree of standardization across the community? And if so, what types of tools, techniques and oversight would be needed to make it actually work seamlessly across administrative boundaries? I would be interested to hear from any readers who have had similar thoughts…