Balancing performance measures for data accessibility using data federation


How do you balance the costs and benefits of copying data? It seems like a simple (or perhaps simplistic) question, but the answer can actually provide perspective on the performance measures that influence a system design. For example, the objective for creating a data mart to support many individual queries each day, as part of a workflow process requiring up-to-date data, is very different from the objective for creating a data mart that generates reports on the previous day's transactions.

In the first case, one performance measure might be the average query response time across the total number of queries each day. A second measure of usability is data currency, which captures how up-to-date the delivered results are. In this case, your architecture would optimize for those two variables: the most acceptable response time and the optimal degree of data currency.
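Those two measures can be made concrete with a small amount of instrumentation. The sketch below is illustrative only: the query log, its timestamps, and the function names are assumptions, not part of any particular product. It computes the average response time over a day's queries and a worst-case currency figure (how stale the data behind any delivered result was).

```python
from datetime import datetime, timedelta

# Hypothetical query log: (response_time_seconds, source_last_updated) per query.
query_log = [
    (0.8, datetime(2023, 5, 1, 9, 15)),
    (1.2, datetime(2023, 5, 1, 9, 40)),
    (0.5, datetime(2023, 5, 1, 10, 5)),
]

def avg_response_time(log):
    """Average query response time across all queries in the period."""
    return sum(t for t, _ in log) / len(log)

def worst_case_currency(log, as_of):
    """Data currency: the largest lag between delivery time and the
    freshness of the source data behind each delivered result."""
    return max(as_of - updated for _, updated in log)

now = datetime(2023, 5, 1, 10, 30)
print(f"avg response time: {avg_response_time(query_log):.2f}s")
print(f"worst-case staleness: {worst_case_currency(query_log, now)}")
```

Tracking both numbers over time shows whether a design change (say, adding a cached copy) improves response time at the expense of currency, or vice versa.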

In the second case, one performance measure might be the time it takes to generate and deliver the report. The currency of the data is somewhat secondary: since the reports aggregate data from the previous day, the business requirements can be met as long as the data is current as of the end of the previous day's transactions.

But what happens if you need to support both of these business usage scenarios? A system that optimizes for data currency would benefit from not copying the data, since a copy creates pressure to continuously update it every time the source changes. On the other hand, optimizing for report generation would benefit from a copy, especially if the need for data synchronization is low, since you don't want to repeatedly query multiple source systems just to collect the data needed to generate the report.

This is a common occurrence, and it is ably addressed by techniques such as data federation, which essentially create a hybrid approach. Data federation and virtualization abstract the methods of data access by layering a logical model on top of the sources, caching some of the data, and monitoring the source systems so that updates can be streamed to the cached versions when necessary. These methods combine user knowledge with analysis of usage patterns to hide the details that underlie those capabilities. Ultimately, they can leave much of the data in its original source in a way that balances the combined performance expectations for currency, synchronization and data access speed.
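The caching-plus-invalidation behavior described above can be sketched in a few lines. This is a minimal illustration of the pattern, not a real federation product's API; the class name, the source callables, and the change-notification hook are all assumptions for the example.

```python
class FederatedView:
    """Minimal sketch of a federated access layer: a logical model over
    multiple sources, with a cache that is invalidated when a source
    reports a change (so most queries avoid touching the sources)."""

    def __init__(self, sources):
        self.sources = sources   # logical name -> callable returning rows
        self.cache = {}          # logical name -> cached rows

    def query(self, name):
        # Serve from the cache when possible; fall through to the source
        # on the first access (or after an invalidation).
        if name not in self.cache:
            self.cache[name] = self.sources[name]()
        return self.cache[name]

    def on_source_update(self, name):
        # A change monitor would call this so the next query re-reads
        # the source instead of returning a stale cached copy.
        self.cache.pop(name, None)

# Two hypothetical source systems behind the logical model.
orders = [("o1", 100)]
customers = [("c1", "Acme")]
view = FederatedView({"orders": lambda: list(orders),
                      "customers": lambda: list(customers)})

print(view.query("orders"))    # first access hits the source
orders.append(("o2", 250))
print(view.query("orders"))    # served from cache, so the new row is not seen
view.on_source_update("orders")
print(view.query("orders"))    # refreshed after invalidation
```

The design choice this illustrates is exactly the balance the article describes: report-style workloads are served from the cached copies, while the update monitor keeps currency acceptable for the query-driven workflow without forcing every request back to the source systems.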


About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author on data management best practices, with numerous books, white papers and web seminars to his credit. His book Business Intelligence: The Savvy Manager's Guide (June 2003) has been hailed as a resource allowing readers to "gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together." His book Master Data Management has been endorsed by data management industry leaders. David is also the author of The Practitioner's Guide to Data Quality Improvement.
