Data virtualization in complex environments


In a number of recent posts, we have discussed the issues that surround big data, largely looking at the need to access data from a number of sources of varying structure and format. From the perspective of the analytical environment, this not only complicates the population of data warehouses in a timely and consistent manner, it also impacts the ability to ensure that the performance requirements of the downstream systems are met.

The barriers to success combine the complexity of extracting and transforming data from numerous sources with the timing and synchronization characteristics of data loading, which can expose inadvertent inconsistencies between analytical platforms and the original source systems. Data virtualization tools and techniques have matured to address these concerns, providing some key capabilities:

1) Federation: They enable federation of heterogeneous sources by mapping a standard or canonical data model to the access methods for the variety of sources comprising the federated model.

2) Caching: By managing accessed and aggregated data within a virtual (“cached”) environment, data virtualization reduces data latency, thereby increasing system performance.

3) Abstraction: Together, the federation and virtualization abstract the methods for access and combine them with the application of standards for data validation, cleansing and unification.
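The three capabilities above can be sketched in a few lines of Python. This is a minimal illustration, not a real virtualization product: the source classes, field names, and the canonical Customer model are all hypothetical stand-ins for the adapters and standard model a real tool would provide.

```python
import sqlite3
from dataclasses import dataclass

# Canonical model for the federated view (hypothetical).
@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str

class FileSource:
    """Simulates a flat-file source whose records use their own field names."""
    def __init__(self, rows):
        self._rows = rows

    def fetch(self, customer_id):
        for row in self._rows:
            if row["cust_no"] == customer_id:
                # Map source-specific fields onto the canonical model.
                return Customer(row["cust_no"], row["cust_name"])
        return None

class SqlSource:
    """Simulates a relational source with a different schema."""
    def __init__(self, rows):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE clients (id TEXT, full_name TEXT)")
        self._conn.executemany("INSERT INTO clients VALUES (?, ?)", rows)

    def fetch(self, customer_id):
        cur = self._conn.execute(
            "SELECT id, full_name FROM clients WHERE id = ?", (customer_id,))
        row = cur.fetchone()
        return Customer(row[0], row[1]) if row else None

class VirtualizationLayer:
    """Federates heterogeneous sources behind the canonical model and
    caches resolved records to reduce the latency of repeat access."""
    def __init__(self, sources):
        self._sources = sources
        self._cache = {}

    def get_customer(self, customer_id):
        if customer_id in self._cache:    # caching: serve from the virtual layer
            return self._cache[customer_id]
        for source in self._sources:      # federation: probe each mapped source
            record = source.fetch(customer_id)
            if record is not None:
                self._cache[customer_id] = record
                return record
        return None

layer = VirtualizationLayer([
    FileSource([{"cust_no": "C1", "cust_name": "Acme Corp"}]),
    SqlSource([("C2", "Globex Inc")]),
])
print(layer.get_customer("C1").name)  # resolved from the file source
print(layer.get_customer("C2").name)  # resolved from the SQL source
```

The consumer only sees `get_customer` and the canonical record; which source answered, and how its fields were mapped, is abstracted away, which is exactly the point of the three capabilities listed above.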

Data virtualization simplifies data access for end users and business applications through that abstraction, since they need not be aware of source data locations, data integration processes, or the application of business rules. While a straightforward approach to virtualization considers structural mappings of the data models by mapping a standard canonical model to the underlying data sources, virtualization can be expanded to enable a much richer set of access services by incorporating improved semantic resolution when coupled with a master data index. We will look at that in the next set of posts.


About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author on data management best practices, via the expert channel and numerous books, white papers, and web seminars. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003), has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book Master Data Management has been endorsed by data management industry leaders. David is also the author of The Practitioner’s Guide to Data Quality Improvement.
