Data federation is software that allows an organization to create a "virtual database" from multiple sources of like information. For example, customer data may be stored in multiple applications across the enterprise. This software allows us to "cherry pick" the BEST parts of the customer data from each source, integrate them, and present the result to the business user. Much like a database view, this software allows us to create a layer of metadata that hides the complexity of connecting to and querying those application systems. Some software even allows you to physicalize the "virtual database," meaning store it locally on disk.
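To make the idea concrete, here is a minimal sketch in Python. It is not how any particular federation product works. The two in-memory SQLite databases, the `crm` and `billing` names, the column layout, and the `federated_customer` function are all assumptions made up for illustration. The point is only that nothing is copied: each request goes to the source systems and the preferred fields are merged on the fly.

```python
import sqlite3


def make_source(rows, columns):
    """Build an in-memory database standing in for one application system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE customer ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO customer VALUES ({placeholders})", rows)
    return conn


# Hypothetical sources: assume the CRM holds the best name and email,
# and the billing system holds the best postal address.
crm = make_source([(1, "Ann Lee", "ann@example.com")], ["id", "name", "email"])
billing = make_source([(1, "A. Lee", "12 Main St")], ["id", "name", "address"])


def federated_customer(customer_id):
    """Query both sources at request time and merge the preferred fields.

    Assumes the customer exists in both systems; a real tool would have
    to handle missing or conflicting records.
    """
    name, email = crm.execute(
        "SELECT name, email FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    (address,) = billing.execute(
        "SELECT address FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    return {"id": customer_id, "name": name, "email": email, "address": address}


print(federated_customer(1))
```

Notice that the two systems disagree on the customer's name ("Ann Lee" versus "A. Lee"). The sketch simply trusts the CRM, which is exactly the kind of decision that gets hard when the data is of poor quality.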
Data federation makes sense to me when we want to create something very fast. It also works well for prototyping, but when the data is of bad quality or does not lend itself to integration, physical data stores will be required. None of the articles I have read mention data quality in the same sentence as data federation or virtual databases, so beware of how it gets used in your organization. This software is not a silver bullet, BUT if you have good quality data that resides in multiple application systems, then use the software to create a "virtual single source" of customer or product data.
Creating integrated historical data, as a data warehouse does, may not be so easy with this software. Some questions I would ask are:
- Where does the historical data come from? Source systems? I can’t very well make this up on the fly.
- If the historical data was in the source systems to begin with – why are you building a data warehouse?
1 Comment
I think you should take a look at recent Data Virtualization best practices. Nobody in the DV world says anymore that you should access all data in real time, or that complex data quality transformations should be done on the fly from source data. But current DV tools can integrate all these steps and have much to offer in complex and hard BI scenarios. The recent book Data Virtualization for Business Intelligence Systems by Rick van der Lans describes a lot of patterns where DV tools complement existing BI infrastructure to increase agility.