Data federation: an important piece of your data management strategy


Data comes in all shapes and sizes and from many different sources and systems. We also know that companies that manage data efficiently have a distinct advantage in the market: clean, high-quality data yields better business processes and more precise analytics, which drive better, faster decision making.

So, how do they do it? How do companies manage the deluge of data?

There are many processes that help a company manage data, but one foundational piece of the data management puzzle is a process called data federation.

What is it and why should you care?

Data federation is a process that combines data from heterogeneous data sources into a single, unified view. Once federated, the data is presented in a consistent format, regardless of which source it came from.

Take a retail chain, for example. They have transactions from online purchases and transactions from brick-and-mortar stores. Let's say that at the end of the year they want to create a single report of customers who made both online and in-store purchases. To get this type of report, they need to integrate the disparate data systems.

Traditionally, the chain might create such a report using a third-party operational data store (ODS) and ETL processes. But if the chain uses a federation server instead, they can virtually marry the two sets of transactions and create a single report that shows a 360-degree view of their customers.
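To make the idea of a virtual join concrete, here is a minimal Python sketch. It does not use SAS Federation Server or any other product; the two "systems" are stand-ins (an in-memory SQLite table for in-store sales and a CSV string for online orders), and every name in it is hypothetical. Both sources are read at query time, their differently formatted customer keys are normalized into one format, and nothing is copied into an intermediate store.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: the in-store point-of-sale system (SQLite stand-in).
store_db = sqlite3.connect(":memory:")
store_db.execute("CREATE TABLE pos_sales (cust_no INTEGER, amount REAL)")
store_db.executemany(
    "INSERT INTO pos_sales VALUES (?, ?)",
    [(101, 25.0), (102, 40.0), (104, 15.5)],
)

# Hypothetical source 2: online orders exported as CSV. Note the different
# key format ("C-0101" instead of 101).
ONLINE_CSV = """customer_id,order_total
C-0101,60.00
C-0103,12.00
C-0104,33.25
"""


def in_store_totals() -> dict[int, float]:
    """Read in-store totals per customer directly from the source, on demand."""
    rows = store_db.execute(
        "SELECT cust_no, SUM(amount) FROM pos_sales GROUP BY cust_no"
    )
    return {cust: total for cust, total in rows}


def online_totals() -> dict[int, float]:
    """Read online totals, normalizing 'C-0101'-style keys to integers."""
    totals: dict[int, float] = {}
    for row in csv.DictReader(io.StringIO(ONLINE_CSV)):
        cust = int(row["customer_id"].removeprefix("C-"))
        totals[cust] = totals.get(cust, 0.0) + float(row["order_total"])
    return totals


def federated_customer_view() -> list[dict]:
    """Join both sources at request time; the combined result is not stored."""
    store = in_store_totals()
    online = online_totals()
    both_channels = sorted(store.keys() & online.keys())
    return [
        {"customer": c, "in_store": store[c], "online": online[c]}
        for c in both_channels
    ]


if __name__ == "__main__":
    for row in federated_customer_view():
        print(row)
```

A federation server does the same kind of work at scale, pushing queries down to each source where it can; this sketch only shows the shape of the idea (requires Python 3.9+).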

By accessing data virtually, the chain doesn’t have to move data, replicate it or retrieve it from tables to perform analysis. This virtualized environment, or layer, has several key advantages:

  • Data abstraction: The source details and format differences between the various data sources that make up a virtualized view of data are masked from the requesting application. As a result, the sources of the data that make up the view can be changed without the need for changes to any application that accesses the view.
  • Data security: Fine-grained security controls allow customers to assign role-based access, ensuring sensitive information is protected. Because the data never has to be moved from its source systems, there are no unsecured replicated copies for unapproved users to access.
  • Data caching: The results of a virtualized view can be cached for faster, more efficient access. Servicing requests for BI or analytic access can severely impact the performance of the operational systems generating the information. By providing access to a cache of the result instead, operational systems perform more efficiently and users can access their information more quickly.

Data federation in today's organizations

Data abstraction

Problem: A financial services company has more than 100 analysts who need to access a particular database instance that contains data for analysis. Due to a change in regulations, the source of the information must move to a new instance of the database. How can the queries used by all of these analysts be changed quickly and easily without disrupting their work?

Solution: If all of these analysts access a materialized view that exposes the information they need, the back-end source of the view can be changed without the analysts having to do anything differently.
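A rough Python sketch of that abstraction follows. It assumes a simple in-process view object rather than any particular product's API, and the class, source functions and column names are all hypothetical. The analysts' query code refers only to the view and its column names, so an administrator can repoint the view at a new instance, even one with a different schema, without touching that code.

```python
from typing import Callable, Iterable

Row = dict[str, object]
Source = Callable[[], Iterable[Row]]


def old_instance() -> Iterable[Row]:
    """Hypothetical original database instance."""
    return [{"account": "A1", "balance": 100.0}]


def new_instance() -> Iterable[Row]:
    """Hypothetical new instance: same data, different column names."""
    return [{"acct_id": "A1", "bal_amt": 100.0}]


class FederatedView:
    """A named view that hides which source (and which schema) backs it."""

    def __init__(self, source: Source, column_map: dict[str, str]) -> None:
        self._source = source
        self._column_map = column_map  # source column name -> view column name

    def repoint(self, source: Source, column_map: dict[str, str]) -> None:
        """Administrator-side change: swap the back end behind the view."""
        self._source = source
        self._column_map = column_map

    def rows(self) -> list[Row]:
        """Rename each source row's columns to the view's stable names."""
        return [
            {self._column_map[col]: val for col, val in row.items()}
            for row in self._source()
        ]


def analyst_query(view: FederatedView) -> float:
    """Analyst code: depends only on the view's column names."""
    return sum(r["balance"] for r in view.rows())


if __name__ == "__main__":
    accounts = FederatedView(old_instance, {"account": "account", "balance": "balance"})
    print(analyst_query(accounts))  # query served by the old instance
    accounts.repoint(new_instance, {"acct_id": "account", "bal_amt": "balance"})
    print(analyst_query(accounts))  # same analyst code, now served by the new instance
```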

Data security

Problem: A pharmaceutical company has data spread across several systems that needs to be combined to deliver a report on drug trials. Combining the information into a data mart or data warehouse can result in the exposure of patient information – and the potential for a data breach and regulatory fines. How can this information be accessed in a secure manner without replicating it?

Solution: A materialized view can be queried where the result is a blend of the information needed from the various source systems. The information is only joined at the time a request is made, so no data is replicated onto another system like a mart or warehouse. Additionally, access to the view itself or to certain information within the view can be restricted to specific users so that only those users with permissions to access the blended information can get to it.
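The request-time join and the per-role restriction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any specific federation server enforces policy: the two source "systems" are in-memory structures, the patient records are made up, and the role-to-column policy is hypothetical.

```python
# Hypothetical source systems; in practice these would be separate databases.
TRIAL_RESULTS = [
    {"patient_id": "P1", "drug": "X-101", "outcome": "improved"},
    {"patient_id": "P2", "drug": "X-101", "outcome": "no change"},
]
PATIENT_REGISTRY = {
    "P1": {"name": "Patient A", "age": 54},
    "P2": {"name": "Patient B", "age": 61},
}

# Hypothetical policy: the view columns each role is allowed to see.
ROLE_COLUMNS = {
    "clinical_lead": {"patient_id", "name", "age", "drug", "outcome"},
    "analyst": {"drug", "outcome", "age"},
}


class AccessDenied(Exception):
    """Raised when a role has no permission to query the view at all."""


def trial_report(role: str) -> list[dict]:
    """Blend the sources per request and return only the role's columns."""
    allowed = ROLE_COLUMNS.get(role)
    if allowed is None:
        raise AccessDenied(f"role {role!r} may not query this view")
    report = []
    for result in TRIAL_RESULTS:
        # The join happens here, at request time; the blended row is never stored.
        blended = {**result, **PATIENT_REGISTRY[result["patient_id"]]}
        report.append({k: v for k, v in blended.items() if k in allowed})
    return report


if __name__ == "__main__":
    for row in trial_report("analyst"):
        print(row)  # no names or patient IDs for the analyst role
```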

Data caching

Problem: A manufacturing company runs its business through three key operational systems. To track progress on key performance indicators, multiple data analysts must run analytic models on the data contained in these systems, but doing so slows down the systems and affects the ability of the manufacturer to get products out the door. How can the models be run without affecting the operational systems’ performance and thus the business?

Solution: Materialized views can be created that access the information needed for the models. The results of these views can be cached to a non-operational system. The cache can be refreshed at an interval that balances the need for up-to-date information against the need to minimize the number of queries run against the operational systems. Incoming queries from the analytical models are rerouted to the cached result rather than to the operational systems, resulting in faster model performance and no impact on the business.
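The refresh trade-off above amounts to a time-to-live (TTL) cache in front of the source query. The sketch below is a minimal version under stated assumptions: `CachedView` is a hypothetical name, the cache lives in process memory (the scenario above keeps it on a separate, non-operational system), and the clock is injectable so the refresh behavior can be checked without waiting.

```python
import time
from typing import Any, Callable, Optional


class CachedView:
    """Serve a view's result from a cache; re-run the source query only when stale."""

    def __init__(
        self,
        query: Callable[[], Any],
        refresh_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._query = query                      # runs against the operational system
        self._refresh_seconds = refresh_seconds  # how stale a result may become
        self._clock = clock                      # injectable for testing
        self._result: Any = None
        self._loaded_at: Optional[float] = None

    def read(self) -> Any:
        now = self._clock()
        stale = (
            self._loaded_at is None
            or now - self._loaded_at >= self._refresh_seconds
        )
        if stale:
            # The only place the operational system is queried.
            self._result = self._query()
            self._loaded_at = now
        return self._result


if __name__ == "__main__":
    source_hits = 0

    def kpi_query():
        """Stand-in for an expensive query against an operational system."""
        global source_hits
        source_hits += 1
        return [("line_1", 0.97)]

    fake_now = [0.0]
    view = CachedView(kpi_query, refresh_seconds=3600, clock=lambda: fake_now[0])
    for _ in range(50):   # 50 analyst requests within the hour...
        view.read()
    print(source_hits)    # ...reach the source once: prints 1
    fake_now[0] = 3600.0  # an hour later the cached result is stale
    view.read()
    print(source_hits)    # prints 2
```

A shorter `refresh_seconds` keeps results fresher at the cost of more queries against the operational systems; a longer one does the opposite.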

Is data federation right for you? Maybe so. For more information, check out the SAS Federation Server page and look for more discussions here on the Data Roundtable.


About Author

Lane Whatley

Sr Communications Specialist

As a Senior Communications Specialist, Lane supports SAS R&D in planning, coordinating, and executing its internal communications objectives. In this role, she writes and edits organizational and R&D-related content for blogs, internal social media platforms and the corporate intranet. Prior to joining SAS, Lane worked at the Institute for Emerging Issues at NC State University in a communications and outreach capacity.

1 Comment

  1. Actually, data federation plays an important part during ETL development. Most of the data sources needed by an ETL system are distributed across different systems. A major ETL tool, like SAS Data Integration Studio, brings in data from such systems with ease. A developer need not be aware of the differences between the data sources because DI Studio takes care of it.

    There are more details on the site given here: http://www.divyeshdave.com/2012/06/what-is-data-integration/
