At this stage our organization has already defined the business objectives for the Data Governance programme (step 0) and started to manage business terms, as described in step 1.
Step 2: tracking data flow in the organization
Data Governance is not only about understanding the data. It also covers awareness of how the organization processes data and information, the systems in which the data resides and the transformations it undergoes. Every organization already has a foundation for this Data Governance perspective. In its simplest and most frequent form, this is the metadata scattered across the organization, accompanying IT systems, data warehouses, report repositories, visualizations and reports. The scope of this metadata is not always broad enough to cover both the technical and the business description of the data, but it is a very good starting point that is worth using.
Consolidation of the technical aspects of data flow in the organization is possible thanks to SAS Lineage.
A single common repository describing data flow among computer systems is created by collecting and extracting metadata from a wide range of available sources. Metadata can be obtained from, among others:
- data modelling software (e.g. Erwin or Rational Rose)
- databases and Big Data repositories
- ETL tools, together with information on data transformations performed
- analytical and reporting systems, where information is visualized and made available to users (e.g. SAP BusinessObjects)
- SAS metadata server shared by all elements of the SAS Analytical Platform.
Regardless of the variety of tools the organization uses, it is possible to track how data flows through the IT infrastructure. This flow can then be connected with the business terms and definitions already defined in step 1.
By creating a link between the world of abstract business definitions and the world of technical data representation, we get the chance to track the data, analyze its origin and find its real source in the organization. These actions are simple and do not require big projects. Most importantly, they produce an up-to-date, automatically created picture of how the organization uses its information.
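The idea of linking a business term to its technical lineage and tracing it back to the real source can be illustrated with a minimal Python sketch. It does not use SAS Lineage or any SAS API; the asset names, the `EDGES` list, the `TERM_TO_ASSETS` mapping and both functions are hypothetical and serve only to show the principle on a tiny in-memory graph.

```python
from collections import deque

# Hypothetical lineage edges: (source, target) means data flows from source to target.
EDGES = [
    ("crm.customers", "staging.customers"),
    ("erp.invoices", "staging.invoices"),
    ("staging.customers", "dwh.revenue_by_customer"),
    ("staging.invoices", "dwh.revenue_by_customer"),
    ("dwh.revenue_by_customer", "report.monthly_revenue"),
]

# Hypothetical links from business terms (step 1) to technical assets.
TERM_TO_ASSETS = {"Monthly revenue": ["report.monthly_revenue"]}


def upstream(asset, edges):
    """Return every asset that directly or indirectly feeds `asset`."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    seen, queue = set(), deque([asset])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def origins_of_term(term, edges=EDGES, links=TERM_TO_ASSETS):
    """Return the root sources (assets with no upstream) behind a business term."""
    targets = {dst for _, dst in edges}
    roots = set()
    for asset in links.get(term, []):
        candidates = upstream(asset, edges) | {asset}
        roots |= {a for a in candidates if a not in targets}
    return sorted(roots)


print(origins_of_term("Monthly revenue"))
```

In this toy graph, the term "Monthly revenue" resolves to the two operational tables that ultimately feed the report. A real lineage repository would hold far richer metadata, such as the transformations applied along each edge.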
By executing this step the organization gets the following benefits:
- ability to trace the origin of data, e.g. in reports, which significantly increases both the use of the data and the trust of its recipients, and which is increasingly required by regulations and operating rules, e.g. on financial markets
- ability to assess the effect of changes in computer systems on the information flow and to identify the changes required in reporting and decision-support systems (impact analysis)
- minimization of the risk of misinterpretation or misuse of data, thanks to the detailed history of its origin combined with its business meaning and designated purpose.
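The impact analysis mentioned in the list above is, in essence, a traversal of the lineage graph in the direction of data flow. The following Python sketch shows the principle under stated assumptions: the `DOWNSTREAM` mapping, the asset names and the `report.`/`dashboard.` naming convention are invented for illustration and do not reflect how SAS Lineage stores or queries metadata.

```python
# Hypothetical lineage: each key feeds the assets listed as its value.
DOWNSTREAM = {
    "erp.invoices": ["staging.invoices"],
    "staging.invoices": ["dwh.revenue", "dwh.open_items"],
    "dwh.revenue": ["report.monthly_revenue", "dashboard.sales"],
    "dwh.open_items": ["report.receivables"],
}


def impacted_by(asset, downstream=DOWNSTREAM):
    """Return all assets reachable from `asset`, i.e. everything a change may affect."""
    impacted, stack = set(), [asset]
    while stack:
        current = stack.pop()
        for child in downstream.get(current, []):
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return impacted


def impacted_reports(asset, downstream=DOWNSTREAM):
    """Narrow the impact set to reports and dashboards (assumed naming convention)."""
    return sorted(
        a for a in impacted_by(asset, downstream)
        if a.startswith(("report.", "dashboard."))
    )


print(impacted_reports("staging.invoices"))
```

Before changing the hypothetical `staging.invoices` table, the owner could use such a query to see which reports and dashboards need to be reviewed.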
The following entries describe the next steps in starting and continuing a Data Governance initiative in an organization. I encourage you to familiarize yourself with SAS Data Governance products.