I’ve faced the task of creating an integrated view of metadata across an enterprise, so I’m aware of the many hurdles it entails.
First, metadata integration and analysis require you to access all the metadata sources available to you. But metadata comes in many different formats, and vendors often store and maintain their metadata in proprietary file formats and repositories. How are you going to access and extract metadata from all those sources?
I’d like to share my experiences with third-party metadata collection and pass along some of the lessons I’ve learned. By making you aware of these challenges, I think you’ll be able to plan for them and deliver results on time. After explaining the challenges I’ve faced, I’ll introduce a technology that can help you accelerate metadata capture.
Metadata comes in many different formats
- Technical metadata describes data as it exists. It includes systems, databases, schemas, tables, columns, data types and other attributes. Technical metadata persists in a variety of formats, such as data models, data dictionaries and DBMS catalogs.
- Business metadata describes data as it’s used in business processes. It can include objects such as terms, definitions, glossaries, dictionaries and taxonomies. Business metadata is often scattered across different places: spreadsheets, databases, emails, documents and even people’s heads.
- Process metadata is descriptive information about things that operate on your data. It may include information about data flows, how data is manipulated and how data is used in the organization. It has information about various ETL flows, reports consuming the data and analytic applications that transform the data into business information. It may also include business rules and decision processes.
Given these diverse types of metadata, you can see how challenging it can be to process it all. Each type of metadata presents unique challenges and opportunities.
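To make the technical metadata category concrete, here is a minimal sketch of reading tables, views and columns straight out of a DBMS catalog. It uses SQLite from the Python standard library purely as a stand-in; the `collect_technical_metadata` function and the sample `customer` schema are assumptions for illustration, and a real enterprise DBMS exposes its catalog through different system tables.

```python
import sqlite3


def collect_technical_metadata(conn):
    """Return {object_name: {"type", "columns", "sql"}} for every table and view."""
    catalog = {}
    rows = conn.execute(
        "SELECT name, type, sql FROM sqlite_master "
        "WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    for name, obj_type, sql in rows:
        # PRAGMA table_info returns one row per column; index 1 is the
        # column name and index 2 is its declared type.
        columns = [
            (col[1], col[2])
            for col in conn.execute(f'PRAGMA table_info("{name}")')
        ]
        catalog[name] = {"type": obj_type, "columns": columns, "sql": sql}
    return catalog


# Hypothetical sample schema, created in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE VIEW uk_customer AS
        SELECT id, name FROM customer WHERE country = 'UK';
    """
)

catalog = collect_technical_metadata(conn)
for name, entry in catalog.items():
    print(name, entry["type"], [column for column, _ in entry["columns"]])
```

The `sql` field keeps the stored definition of each view, which is where the logic embedded in SQL views can be recovered for later analysis.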
Vendors store and maintain metadata in proprietary file formats and repositories
Software vendors store metadata in different file formats and repositories, which are often proprietary to a particular vendor or tool. To complicate matters, vendors sometimes change the file structure between versions of their tools.
In some cases, you need special administrative permissions to access all levels of metadata in a repository. The metadata itself can be very large and difficult to make sense of, and knowing what to select and what is useful may require domain expertise.
You can access and collect metadata from many different repositories
The good news is there are solutions to help you automate third-party metadata collection and analysis. SAS Metadata Bridges, for example, is available for most popular applications and can help you accelerate your project from the start.
SAS Metadata Bridges lets you connect to both metadata repositories and files, select what you want to extract and then make the information available. There are also bridges for general industry standards, and for acquiring metadata from files such as CSV and XML.
I’ve used SAS to process metadata from these sources:
- Logical and physical data models in a CA ERwin file.
- Business terms from a glossary in a SAP Sybase PowerDesigner file.
- Logic contained in SQL views in a DBMS.
- Technical metadata from Hadoop.
- Transformation logic from ETL jobs created using tools like IBM DataStage, Informatica PowerCenter and SAP BusinessObjects Data Integrator.
- Metadata about database stored procedures.
- Report metadata from popular business intelligence tools.
With metadata access established, I can use automated loaders to extract the metadata, load it into a repository, integrate the silos of metadata and deliver an integrated view of my metadata. This means business and IT users can search, perform analysis and do reporting – including lineage of key data flows used in my enterprise environment.
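As a rough illustration of what integrating the silos buys you, the sketch below merges dependency records from three hypothetical sources (ETL jobs, SQL views and reports) into one graph and traces the lineage of a report back to its origins. All object names and the `build_lineage` and `trace_upstream` helpers are assumptions made up for this example; they are not part of SAS Metadata Bridges or any other product.

```python
from collections import defaultdict

# Hypothetical extracts from three metadata silos.
# Each pair reads "(target, source)": the target is derived from the source.
etl_edges = [
    ("staging.orders", "crm.orders"),
    ("warehouse.sales", "staging.orders"),
]
view_edges = [("warehouse.v_sales_uk", "warehouse.sales")]
report_edges = [("report.uk_revenue", "warehouse.v_sales_uk")]


def build_lineage(*edge_sets):
    """Merge edges from every silo into one target -> {sources} mapping."""
    upstream = defaultdict(set)
    for edges in edge_sets:
        for target, source in edges:
            upstream[target].add(source)
    return upstream


def trace_upstream(upstream, item):
    """Return every object that `item` depends on, directly or indirectly."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for source in upstream.get(current, ()):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


lineage = build_lineage(etl_edges, view_edges, report_edges)
print(sorted(trace_upstream(lineage, "report.uk_revenue")))
# ['crm.orders', 'staging.orders', 'warehouse.sales', 'warehouse.v_sales_uk']
```

No single silo can answer the lineage question on its own here: the report metadata knows only about the view, and the ETL metadata knows nothing about the report. The full chain appears only once the edges are combined.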