Data quality monitoring - step 3.


The previous posts in this series described the first two steps: defining the data and then profiling it. Having carefully collected the data and set the business rules, we can now move on to creating the rules.

Step 3. Rule creation

After carefully examining the data and defining the business rules, we can move on to creating the rules themselves. When creating rules, it is recommended to step away from physical tables, columns and records and to think about rules more generally. At the very beginning, it is good to define so-called abstract attributes: the attributes we want to monitor and whose individual values we would also like to store in the database. For example, we may want to verify that the value of indicator A is always higher than the value of indicator B. If the rule is violated, we would also like the error report to contain information that lets us easily identify the offending record in our data. Therefore, apart from the indicator values, it is also recommended to collect, for example, the source system and the record ID.
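To make the idea of abstract attributes more concrete, here is a minimal Python sketch. It is an illustration only, not how SAS tools store rules internally. It assumes a hypothetical `Rule` class that holds a condition written against abstract attribute names plus a list of attributes to log when the condition fails; the names `indicator_a`, `indicator_b`, `source_system` and `record_id` are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# A record expressed in abstract attributes (not physical column names).
Record = Dict[str, Any]


@dataclass
class Rule:
    """Hypothetical rule model: a condition plus the attributes to log on failure."""
    name: str
    condition: Callable[[Record], bool]
    # Abstract attributes stored with each error so the record can be located later.
    logged_attributes: List[str]


def evaluate(rule: Rule, records: List[Record]) -> List[Record]:
    """Return one error entry for every record that violates the rule."""
    errors = []
    for record in records:
        if not rule.condition(record):
            entry: Record = {"rule": rule.name}
            entry.update({attr: record[attr] for attr in rule.logged_attributes})
            errors.append(entry)
    return errors


# The example from the text: indicator A must always be higher than indicator B.
a_greater_than_b = Rule(
    name="A_GREATER_THAN_B",
    condition=lambda r: r["indicator_a"] > r["indicator_b"],
    logged_attributes=["source_system", "record_id", "indicator_a", "indicator_b"],
)

records = [
    {"source_system": "CRM", "record_id": 1, "indicator_a": 10, "indicator_b": 5},
    {"source_system": "CRM", "record_id": 2, "indicator_a": 3, "indicator_b": 7},
]
print(evaluate(a_greater_than_b, records))
```

Only the second record is reported, together with its system and ID, which is exactly the information needed to find it again in the source data.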


Figure 1. Selection of abstract attributes logged in the database under a given rule

Our sample indicators A and B may be located in different databases, tables and columns, yet they are calculated and structured in the same way. While building a rule, we do not want to worry about what the specific columns are called in each database, so we work with abstract attributes instead. Only at the rule implementation stage (the next step) does a technician, such as an ETL developer or data analyst, select a given rule and map its abstract attributes onto their physical equivalents in the table to be monitored.
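The mapping step can be pictured with another minimal sketch. This is a hypothetical Python illustration, not the SAS mechanism: the rule is written once against abstract attribute names, and each monitored table gets its own dictionary mapping abstract attributes onto physical column names. The table and column names below are invented for the example.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def a_greater_than_b(record: Row) -> bool:
    """Abstract rule: written once, with no knowledge of physical column names."""
    return record["indicator_a"] > record["indicator_b"]


def to_abstract(row: Row, mapping: Dict[str, str]) -> Row:
    """Rename physical columns to abstract attribute names using the mapping."""
    return {abstract: row[physical] for abstract, physical in mapping.items()}


def run_rule(rule: Callable[[Row], bool], rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """Apply an abstract rule to physical rows; return the abstract view of failing rows."""
    failures = []
    for row in rows:
        record = to_abstract(row, mapping)
        if not rule(record):
            failures.append(record)
    return failures


# Two hypothetical tables storing the same indicators under different column names.
crm_mapping = {"indicator_a": "KPI_REVENUE", "indicator_b": "KPI_COST", "record_id": "CUST_ID"}
billing_mapping = {"indicator_a": "rev_total", "indicator_b": "cost_total", "record_id": "invoice_no"}

crm_rows = [
    {"CUST_ID": 1, "KPI_REVENUE": 100, "KPI_COST": 80},
    {"CUST_ID": 2, "KPI_REVENUE": 50, "KPI_COST": 70},
]
billing_rows = [{"invoice_no": "F-1", "rev_total": 10, "cost_total": 10}]

print(run_rule(a_greater_than_b, crm_rows, crm_mapping))
print(run_rule(a_greater_than_b, billing_rows, billing_mapping))
```

With this split, bringing a new table under monitoring only requires a new mapping; the rule itself does not change.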


Figure 2. Rule creation using DataFlux Data Management Studio

Depending on the number of people involved and the scale of the data monitoring undertaking, SAS offers two tools for rule creation. The first, intended for smaller environments, is based on DataFlux SAS Data Management Studio. This application includes a component dedicated to data quality monitoring: SAS Business Rule Manager. The application provides rule editors based on abstract attributes as well as an execution environment in which we map rules onto physical data. During data processing, the results are recorded in the central data monitoring repository.


Figure 3. Preview of the central rule repository in SAS Data Management Studio

When a larger number of people are responsible for rules, and rules covering many different areas owned by different business owners have to be coordinated, it is recommended to consider a solution that also supports rule review and acceptance processes and is easily accessible to a larger group of users. SAS Business Rule Manager, a web application, can be used in such environments. It has built-in workflow and versioning mechanisms and does not require installing any additional components on users' workstations.


Figure 4. Rule creation in SAS Business Rule Manager

After logging in, users gain access to an interface in which they can create and modify rules, accept changes introduced by other users and even run rules against a test set to verify that everything works as intended. As in DataFlux, users work with abstract attributes during rule creation; abstract attributes are mapped onto physical ones only at the rule implementation stage. Users also have access to a workflow mechanism that can be adapted to the processes established in a given organisation. Thanks to this mechanism, users gain full control over the rules that are to be deployed in the production environment.


Figure 5. Creation of a set of validation rules in SAS Business Rule Manager

The additional capabilities of SAS Business Data Network, which is integrated with the SAS Lineage component, are also worth mentioning. SAS Lineage is used for metadata integration, covering not only the metadata of SAS tools but also metadata from other vendors' tools. In this application, we can track the entire flow of information in the organisation, from a business concept to physical reporting and analytical processes, and even look up specific analyses or reports.


Figure 6. Association of rules, tasks and owners in SAS Lineage environment

Such an information flow map not only makes work easier but also allows a number of communication processes in the organisation to be automated. For example, if the data quality indicators of any system deteriorate, we can quickly look up its business owner in the SAS Lineage data repository and automatically send them a message asking them to fix the issue promptly.
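The kind of automation described above can be sketched roughly as follows. This is a hypothetical Python illustration, not a SAS Lineage API: the `SYSTEM_OWNERS` dictionary stands in for owner information that would come from the lineage repository, quality scores are assumed to be percentages recorded per monitoring run, and instead of sending e-mails the sketch only builds the messages.

```python
from typing import Dict, List

# Hypothetical stand-in for business owners looked up in a lineage repository.
SYSTEM_OWNERS: Dict[str, str] = {
    "CRM": "crm.owner@example.com",
    "BILLING": "billing.owner@example.com",
}


def find_deteriorating(history: Dict[str, List[float]], drop_threshold: float) -> List[str]:
    """Return systems whose latest quality score fell by more than drop_threshold
    points compared with the previous run."""
    flagged = []
    for system, scores in history.items():
        if len(scores) >= 2 and scores[-2] - scores[-1] > drop_threshold:
            flagged.append(system)
    return flagged


def build_alerts(systems: List[str], owners: Dict[str, str]) -> List[Dict[str, str]]:
    """Build one message per flagged system that has a registered owner."""
    alerts = []
    for system in systems:
        owner = owners.get(system)
        if owner is None:
            continue  # no registered owner: this case would need manual follow-up
        alerts.append({"to": owner, "subject": f"Data quality deteriorated in {system}"})
    return alerts


# Quality scores (in percent) from consecutive monitoring runs.
history = {"CRM": [98.0, 91.0], "BILLING": [95.0, 96.0]}
for alert in build_alerts(find_deteriorating(history, drop_threshold=5.0), SYSTEM_OWNERS):
    print(alert)
```

In practice, the threshold and the way a "deterioration" is defined would be business decisions made alongside the rules themselves.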

Rule creation is a very important step in implementing a data monitoring environment. Even if the initial plan covers only a few indicators in a given area, it is highly likely that, once the first implementation succeeds, the environment will gradually expand to cover new areas. When selecting applications to support this process, it is recommended to first decide which functionalities matter most to us and what we would like our environment to look like several years from now. Answering questions about the number of potential users, systems and rules, and about how work is organised (e.g. whether everyone works in the same building), can prove helpful here. Once the data monitoring business rules have been described and created, we can move on to the subsequent steps: implementing them and creating our own indicators and reports.


About Author

Łukasz Leszewski

Certified Business Intelligence Professional, Business Analytics. He graduated from the Faculty of Production Engineering at the Warsaw University of Technology. He has been working at SAS Institute for more than 12 years as an Architect, Project Manager and Consultant. During this time, he has worked on many projects in a variety of sectors, including telecommunications, insurance, retail, banking and the public sector. He has extensive experience in data integration and data quality.
