Improving data quality through SAS Data Remediation

With SAS Data Management, you can set up SAS Data Remediation to manage and correct data issues. SAS Data Remediation allows user- or role-based access to data exceptions.

When a data issue is discovered, it can be sent automatically or manually to a remediation queue, where designated users can correct it.

Let’s look at how to set up a remediation service and how to send issue records to Data Remediation.

Register the remediation service.

To register a remediation service in SAS Data Remediation, we go to Data Remediation Administrator and select “Add New Client Application”.

Under Properties, we supply an ID, which can be the name of the remediation service as long as it is unique, and a Display name, which is the name shown in the Remediation UI.

Under the tab Subject Area, we can register different subject categories for this remediation service. When calling the remediation service, we can categorize remediation issues by setting different subject areas. We can, for example, use the Subject Area to point to different Data Quality Dimensions such as Completeness, Uniqueness, Validity, Accuracy, and Consistency.

Under the tab Issue Types, we can register issue categories. This enables us to categorize the different remediation issues. For example, we can point to the affected part of the record, such as Name, Address, or Phone Number.

At Task Templates/Select Templates, we can set a workflow to be used for each issue type. You can design your own workflow using SAS Workflow Studio, or you can use a prepared workflow that comes with Data Remediation. Make sure that the desired workflow is loaded onto the Workflow Server so that it can be linked to the Data Remediation service. Workflows are not mandatory in SAS Data Remediation, but they can improve the efficiency of the remediation process.

Saving the remediation service will make it available to be called.

Sending issues to Data Remediation.

When you process data and have identified issues that you want to send to Data Remediation, you can either call Data Remediation directly from the job that processes the data, or store the issue records in a table first and then, in a second step, create remediation records via a Data Management job.

To send records to Data Remediation, you can call the Data Remediation REST API from the HTTP Request node in a Data Management job.

Remediation REST API

The REST API expects a JSON structure supplying all required information:

{
	"application": "mandatory",
	"subjectArea": "mandatory",
	"name": "mandatory",
	"description": "",
	"userDefinedFieldLabels": {
		"1": "",
		"2": "",
		"3": ""
	},
	"topics": [{
		"url": "",
		"name": "",
		"userDefinedFields": {
			"1": "",
			"2": "",
			"3": ""
		},
		"key": "",
		"issues": [{
			"name": "mandatory",
			"importance": "",
			"note": "",
			"assignee": {
				"name": ""
			},
			"workflowName": "",
			"dueDate": "",
			"status": ""
		}]
	}]
}
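Before sending a request, it can help to check that the four fields marked "mandatory" above are actually filled in. The following is a minimal Python sketch of such a check, written outside of Data Management purely for illustration. The function name missing_mandatory_fields and the sample values are assumptions and are not part of SAS Data Remediation.

```python
# Hypothetical helper: reports which mandatory fields of the JSON structure
# are absent or empty. The field names are taken from the structure above.
def missing_mandatory_fields(payload):
    """Return the paths of mandatory fields that are missing or empty."""
    missing = []
    # Top-level mandatory fields.
    for key in ("application", "subjectArea", "name"):
        if not payload.get(key):
            missing.append(key)
    # Every issue inside every topic needs a name.
    for t, topic in enumerate(payload.get("topics", [])):
        for i, issue in enumerate(topic.get("issues", [])):
            if not issue.get("name"):
                missing.append(f"topics[{t}].issues[{i}].name")
    return missing


# Sample payload with two deliberate gaps (illustrative values only).
payload = {
    "application": "Customer Record",
    "subjectArea": "Completeness",
    "name": "",
    "topics": [
        {"key": "4711", "issues": [{"name": "Phone Number"}, {"note": "no name"}]}
    ],
}
print(missing_mandatory_fields(payload))
# prints ['name', 'topics[0].issues[1].name']
```

An empty list means that all mandatory fields are present. Whether the service accepts the optional fields as empty strings is not checked here.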

 

JSON structure description:

In a Data Management job, you can create the JSON structure in an Expression node and use field substitution to pass in the necessary values from the issue records. The expression code could look like this:

REM_APPLICATION= "Customer Record"
REM_SUBJECT_AREA= "Completeness"
REM_PACKAGE_NAME= "Data Correction"
REM_PACKAGE_DESCRIPTION= "Mon-Result: " &formatdate(today(),"DD MM YY") 
REM_URL= "http://myserver/Sourcesys/#ID=" &record_id
REM_ITEM_NAME= "Mobile phone number missing"
REM_FIELDLABEL_1= "Source System"
REM_FIELD_1= "CRM"
REM_FIELDLABEL_2= "Record ID"
REM_FIELD_2= record_id
REM_FIELDLABEL_3= "-"
REM_FIELD_3= ""
REM_KEY= record_id
REM_ISSUE_NAME= "Phone Number"
REM_IMPORTANCE= "high"
REM_ISSUE_NOTE= "Violated data quality rule phone: 4711"
REM_ASSIGNEE= "Ben"
REM_WORKFLOW= "Customer Tag"
REM_DUE_DATE= "2018-11-01"
REM_STATUS= "open"
 
JSON_REQUEST= '
{
  "application":"' &REM_APPLICATION &'",
  "subjectArea":"' &REM_SUBJECT_AREA &'",
  "name":"' &REM_PACKAGE_NAME &'",
  "description":"' &REM_PACKAGE_DESCRIPTION &'",
  "userDefinedFieldLabels": {
    "1":"' &REM_FIELDLABEL_1 &'",
    "2":"' &REM_FIELDLABEL_2 &'",
    "3":"' &REM_FIELDLABEL_3 &'"
  },
  "topics": [{
    "url":"' &REM_URL &'",
    "name":"' &REM_ITEM_NAME &'",
    "userDefinedFields": {
      "1":"' &REM_FIELD_1 &'",
      "2":"' &REM_FIELD_2 &'",
      "3":"' &REM_FIELD_3 &'"
    },
    "key":"' &REM_KEY &'",
    "issues": [{
      "name":"' &REM_ISSUE_NAME &'",
      "importance":"' &REM_IMPORTANCE &'",
      "note":"' &REM_ISSUE_NOTE &'",
      "assignee": {
        "name":"' &REM_ASSIGNEE &'"
      },
      "workflowName":"' &REM_WORKFLOW &'",
      "dueDate":"' &REM_DUE_DATE &'",
      "status":"' &REM_STATUS &'"
    }]
  }]
}'

 

Tip: You could also write a global function to generate the JSON structure.
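One reason to centralize the JSON generation in a function is escaping: with plain string concatenation, a value that contains a double quote or a backslash produces invalid JSON. The following Python sketch illustrates the idea of such a generator function. It is not expression code for a Data Management job, and the function name build_remediation_request and its parameters are assumptions made for this example. The field values mirror the expression code above.

```python
import json


# Hypothetical generator function; json.dumps takes care of escaping
# quotes and backslashes inside the values.
def build_remediation_request(record_id, note, assignee="Ben"):
    """Build the JSON request body for one issue record."""
    body = {
        "application": "Customer Record",
        "subjectArea": "Completeness",
        "name": "Data Correction",
        "description": "",
        "userDefinedFieldLabels": {"1": "Source System", "2": "Record ID", "3": "-"},
        "topics": [{
            "url": "http://myserver/Sourcesys/#ID=" + record_id,
            "name": "Mobile phone number missing",
            "userDefinedFields": {"1": "CRM", "2": record_id, "3": ""},
            "key": record_id,
            "issues": [{
                "name": "Phone Number",
                "importance": "high",
                "note": note,
                "assignee": {"name": assignee},
                "workflowName": "Customer Tag",
                "dueDate": "2018-11-01",
                "status": "open",
            }],
        }],
    }
    return json.dumps(body)


# A note containing double quotes still yields valid JSON.
request_body = build_remediation_request("1001", 'Violated rule "phone": 4711')
print(json.loads(request_body)["topics"][0]["issues"][0]["note"])
```

In a real job, the hard-coded values would come from the issue record or from job parameters.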

After creating the JSON structure, you can invoke the web service to create remediation records. In the HTTP Request node, you call the web service as follows:

Address: http://[server]:[port]/SASDataRemediation/rest/groups
Method: POST
Input Field: The variable containing the JSON structure, in this example JSON_REQUEST
Output Field: A field to take the output from the web service. You can use the New button to create a field and set its size to 1000
Under Security… you can set a defined user and password to access Data Remediation.
In the HTTP Request node’s advanced settings, set the WSCP_HTTP_CONTENT_TYPE option to application/json
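For testing the endpoint outside of a Data Management job, the same settings can be expressed as a plain HTTP request. The Python sketch below only prepares the request and does not send it. The server name, port, and credentials are placeholders, and the use of HTTP Basic authentication is an assumption; your installation may require a different mechanism.

```python
import base64
import urllib.request


def make_remediation_request(base_url, json_body, user, password):
    """Prepare (but do not send) a POST request with the settings listed above."""
    # Assumption: the service accepts HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url + "/SASDataRemediation/rest/groups",
        data=json_body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )


# Placeholder server, port, and credentials for illustration only.
req = make_remediation_request(
    "http://myserver:8080", '{"application": "Customer Record"}', "user", "secret"
)
print(req.get_method(), req.full_url)
```

Sending the prepared request is left to the caller, for example with urllib.request.urlopen(req), and the response body corresponds to what the Output Field receives in the HTTP Request node.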

 

 

 

You can now execute the Data Management job to create the remediation records in SAS Data Remediation.

About Author

Clemens Knobloch

Principal Business Solution Manager

Clemens is a domain expert in the field of Data Management with a key focus on Master Data Management. Clemens supports customers in the Finance, Retail and Manufacturing sectors, meets with customers at senior level to discuss the benefits of Data Quality, Data Governance and MDM, and works closely with SAS R&D to ensure that customer and market direction input is reflected in ongoing software releases.
