Writing your own functions in SAS Data Quality using Dataflux Data Management Studio

1

IT organizations today are constantly challenged to do more with less. Reusing data processing jobs and employing best practices in monitoring the health of your data are proven ways to improve the productivity of data professionals. Dataflux Data Management Studio is a component of both the SAS Data Quality and the SAS Data Management offerings that allows you to create data processing jobs to integrate, cleanse and monitor your data.

You can write global functions for SAS Data Management jobs that can be reused in any expression in the system, in either data or process flow jobs. Global functions can be called from expression nodes, monitor rules, profile filters, surviving record indicators, process flow if-nodes and more.

Global functions are defined in a text file and saved in the Data Management installation directory under “etc/udf” in Data Management Studio or Data Management Server respectively.

Each global function has to have a unique name and is wrapped with a function / end function block code, can process any number of input parameters and returns a single value of either integer, real, date, string or boolean type.

Hello World

For a start, let’s create a “hello world” function.

  • If it does not exist, create a folder in the installation directory under “etc/udf” (DM Studio and DM Server).
  • In “etc/udf” create a file named hello_world.txt.
  • In the hello_word file create the function as follows:
function hello_world return string
return “hello world!”
end function
  • Save the file and restart DM Studio, if necessary, in order to use hello_world().

The new function is fully integrated in Data Management Studio. You can see the new function in an expression node under Function->Other or as expression language help in the expression node editor.

Handling Parameters

Global functions can handle any number of parameters. Parameter helper functions are available to access input parameters inside a function:

  • paramatercount() returns the number parameters that have been passed into the function call.  This is helpful if the incoming parameters are unknown.
integer i
for i = 1 to parametercount() 
begin
   // process each parameter
end

 

  • parametertype(integer) returns the type of the parameter for the given parameter position. The first parameter is 1. The return value will either be integer, real, date, string or Boolean.
  • parameterstring(integer), parameterinteger(integer), parameterboolean(integer), parameterdate(integer), parameterreal(integer) these functions return the value of the parameter as specified by position, or null if the parameter doesn’t exist. You can use these functions if you know the incoming parameter type at a given position.
  • parameter(integer) returns the value of the parameter as specified by position, or null if the parameter doesn’t exist. If you don’t know the incoming parameter type you can use this function. Note: Using the parameter() function may require additional overhead to coerce the values to the field type. Using the specific data type parameter functions above will eliminate the cost of coercion.

Global Function Example

This global function will check if password rules are followed.

//////////////////////////////////////////////////////////////////////////
// Function:     check_psw_rule
// Inputs:       string
// Output:       boolean -> true == passed check; false == failed check
// Description:  Check the rules for password. The rules are:
//               Need to be at least 8 characters long
//               Need to have at least one lower case character
//               Need to have at least one upper case character
//               Need to have at least one number
//////////////////////////////////////////////////////////////////////////
function check_psw_rule return boolean
	string check_str
	boolean rc
	regex r
 
	check_str= parameterstring(1)   //copy input parameter to variable
 
	rc= false                       //set default return value to failed (false)
	if(len(check_str) < 8)          //check if at least 8 characters
		return rc
	r.compile("[a-z]")              
	if (!r.findfirst(check_str))    //check if at least one lower case character
		return rc
	r.compile("[A-Z]")
	if (!r.findfirst(check_str))    //check if at least one upper case character
		return rc
	r.compile("[0-9]")
	if (!r.findfirst(check_str))    //check if at least one number
		return rc
	rc= true                        //return true if all checks passed
	return rc
end function

 

This function can be called from any expression in a Data Management job:

boolean  check_result
check_result= check_psw_rule(password)

Global function can also call other global function

Just a few things to be aware of. There is a late binding process, which means if function B() wants to call function A(), then function A() needs to be loaded first. The files global functions are stored in are loaded alphabetically by file name. This means the file name containing function A() has to occurs alphabetically before file name containing function B().

Best Practices

Here are some best practice tips which will help you to be most successful writing global functions:

  1. Create one file per expression function.
    This allows for global functions to easily be deployed and shared.
  2. Use lots of comments.
    Describe what the function’s purpose, expected parameters, and outputs and improve the readability and reusability of your code
  3. Test the expressions in data jobs first.
    Write a global function body as an expression first and test it via preview. This way it is easier to find typos, syntax errors and to ensure that the code is doing what you would like it to do.
  4. Debugging - If the global function is not loading, check the platform_date.log.  For Studio, this could for example be found under: C:\Users\<your_id>\AppData\Roaming\DataFlux\DMStudio\studio1

You now have a taste of how to create reusable functions in Data Management Studio to help you both improve the quality of your data as well as improve the productivity of your data professionals. Good luck and please let us know what kind of jobs you are using to help your organization succeed.

Share

About Author

Clemens Knobloch

Principal Business Solution Manager

Clemens is a domain expert in the field of Data Management with a key focus on Master Data Management. Clemens supports customers in the Finance, Retail and Manufacturing sector, meeting with customers at senior level discussing the benefits of Data Quality, Data Governance and MDM and works closely with SAS R&D to ensure customer and market direction input is reflected in on-going software releases.

1 Comment

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top