Using Multiple Quality Knowledge Base Locales in a DataFlux Data Management Studio Data Job


In DataFlux Data Management Studio, the data quality nodes in a data job (for example, Parsing, Standardization, and Match Codes) use definitions from the SAS Quality Knowledge Base (QKB). These definitions are based on a locale, which is a combination of language and country. When you need to work with multi-locale data in the same data job, you can use the LOCALE attributes that these data quality nodes provide in their Advanced Properties.

For example, you might want to work with data from the United States, Canada, and the United Kingdom in the same data job. Note: You must have the QKB locale data installed, and be licensed, for every locale that you plan to use in your data job.

The two Advanced properties you need are LOCALE_FIELD and LOCALE_LIST. LOCALE_FIELD specifies the column that contains the 5-character locale value to use for each record. LOCALE_LIST specifies the locales that should be loaded into memory for use within the node.
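To make the roles of the two properties concrete, here is a minimal Python sketch of how a node could use them: LOCALE_LIST decides which locales are loaded up front, and LOCALE_FIELD tells the node which locale to use for each record. This is a conceptual model, not the DataFlux API. The function `run_node`, the stand-in "definitions", and the choice to raise an error for an unloaded locale are all hypothetical.

```python
# Hypothetical model of a data quality node that uses LOCALE_FIELD and
# LOCALE_LIST. This is NOT the DataFlux API; plain functions stand in for
# QKB definitions so the per-record locale lookup can be seen.

def run_node(records, locale_field, locale_list, definition_by_locale):
    """Apply the definition for each record's locale and return the results."""
    # "Load" only the locales named in LOCALE_LIST (a KeyError here means a
    # listed locale has no definition available).
    loaded = {loc: definition_by_locale[loc] for loc in locale_list}
    results = []
    for row in records:
        loc = row[locale_field]  # value read from the LOCALE_FIELD column
        if loc not in loaded:
            # Assumption: a record whose locale was not loaded is an error.
            raise ValueError(f"locale {loc!r} is not in LOCALE_LIST")
        results.append(loaded[loc](row))
    return results


# Stand-in "definitions": uppercase for ENUSA, lowercase for FRCAN.
defs = {"ENUSA": lambda r: r["name"].upper(), "FRCAN": lambda r: r["name"].lower()}
rows = [{"name": "Anne", "DF_Locale": "ENUSA"}, {"name": "Luc", "DF_Locale": "FRCAN"}]
print(run_node(rows, "DF_Locale", ["ENUSA", "FRCAN"], defs))  # ['ANNE', 'luc']
```

The point of the model is that one node handles every row, and only the locale lookup changes from record to record.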

To use these Advanced properties in a data quality node, you first need a field that contains the 5-character QKB locale value. The first two characters represent the language and the last three represent the country. For example, ENUSA represents the English – United States locale, and ESESP represents the Spanish – Spain locale. You can use the Locale Guessing node to create a field that contains a locale value based on a Locale Guess definition from the QKB. Alternatively, you can use a custom Standardization scheme to assign the 5-character locale value, as shown in the example below.
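The 2 + 3 character layout of a locale value can be checked with a small helper before the data reaches a data quality node. The following Python sketch is illustrative only. `parse_locale` and `QkbLocale` are hypothetical names, and normalizing to uppercase is an assumption of the sketch.

```python
from typing import NamedTuple


class QkbLocale(NamedTuple):
    language: str  # 2-character language code, e.g. "EN"
    country: str   # 3-character country code, e.g. "USA"


def parse_locale(code: str) -> QkbLocale:
    """Split a 5-character QKB locale such as 'ENUSA' into language and country."""
    code = code.strip().upper()  # assumption: normalize case and whitespace
    if len(code) != 5 or not code.isalpha():
        raise ValueError(f"expected a 5-letter locale code, got {code!r}")
    return QkbLocale(language=code[:2], country=code[2:])


print(parse_locale("ENUSA"))  # QkbLocale(language='EN', country='USA')
print(parse_locale("ESESP"))  # QkbLocale(language='ES', country='ESP')
```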

 

QKB Locale Standardization Scheme

 

Apply QKB Locale Standardization Scheme
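A Standardization scheme of this kind is essentially a lookup table from a raw value to a standard value. The Python sketch below models that idea for the United States, Canada, and United Kingdom example. The scheme entries are hypothetical (the screenshot's entries are not reproduced), returning None for an unmatched value is a choice of the sketch, and case-insensitive matching is a simplifying assumption rather than a statement about how DataFlux schemes behave.

```python
# Hypothetical Standardization scheme: raw country value -> QKB locale.
# The entries in the article's screenshot are not reproduced; these are examples.
LOCALE_SCHEME = {
    "UNITED STATES": "ENUSA",
    "USA": "ENUSA",
    "CANADA": "ENCAN",
    "UNITED KINGDOM": "ENGBR",
    "UK": "ENGBR",
}


def assign_locale(country, scheme=LOCALE_SCHEME):
    """Return the locale for a country value, or None when the scheme has no match."""
    if country is None:
        return None
    # Simplifying assumption: compare case-insensitively, ignoring surrounding spaces.
    return scheme.get(country.strip().upper())


for value in ["United States", "canada", " UK ", "France"]:
    print(repr(value), "->", assign_locale(value))
```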

The Standardization scheme is followed by an Expression node that assigns the locale FRCAN (French – Canada) when the province is Quebec. Each record now has its 5-character locale value in a field called DF_Locale.
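The Expression node step is a simple conditional override. The Python sketch below shows the same logic. It does not use the DataFlux expression language. The column name `PROVINCE` and the accepted spellings of Quebec are assumptions, and only DF_Locale comes from the article.

```python
# Hypothetical re-creation of the Expression node step. PROVINCE is an
# assumed column name, and the accepted spellings of Quebec are examples.
QUEBEC_VALUES = {"QC", "QUEBEC", "QUÉBEC"}


def apply_quebec_override(row, province_field="PROVINCE"):
    """Return a copy of the row with DF_Locale set to FRCAN for Quebec records."""
    row = dict(row)  # copy so the caller's record is not modified
    province = (row.get(province_field) or "").strip().upper()
    if province in QUEBEC_VALUES:
        row["DF_Locale"] = "FRCAN"
    return row


print(apply_quebec_override({"PROVINCE": "Québec", "DF_Locale": "ENCAN"}))
print(apply_quebec_override({"PROVINCE": "ON", "DF_Locale": "ENCAN"}))
```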

QKB Locale Field Results

Once the locale field is part of your input data, configure the data quality node's standard properties as usual.

Match Codes Node Properties

Next, map the field that contains the 5-character locale value to the LOCALE_FIELD Advanced property of the data quality node. You also need to list the locales that should be loaded into memory in the LOCALE_LIST Advanced property. Note: You can also supply this list through a macro variable.
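Every locale that appears in the locale field should also appear in LOCALE_LIST. One way to check this before running the job is to compare the distinct values in the data against the list. The Python sketch below does that. The helper names are hypothetical, and the sketch does not assume any particular delimiter or format for the LOCALE_LIST value.

```python
def distinct_locales(records, locale_field="DF_Locale"):
    """Distinct non-empty locale values in the data, sorted for stable output."""
    return sorted({row[locale_field] for row in records if row.get(locale_field)})


def missing_from_locale_list(records, locale_list, locale_field="DF_Locale"):
    """Locales that appear in the data but are not named in LOCALE_LIST."""
    return sorted(set(distinct_locales(records, locale_field)) - set(locale_list))


rows = [{"DF_Locale": "ENUSA"}, {"DF_Locale": "FRCAN"}, {"DF_Locale": "ENGBR"}]
print(distinct_locales(rows))                                       # ['ENGBR', 'ENUSA', 'FRCAN']
print(missing_from_locale_list(rows, ["ENUSA", "ENCAN", "FRCAN"]))  # ['ENGBR']
```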

Match Codes Node Advanced Properties

Note: The definition used in the selected node must exist in every locale referenced. For example, the State/Province Match definition exists only in the English – United States, English – Canada, and French – Canada locales. Therefore, if you use that definition in a Match Codes node, you can pass in only data from one of those three locales; otherwise, the data job produces an error when it runs.
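The rule above can be expressed as a pre-flight check: compare the locales in the data with the locales that contain the chosen definition. In the Python sketch below, only the State/Province Match availability (ENUSA, ENCAN, FRCAN) comes from the article. The table and function names are hypothetical, and a real check would need to consult the installed QKB.

```python
# Which locales contain each definition. Only the State/Province entry comes
# from the article; a real check would query the installed QKB instead.
DEFINITION_LOCALES = {
    "State/Province": {"ENUSA", "ENCAN", "FRCAN"},
}


def unsupported_locales(definition, locales_in_data):
    """Locales in the data for which the definition does not exist."""
    available = DEFINITION_LOCALES.get(definition, set())
    return sorted(set(locales_in_data) - available)


print(unsupported_locales("State/Province", ["ENUSA", "FRCAN", "ENGBR"]))  # ['ENGBR']
```

A non-empty result corresponds to the situation in which the article says the data job would fail.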

Here is an example data job that uses the LOCALE_FIELD and LOCALE_LIST Advanced properties to generate match codes for multi-locale data. Notice that the data flow has very little branching. The only branch needed sends the United States and Canada records to a separate path that generates match codes for their State/Province data.
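The single branch in that job amounts to splitting the records so that only locales that have the State/Province definition reach that Match Codes node. The Python sketch below routes records by their DF_Locale value. Routing on the locale field, rather than on a country column, is an assumption of the sketch, and the function name is hypothetical.

```python
# Hypothetical routing step for the branch described above: records from the
# locales that have the State/Province definition go to one branch.
STATE_PROVINCE_LOCALES = {"ENUSA", "ENCAN", "FRCAN"}


def split_for_state_province(records, locale_field="DF_Locale"):
    """Return (records for the State/Province branch, all other records)."""
    branch, rest = [], []
    for row in records:
        target = branch if row.get(locale_field) in STATE_PROVINCE_LOCALES else rest
        target.append(row)
    return branch, rest


rows = [
    {"id": 1, "DF_Locale": "ENUSA"},
    {"id": 2, "DF_Locale": "ENGBR"},
    {"id": 3, "DF_Locale": "FRCAN"},
]
branch, rest = split_for_state_province(rows)
print([r["id"] for r in branch], [r["id"] for r in rest])  # [1, 3] [2]
```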

Multi-Locale Data Job Example

In conclusion, the Advanced properties of LOCALE_FIELD and LOCALE_LIST are useful when you want to work with data from multiple locales within the same data job. For more information on Advanced properties for nodes in DataFlux Data Management Studio, refer to the topic “Advanced Properties” in the DataFlux Data Management Studio 2.7: User’s Guide.


About the Author

Mary Kathryn Queen

Principal Technical Training Consultant

Mary Kathryn Queen is a Principal Technical Training Consultant in the Global Enablement and Learning (GEL) Team within SAS R&D's Global Technical Enablement Division. Her primary focus is on SAS Data Management technologies, particularly data quality, data preparation, and data governance.
