In DataFlux Data Management Studio, the data quality nodes (e.g., Parsing, Standardization, and Match Codes) in a data job use definitions from the SAS Quality Knowledge Base (QKB). These definitions are based on a locale (a language and country combination). Sometimes you need to work with multi-locale data within the same data job. To support this, these data quality nodes have LOCALE attributes among their Advanced properties.
For example, you may want to work with data from the United States, Canada, and the United Kingdom within the same data job. Note: You must have the QKB locale data installed and be licensed for any locales that you plan to use in your data job.
The Advanced properties you will need to use are LOCALE_FIELD and LOCALE_LIST. LOCALE_FIELD specifies the column name that contains the 5-character locale value to use for each record. LOCALE_LIST specifies the list of locales that should be loaded into memory for use within the node.
To use these Advanced properties in a data quality node, you first need a field that contains the 5-character QKB locale code. The first 2 characters represent the language and the last 3 characters represent the country. For example, ENUSA represents the English – United States locale and ESESP represents the Spanish – Spain locale. You can use the Locale Guessing node to create a field that contains a locale value based on a Locale Guess definition from the QKB. Alternatively, you can use a custom Standardization scheme to assign the 5-character locale code, as shown in the example below.
QKB Locale Standardization Scheme
Apply QKB Locale Standardization Scheme
The application of the Standardization scheme is then followed up with an Expression node to assign the locale FRCAN (French – Canada) if the province is Quebec. Now each record has its 5-character locale information in a field called DF_Locale.
QKB Locale Field Results
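The locale assignment described above can be sketched outside of Data Management Studio. The following Python is only an illustration of the logic, not DataFlux code: the lookup table stands in for the custom Standardization scheme, the `if` statement stands in for the Expression node, and the country spellings, the ENGBR code for the United Kingdom, and the function names are all assumptions made for this example.

```python
# Hypothetical lookup table standing in for the custom Standardization scheme.
# Country spellings and the ENGBR entry are assumptions for illustration.
COUNTRY_TO_LOCALE = {
    "UNITED STATES": "ENUSA",
    "CANADA": "ENCAN",
    "UNITED KINGDOM": "ENGBR",
}


def assign_locale(country, province=""):
    """Return the 5-character locale for a record, or None if the country is unknown."""
    locale = COUNTRY_TO_LOCALE.get(country.strip().upper())
    # Stands in for the Expression node step: Quebec records become French - Canada.
    if locale == "ENCAN" and province.strip().upper() == "QUEBEC":
        locale = "FRCAN"
    return locale


def split_locale(locale):
    """Split a locale code into its 2-character language and 3-character country."""
    if len(locale) != 5:
        raise ValueError(f"expected a 5-character locale, got {locale!r}")
    return locale[:2], locale[2:]


records = [
    {"country": "United States", "province": "NC"},
    {"country": "Canada", "province": "Ontario"},
    {"country": "Canada", "province": "Quebec"},
    {"country": "United Kingdom", "province": ""},
]
for record in records:
    # DF_Locale is the field name used in this article.
    record["DF_Locale"] = assign_locale(record["country"], record["province"])
```

Returning `None` for an unrecognized country keeps such records easy to spot before they reach a data quality node.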
Once the locale field is part of your input data, configure the standard properties of the data quality node as you normally would.
Match Codes Node Properties
Then you map the field with the 5-character locale information to the LOCALE_FIELD Advanced property for the data quality node. You also need to list the locales that should be loaded into memory in the LOCALE_LIST Advanced property. Note: You could supply this list through a macro variable.
Match Codes Node Advanced Properties
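One way to decide what belongs in LOCALE_LIST is to look at the distinct values in the locale field before running the job. The Python below is a minimal sketch of that check under stated assumptions: it runs outside of Data Management Studio, the function names are hypothetical, and it returns plain Python lists rather than the exact syntax the LOCALE_LIST property expects, which this article does not specify.

```python
def distinct_locales(records, locale_field="DF_Locale"):
    """Return the sorted distinct non-empty locale values found in the records."""
    return sorted({r[locale_field] for r in records if r.get(locale_field)})


def missing_from_list(records, loaded_locales, locale_field="DF_Locale"):
    """Return locales that appear in the data but not in the planned locale list."""
    loaded = set(loaded_locales)
    return [loc for loc in distinct_locales(records, locale_field) if loc not in loaded]


rows = [
    {"DF_Locale": "ENUSA"},
    {"DF_Locale": "FRCAN"},
    {"DF_Locale": "ENUSA"},
    {"DF_Locale": ""},  # records without a locale are ignored here
]
in_data = distinct_locales(rows)
not_loaded = missing_from_list(rows, ["ENUSA", "ENCAN"])
```

In this sample, `not_loaded` flags FRCAN as a locale that appears in the data but was left out of the planned list.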
Note: The definition used in the selected node must exist in all locales referenced. For example, the State/Province Match definition only exists in the English – United States, English – Canada, and French – Canada locales. Therefore, if you are using that definition in a Match Codes node you can only pass in data that is from one of those three locales; otherwise, executing the data job will produce an error.
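The restriction in the note above can be expressed as a simple pre-check that separates records a definition can handle from those it cannot. This is a hedged Python sketch rather than anything built into DataFlux: the availability table contains only the State/Province Match entry stated in the note, and the function name and record layout are hypothetical.

```python
# Availability taken from the note above; any other definition would need
# to be checked against the installed QKB.
DEFINITION_LOCALES = {
    "State/Province Match": {"ENUSA", "ENCAN", "FRCAN"},
}


def split_by_support(records, definition, locale_field="DF_Locale"):
    """Split records into (supported, unsupported) for the given definition."""
    supported_locales = DEFINITION_LOCALES[definition]
    supported, unsupported = [], []
    for record in records:
        if record.get(locale_field) in supported_locales:
            supported.append(record)
        else:
            unsupported.append(record)
    return supported, unsupported


rows = [
    {"id": 1, "DF_Locale": "ENUSA"},
    {"id": 2, "DF_Locale": "ENGBR"},
    {"id": 3, "DF_Locale": "FRCAN"},
]
supported, unsupported = split_by_support(rows, "State/Province Match")
```

Here the ENGBR record lands in `unsupported`, which corresponds to the kind of record that should be kept away from a node using the State/Province Match definition.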
Here is an example data job that uses the LOCALE_FIELD and LOCALE_LIST Advanced properties to generate match codes for multi-locale data. Notice that there is minimal branching in the data flow: only the United States and Canada records are branched off, so that match codes can be generated for their State/Province data.
Multi-Locale Data Job Example
In conclusion, the Advanced properties of LOCALE_FIELD and LOCALE_LIST are useful when you want to work with data from multiple locales within the same data job. For more information on Advanced properties for nodes in DataFlux Data Management Studio, refer to the topic “Advanced Properties” in the DataFlux Data Management Studio 2.7: User’s Guide.