Matching Names: How to Avoid a Common Mistake


Sometimes when trying to fuzzy match names you want to fuzzy match just a portion of the name: for example, Family Name and/or Given Name.  A common mistake that people make is to feed in the Family Name and Given Name columns separately into the Match Codes node instead of the Match Codes (Parsed) node.

So why is this a mistake?  The Name Match definition is designed to accept a full name and then parse the information into its tokens – Name Prefix, Given Name, Middle Name, Family Name, Name Suffix, and Title/Additional Info.


Let’s look at a case where I want to have separate match codes for Given Name and Family Name and my data is the following:
Michael Smith
Mike Smith

The following are the match codes generated using the Match Codes node with the Name Match definition (English – United States) at a sensitivity of 85.


Match Code Node Results

Notice that the two records would NOT match using this approach since the Given_Name_MatchCodes are different.  The reason they do not match is if only one name is supplied when calling the Name Match definition, then in most cases when the Name is parsed it assumes you supplied only the Family Name and the nickname equivalents of Given Name are not applied to the input.

The following are the Match codes generated using the Match Codes (Parsed) node with the Name Match definition (English – United States) at a sensitivity of 85 by feeding in the Given Name and Family Name field in the appropriate tokens in separate calls to the node.

Match Code (Parsed) Node Results

Match Code (Parsed) Node Results

Notice that the names will match in this case since both the Given_Name_MatchCode and Family_Name_MatchCodes are the same. Using the Match Codes (Parsed) node ensures that the names are assigned to the proper tokens and the proper logic will be applied to each token.  Therefore, this is the approach you should use if trying to match Names based on just Given Name and/or Family Name.

Note:  You can also use the ParsedView of proc DQMATCH to generate match codes for parsed data. Refer to the SAS® 9.4: Data Quality Server Reference guide section on “DQMATCH – Example 4: Creating Match Codes for Parsed Values” for more information.

Now that you have this knowledge you can avoid making this common Name matching mistake on your data quality projects!


About Author

Mary Kathryn Queen

Principal Technical Training Consultant

Mary Kathryn Queen is a Principal Technical Training Consultant in the Global Enablement and Learning (GEL) Team within SAS R&D's Global Technical Enablement Division. Her primary focus is on SAS Data Management technologies, particularly data quality, data preparation, and data governance.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top