Identity, identification and the proliferation of identifiers


I was surprised to learn recently that despite the reams of laws and policies directing the protection of personally identifiable information (PII) across industries and government agencies, more than 50 million Medicare beneficiaries were issued cards with a Medicare Beneficiary Number that's based on their Social Security Number (SSN). That's right – the same SSN that's the key artifact required for identity theft is used as part of millions of existing Medicare Health Insurance Claim Numbers (HICNs). This poses a significant threat of PII exposure.

Fortunately, one part of the 2015 Medicare Access and CHIP Reauthorization Act requires the Centers for Medicare and Medicaid Services (CMS) to remove SSNs from Medicare cards. This program, called the Social Security Number Removal Initiative (SSNRI), is intended to replace existing HICNs with a newly generated Medicare Beneficiary Identifier. Medicare cards will then be reissued, eliminating exposure of individual SSNs.

Superficially, it seems straightforward: For each beneficiary, find their existing identification number, randomly generate a new identifier that's not SSN-based, and send out a new card. But there's a lot of complexity involved in properly executing this project. That's because the existing identifier is likely to have been used across many different processes, applications and data sets. Issuing the new card is just the tip of the iceberg. The real goal of the initiative is to find all the applications that use the existing HICN as an identifier, determine how the identifier is used, and modify the processes and application code to adapt to the new identifier.
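To make the "superficial" part concrete, here is a minimal Python sketch of generating replacement identifiers and keeping a crosswalk from old to new values. Everything in it is an assumption for illustration: the alphabet, the length, the `beneficiary_id` field name and the sample values are hypothetical and do not describe the actual CMS format or process.

```python
import secrets
import string

# Hypothetical format: 11 random uppercase letters and digits.
ALPHABET = string.ascii_uppercase + string.digits
ID_LENGTH = 11


def new_identifier(issued):
    """Draw random identifiers until one is not already issued."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))
        if candidate not in issued:
            return candidate


def build_crosswalk(old_ids):
    """Map each old identifier to a fresh random one, with no collisions."""
    crosswalk = {}
    issued = set()
    for old in old_ids:
        new = new_identifier(issued)
        issued.add(new)
        crosswalk[old] = new
    return crosswalk


def migrate_records(records, crosswalk, key="beneficiary_id"):
    """Return copies of the records with the old identifier swapped out."""
    return [{**record, key: crosswalk[record[key]]} for record in records]


# Made-up sample data; the old IDs are deliberately not SSN-shaped.
old_ids = ["OLD-0001", "OLD-0002", "OLD-0003"]
crosswalk = build_crosswalk(old_ids)
claims = [{"beneficiary_id": "OLD-0002", "amount": 120.0}]
print(migrate_records(claims, crosswalk))
```

The generation step is the easy part. The hard part is that `migrate_records` has to be repeated, in some form, for every application and data set that stores the old identifier.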

The purposes of – and problems with – identifiers

Why do we use identifiers? Simply put, we often need a key value that's used:

  • To uniquely find the data about an entity.
  • And, consequently, to represent all known data about that entity.

For example, the Social Security Administration (SSA) uses SSNs to uniquely index anyone contributing to or receiving benefits. Given an individual’s SSN, the SSA can retrieve all the information it holds about that person: birth date and location, where they've worked and for how long, where they've lived, their lifetime payroll payments, paid benefits and more. Retrieval becomes much more difficult without an identifier, especially for people with common names that make it hard to differentiate between them.
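A small Python sketch shows why the key matters. The records, field names and `person_id` values below are invented for illustration; the point is only that indexing by an identifier isolates one person's records, while indexing by a common name mixes two people together.

```python
from collections import defaultdict

# Made-up records: two different people share the same name.
records = [
    {"person_id": "P001", "name": "John Smith", "event": "employment"},
    {"person_id": "P002", "name": "John Smith", "event": "employment"},
    {"person_id": "P001", "name": "John Smith", "event": "benefit"},
]


def index_by(rows, field):
    """Group rows by the value of one field."""
    index = defaultdict(list)
    for row in rows:
        index[row[field]].append(row)
    return index


by_id = index_by(records, "person_id")
by_name = index_by(records, "name")

# The identifier returns only P001's two records.
print(len(by_id["P001"]))         # 2
# The name returns all three records, belonging to two different people.
print(len(by_name["John Smith"]))  # 3
```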

There are two big problems with identifiers:

  • Identifier proliferation. Since the applications most organizations use are usually not developed in a coordinated way, there's little preplanning for how to define unique identifiers that are used across the enterprise. The designers of each business process assume they need their own identifier, so the organization ends up with many different identifiers for the same entities.
  • Identifier overloading. The existence of an identifier encourages people to employ that identifier for multiple purposes. The prime example is the SSN. SSNs have been co-opted for various other government systems (an SSN could be the basis of a state driver’s license number, for example). SSNs have also been used by commercial businesses (such as telephone companies and health insurance companies). As this happens, identifiers eventually begin to take on unintended meanings – consider how the last four digits of a Social Security number are often used for authentication.
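One common way to cope with proliferation is a cross-reference table that ties each system's local identifier back to a single enterprise-level one. The sketch below assumes three hypothetical systems (`billing`, `claims`, `crm`) and invented identifier values; it is an illustration of the idea, not a description of any particular product or of how CMS handles it.

```python
# Hypothetical cross-reference: system name -> {local ID: enterprise ID}.
xref = {
    "billing": {"B-1001": "E-1"},
    "claims": {"CLM-77": "E-1"},
    "crm": {"C-42": "E-2"},
}


def resolve(system, local_id):
    """Return the enterprise ID for a system-local ID, or None if unknown."""
    return xref.get(system, {}).get(local_id)


def same_entity(a, b):
    """True only when both (system, local_id) pairs resolve to one entity."""
    first, second = resolve(*a), resolve(*b)
    return first is not None and first == second


# Two identifiers from different systems that refer to the same entity.
print(same_entity(("billing", "B-1001"), ("claims", "CLM-77")))  # True
# Two identifiers that refer to different entities.
print(same_entity(("billing", "B-1001"), ("crm", "C-42")))       # False
```

Every identifier that a business process invents adds another row, or another whole system, to a table like this, and every lookup depends on that table being complete and correct.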

And here is the challenge: The point of a single identifier is to provide a unique index to an entity’s set of records – but as identifiers proliferate, no one of them reliably isolates all of the records that belong to a single entity.

In my next post, we’ll look at some more insidious issues with the creation of multiple identifiers.



About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author on data management best practices, with numerous books, white papers and web seminars to his name. His book Business Intelligence: The Savvy Manager’s Guide (June 2003) has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book Master Data Management has been endorsed by data management industry leaders. David is also the author of The Practitioner’s Guide to Data Quality Improvement.
