Identity, identification and the proliferation of identifiers

I was surprised to learn recently that despite the reams of laws and policies directing the protection of personally identifiable information (PII) across industries and government agencies, more than 50 million Medicare beneficiaries were issued cards with a Medicare Beneficiary Number that's based on their Social Security Number (SSN). That's right – the same SSN that's the key artifact required for identity theft is used as part of millions of existing Medicare Health Insurance Claim Numbers (HICNs). This poses a significant threat of PII exposure.

Fortunately, one part of the 2015 Medicare Access and CHIP Reauthorization Act requires the Centers for Medicare and Medicaid Services (CMS) to remove SSNs from Medicare cards. This program, called the Social Security Number Removal Initiative (SSNRI), is intended to replace existing HICNs with a newly generated Medicare Beneficiary Identifier. Medicare cards will then be reissued, eliminating exposure of individual SSNs.

Superficially, it seems straightforward: For each beneficiary, find their existing identification number, randomly generate a new identifier that's not SSN-based, and send out a new card. But there's a lot of complexity involved in properly executing this project. That's because the existing identifier is likely to have been used across many different processes, applications and data sets. Issuing the new card is just the tip of the iceberg. The real goal of the initiative is to find all the applications that use the existing HICN as an identifier, determine how the identifier is used, and modify the processes and application code to adapt to the new identifier.
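To make the "superficially straightforward" part concrete, here is a minimal Python sketch of the re-identification step: generate a random replacement for each old identifier, keep a crosswalk from old to new, and rewrite one data set using it. Everything here is an assumption for illustration. The function names, the `beneficiary_id` field, the `OLD-0001` style placeholders and the 12-character token are hypothetical, and the token is not the actual CMS identifier format.

```python
import secrets
import string

# Hypothetical alphabet and length; NOT the real Medicare identifier format.
ALPHABET = string.ascii_uppercase + string.digits


def new_identifier(length=12):
    """Return a random token that is not derived from the old identifier."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def build_crosswalk(old_ids):
    """Map each old identifier to a unique, randomly generated new one."""
    crosswalk = {}
    issued = set()
    for old_id in old_ids:
        new_id = new_identifier()
        while new_id in issued:  # regenerate on the rare collision
            new_id = new_identifier()
        issued.add(new_id)
        crosswalk[old_id] = new_id
    return crosswalk


def migrate_records(records, crosswalk, field="beneficiary_id"):
    """Return copies of the records with the old identifier replaced."""
    return [{**record, field: crosswalk[record[field]]} for record in records]


old_ids = ["OLD-0001", "OLD-0002", "OLD-0003"]
crosswalk = build_crosswalk(old_ids)
claims = [{"beneficiary_id": "OLD-0002", "claim": 7}]
migrated = migrate_records(claims, crosswalk)
```

The sketch handles one data set with one known field name. The hard part described above is that the same rewrite has to be found and repeated in every application and data set that ever stored the old identifier.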

The purposes of – and problems with – identifiers

Why do we use identifiers? Simply put, we often need a key value that's used:

To uniquely find the data about an entity.
And, consequently, to represent all known data about that entity.

For example, the Social Security Administration (SSA) uses SSNs to uniquely index anyone contributing to or receiving benefits. Given an individual’s SSN, the SSA can retrieve all the information about that person's birth date and location, where they've worked and for how long, where they've lived, and their lifetime payroll payments, paid benefits and more. Retrieval becomes much more difficult without an identifier, especially for people with common names that make it hard to differentiate between them.
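A small Python sketch shows why a key value helps. The records, the `A001` style identifiers and the function names below are made up for illustration and have nothing to do with how the SSA actually stores data. Looking up by identifier returns exactly one record, while looking up by a common name returns several candidates that still have to be told apart.

```python
from collections import defaultdict

# Made-up records; the "id" values stand in for any unique identifier.
records = [
    {"id": "A001", "name": "John Smith", "birth_year": 1950},
    {"id": "A002", "name": "John Smith", "birth_year": 1962},
    {"id": "A003", "name": "Maria Lopez", "birth_year": 1948},
]

# Index by identifier: each key maps to exactly one record.
by_id = {record["id"]: record for record in records}

# Index by name: a key may map to several records.
by_name = defaultdict(list)
for record in records:
    by_name[record["name"]].append(record)


def find_by_id(identifier):
    """Return the single record for this identifier, or None."""
    return by_id.get(identifier)


def find_by_name(name):
    """Return every record that shares this name."""
    return by_name.get(name, [])


exact = find_by_id("A002")              # one record
ambiguous = find_by_name("John Smith")  # two candidate records
```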

There are two big problems with identifiers:

Identifier proliferation. Since the applications most organizations use are usually not developed in a coordinated way, there's little preplanning for how to define unique identifiers that are used across the enterprise. The designers of each business process assume they need their own identifiers, so the organization ends up with many different identifiers for the same entities.
Identifier overloading. The existence of an identifier encourages people to employ that identifier for multiple purposes. The prime example is the SSN. SSNs have been co-opted for various other government systems (an SSN could be the basis of a state driver’s license number, for example). SSNs have also been used by commercial businesses (such as telephone companies and health insurance companies). As this happens, identifiers eventually begin to take on unintended meanings: consider how the last four digits of a Social Security number are often used for authentication.
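The cost of proliferation can be sketched in a few lines of Python. Assume three hypothetical systems (billing, claims and pharmacy) that each assigned their own identifier to the same person. None of the system names, identifiers or values below come from a real application. Assembling everything known about that person requires a separate crosswalk that someone has to build and maintain.

```python
# Three hypothetical systems, each keyed by its own local identifier.
billing = {"B-17": {"balance": 120.0}}
claims = {"C-904": {"open_claims": 2}}
pharmacy = {"P-33": {"prescriptions": 5}}

SYSTEMS = {"billing": billing, "claims": claims, "pharmacy": pharmacy}

# The crosswalk records which local identifier belongs to which person.
crosswalk = {
    "person-1": {"billing": "B-17", "claims": "C-904", "pharmacy": "P-33"},
}


def gather(person_key):
    """Collect one person's data from every system via the crosswalk."""
    result = {}
    for system_name, local_id in crosswalk[person_key].items():
        result[system_name] = SYSTEMS[system_name].get(local_id)
    return result


profile = gather("person-1")
```

If the crosswalk is missing an entry or holds a stale one, the corresponding data silently drops out of the profile, which is one way multiple identifiers undermine the idea of a single index per entity.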

And here is the challenge: A single identifier is meant to provide a unique index to an entity’s set of records, but as identifiers proliferate, no one identifier reliably isolates all of those records.

In my next post, we’ll look at some more insidious issues with the creation of multiple identifiers.
