Using a SAS function to validate a National Provider ID (NPI) value

We are careening towards the holiday season, and this year more than ever before it's going to mean one thing: Online Shopping. As you enter your credit card number over and over to complete your many purchases, you might occasionally transpose or mistype your account number. Does it surprise you when the online form immediately informs you that you've entered an invalid number? Before you even click the Complete Purchase button? How does it know the number is invalid? Is it checking against a list of known good numbers? For those concerned about online privacy, that's a disturbing thought.

No, of course these sites don't have a database of "good" credit card numbers that they use to validate your number. Instead, they simply use math. Credit card numbers follow a standard that allows their "syntax" to be validated by the Luhn algorithm, which computes a checksum over the digits and then applies a "modulus 10" test: the number passes only if the checksum is evenly divisible by 10. A couple of years ago, Heuristic Andrew shared a SAS DATA step approach for validating credit card numbers.
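
To make the checksum idea concrete, here is a minimal DATA step sketch of a Luhn check. It is not Andrew's code; the variable names (cardnum, total, valid) are my own, and the value tested is a widely used test number rather than a real account. Counting from the right, every second digit is doubled, the digits of each result are added together, and the number passes when the total is divisible by 10.

```sas
data _null_;
  length cardnum $19;
  cardnum = '4111111111111111';  /* common test number, not a real account */
  n = length(cardnum);
  total = 0;
  do pos = n to 1 by -1;
    digit = input(substr(cardnum, pos, 1), 1.);
    /* double every second digit, counting from the right */
    if mod(n - pos, 2) = 1 then do;
      digit = 2 * digit;
      /* subtracting 9 is the same as adding the two digits of the product */
      if digit > 9 then digit = digit - 9;
    end;
    total = total + digit;
  end;
  valid = (mod(total, 10) = 0);  /* 1 if the checksum ends in zero */
  put cardnum= total= valid=;
run;
```

For this test number the total works out to 30, so the step should report valid=1.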

National Provider IDs (NPIs), which are used to identify health care providers in the USA, use a similar approach. NPIs are 10 digits in length, and you can use the Luhn algorithm to distinguish a good provider ID from a bad one (where we're qualifying the ID, not the actual provider). The 10-digit ID value isn't quite enough information for the validation; in order for the math to work, you must prefix the ID with this constant string of digits: '80840'.
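
As a worked example of the prefix step, the sketch below runs the same arithmetic on a single NPI-format value. The value 1234567893 is illustrative only, and the variable names are assumptions of mine. After prefixing, the string is 15 digits long, so the digits in the even positions (counting from the left) are the ones that get doubled.

```sas
data _null_;
  length full $15;
  npi  = '1234567893';        /* illustrative NPI-format value */
  full = cats('80840', npi);  /* 15 digits once the prefix is added */
  total = 0;
  do pos = 1 to 15;
    digit = input(substr(full, pos, 1), 1.);
    /* even positions are doubled; two-digit products have their digits added */
    if mod(pos, 2) = 0 then do;
      digit = 2 * digit;
      if digit > 9 then digit = digit - 9;
    end;
    total = total + digit;
  end;
  valid = (mod(total, 10) = 0);
  put full= total= valid=;
run;
```

The prefix digits contribute 24 to the total (8 + 0 + 8 + 8 + 0) and the NPI digits contribute 46, giving 70, which ends in zero, so this value passes the check.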

I can tell from reading SAS-L that there are lots of SAS professionals who work with healthcare data. Validating the NPI value can be a good first-line approach for detecting certain data entry errors.

Adapting Andrew's implementation of the Luhn algorithm, I created a SAS function that can tell you whether a given NPI is valid. I named it isValidNpi(npi_value), and it returns a 0 (invalid) or 1 (valid). It's simple to use:

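/* my FCMP function library */
options cmplib=work.myfuncs;
data test;
  input npi :$10. @@;
  valid_npi = isValidNpi(npi);
  /* 4 valid NPIs followed by 5 invalid IDs (illustrative values) */
datalines;
1234567893 1111111112 1000000004 1987654328
1234567890 1234567839 123456789 12345A7893 1111111111
;
run;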

Here's the result: valid_npi is 1 for the four valid NPIs and 0 for the five invalid IDs.

The remainder of this post is the complete FCMP source code for the function. I'm a bit of a newbie when it comes to understanding healthcare data, so I'd appreciate any feedback you might have about NPIs, validation, and this approach in general. Leave a note in the comments.

proc fcmp outlib=work.myfuncs.npi;
  /* Validates that a National Provider ID (NPI) value is valid  */
  /* Checks only that it is valid in terms of its form, not that */
  /* it corresponds to a real provider.                          */
  /* Uses the Luhn algorithm and a prefix of '80840'             */
  /* More:                                                       */
  /*               */
  /* */
  /* RETURNS: 1 if valid, 0 if not valid                         */
  /* hat tip: Heuristic Andrew */
  function isValidNpi(npi $);
    length AddIt $2 ChkStr $ 15 ChkSum 8;
    /* check length for NPI */
    if length(trim(npi)) ^= 10 then
      do;
        /* put 'Not 10 digits ' npi=; */
        return (0);
      end;
    /* check that all are digits */
    else if (prxmatch('/\D/',trim(npi)) > 0) then
      do;
        /* put 'invalid characters ' npi=; */
        return (0);
      end;
    else
      do;
        /* Luhn's algorithm (also called modulus 10 or mod 10) */
        /* prefix the NPI with the constant '80840' */
        ChkStr = cats('80840', npi);
        ChkSum = 0;
        do pos=1 to length(ChkStr);
          if mod(pos,2) then /* odd positions */
            AddIt = substr(ChkStr,pos,1);
          else /* even positions: digit*2 */
            AddIt = strip(put(2*input(substr(ChkStr,pos,1),1.),2.));
          /* add digits */
          do i=1 to length(AddIt);
            ChkSum = ChkSum + input(substr(AddIt,i,1),1.);
          end;
        end;
        /* Check if ID is valid or not (if ChkSum ends with Zero) */
        if mod(ChkSum,10)=0 then
          do;
            /* put 'This is a valid ID: ' npi= ChkSum=; */
            return (1);
          end;
        else
          do;
            /* put 'This is an invalid ID: ' npi= ChkSum=; */
            return (0);
          end;
      end;
  endsub;
run;
options cmplib=work.myfuncs;
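
Once the function is compiled, it should also be usable outside the DATA step. The sketch below is one way to pull the failing IDs into their own table with PROC SQL; it assumes isValidNpi has already been compiled into work.myfuncs, and the table names (work.providers, work.bad_npis) and the sample values are hypothetical.

```sas
options cmplib=work.myfuncs;

/* Hypothetical input table; the values are illustrative only */
data work.providers;
  input npi :$10. @@;
datalines;
1234567893 1234567890 12345A7893
;
run;

/* Keep only the IDs that fail the format check */
proc sql;
  create table work.bad_npis as
    select npi
    from work.providers
    where isValidNpi(npi) = 0;
quit;
```

With these sample values, the first ID passes the check and the other two should land in work.bad_npis.
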
tags: healthcare, luhn algorithm, NPI, SAS programming


  1. marilyn
    Posted December 28, 2012 at 5:02 pm

    Good idea! I plan on testing this out :)
    The more healthcare code, the better as far as I'm concerned :)

  2. Foster
    Posted April 9, 2013 at 3:59 pm

    Chris, this is an excellent function; ideal for purpose. After validating the function against real NPI data, I read about 1.5 million records, and added a counter to the datastep to see how many good and how many bad I got, see below. The process took real time 1:30.46.

    data NPI_TEST (keep = good bad) ;
      set DBX.provider (keep = id_npi) end = last ; 
      if _n_ = 1 then do; 
      bad = 0 ; good = 0 ; end ; 
      retain bad good ; 
      valid_npi = isvalidnpi(id_npi);
      if valid_npi = 0 then bad + 1 ; 
      else if valid_npi = 1 then good + 1 ;
      if last ; 
    run ;
    • Chris Hemedinger
      Posted April 9, 2013 at 4:21 pm

      Foster, I'm glad that you found it useful! I suppose if you found some bad ones, that can be a bit telling about the data you've got to work with...

      • Foster
        Posted April 10, 2013 at 10:41 am

        Exactly, so I will be integrating it into my data QA process. Furthermore, I can now match the errant records back to their originator and identify the culprits ;o)
