Using a SAS function to validate a National Provider ID (NPI) value

We are careening towards the holiday season, and this year more than ever before it's going to mean one thing: Online Shopping. As you enter your credit card number over and over to complete your many purchases, you might occasionally transpose or mistype your account number. Does it surprise you when the online form immediately informs you that you've entered an invalid number? Before you even click the Complete Purchase button? How does it know the number is invalid? Is it checking against a list of known good numbers? For those concerned about online privacy, that's a disturbing thought.

No, of course these sites don't have a database of "good" credit card numbers that they use to validate your number. Instead, they simply use math. Credit card numbers follow a standard that allows their "syntax" to be validated by the Luhn algorithm, which computes a checksum over the digits and then applies a "modulus 10" operation to decide whether the sequence is well formed. A couple of years ago, Heuristic Andrew shared a SAS DATA step approach for validating credit card numbers.

National Provider IDs (NPIs), which are used to identify health care providers in the USA, use a similar approach. NPIs are 10 digits in length, and you can use the Luhn algorithm to distinguish a good provider ID from a bad one (where we're qualifying the ID, not the actual provider). The 10-digit ID value isn't quite enough information for the validation; in order for the math to work, you must prefix the ID with this constant string of digits: '80840'.
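
To make the arithmetic concrete, here is a minimal DATA step sketch that walks one sample NPI through the check by hand. It is an illustration only, not the function described in this post; the variable names full, pos, digit, and total are chosen just for this example.

```sas
/* Illustrative walk-through, not the function from this post.       */
/* The names full, pos, digit, and total are chosen for this sketch. */
data _null_;
  length full $ 15;
  npi  = '1003802901';          /* first valid sample in the test data */
  full = cats('80840', npi);    /* 15 digits: constant prefix plus NPI */
  total = 0;
  do pos = 1 to 15;
    /* pos counts from the rightmost digit */
    digit = input(substr(full, 16 - pos, 1), 1.);
    if mod(pos, 2) = 0 then do;
      digit = 2 * digit;                    /* double every second digit */
      if digit > 9 then digit = digit - 9;  /* same as adding its two digits */
    end;
    total = total + digit;
  end;
  valid = (mod(total, 10) = 0); /* valid when the sum ends in zero */
  put npi= total= valid=;
run;
```

For this sample the digits sum to 50 (24 from the '80840' prefix and 26 from the NPI itself), so the value passes the check.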

I can tell from reading SAS-L that there are lots of SAS professionals who work with healthcare data. Validating the NPI value can be a good first-line approach for detecting certain data entry errors.

Adapting Andrew's implementation of the Luhn algorithm, I created a SAS function that can tell you whether a given NPI is valid. I named it isValidNpi(npi_value), and it returns a 0 (invalid) or 1 (valid). It's simple to use:

/* my FCMP function library */
options cmplib=work.myfuncs;
 
data test;
  input npi $ 1-10;
  valid_npi = isValidNpi(npi);
  /* 4 valid NPIs followed by 5 invalid IDs */
  datalines;
1003802901
1003864232
1013955343
1134108814
0000000000
1234567890
15280060
152
10038029O1
run;

Here's the result: valid_npi is 1 for the first four values and 0 for the remaining five.

The remainder of this post is the complete FCMP source code for the function. I'm a bit of a newbie when it comes to understanding healthcare data, so I'd appreciate any feedback you might have about NPIs, validation, and this approach in general. Leave a note in the comments.

proc fcmp outlib=work.myfuncs.npi;
  /***************************************************************/
  /* Validates that a National Provider ID (NPI) value is valid  */
  /* Checks only that it is valid in terms of its form, not that */
  /* it corresponds to a real provider.                          */
  /* Uses the Luhn algorithm and a prefix of '80840'             */
  /* More:                                                       */
  /*   http://en.wikipedia.org/wiki/Luhn_algorithm               */
  /*   http://en.wikipedia.org/wiki/National_Provider_Identifier */
  /* RETURNS: 1 if valid, 0 if not valid                         */
  /***************************************************************/
  /* hat tip: Heuristic Andrew */
  function isValidNpi(npi $);
    length AddIt $2 ChkStr $ 15 ChkSum 8;
    /* check length for NPI */
    if length(trim(npi)) ^= 10 then
      do;
        /* put 'Not 10 digits ' npi=; */
        return(0);
      end;
    /* check that all are digits */
    else if (prxmatch('/\D/',trim(npi)) > 0) then
      do;
        /* put 'invalid characters ' npi=; */
        return(0);
      end;
    else
      do;
        /* Luhn's algorithm (also called modulus 10 or mod 10) */
        ChkSum = 0;
        ChkStr=reverse(cats('80840',npi));
        do pos=1 to length(ChkStr);
          if mod(pos,2) then /* odd positions */
            do;
              AddIt=substr(ChkStr,pos,1);
            end;
          else /* even positions: digit*2 */
          do;
            AddIt=put(2*input(substr(ChkStr,pos,1),2.),2.);
          end;
          /* add digits */
          do i=1 to length(AddIt);
            ChkSum+input(substr(AddIt,i,1),2.);
          end;
        end;
        /* Check if ID is valid or not (if ChkSum ends with Zero) */
        if mod(ChkSum,10)=0 then
          do;
            /* put 'This is a valid ID: ' npi= ChkSum=; */
            return (1);
          end;
        else if mod(ChkSum,10) ne 0 then
          do;
            /* put 'This is an invalid ID: ' npi= ChkSum=; */
            return (0);
          end;
      end;
  endsub;
run;
quit;
options cmplib=work.myfuncs;
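
One refinement raised in the comments on this post: an NPI must begin with 1 or 2, and the function above does not enforce that. The following is a rough sketch of how that extra test could be layered on in a DATA step, assuming isValidNpi has already been compiled into work.myfuncs. The data set name test_strict and the variable valid_strict are made up for this illustration.

```sas
/* Sketch only: combine the Luhn-based check with a first-digit test. */
/* test_strict and valid_strict are hypothetical names.               */
options cmplib=work.myfuncs;

data test_strict;
  input npi $ 1-10;
  /* passes only if the checksum is good AND the first digit is 1 or 2 */
  valid_strict = isValidNpi(npi) and substr(npi, 1, 1) in ('1', '2');
  datalines;
1003802901
3003802907
;
run;
```

The second value satisfies the checksum but starts with 3, so valid_strict should come back as 0 for it even though isValidNpi alone reports 1.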

16 Comments

marilyn on December 28, 2012 5:02 pm

Good idea! I plan on testing this out :)
The more healthcare code, the better as far as I'm concerned :)

Foster on April 9, 2013 3:59 pm

Chris, this is an excellent function; ideal for purpose. After validating the function against real NPI data, I read about 1.5 million records, and added a counter to the datastep to see how many good and how many bad I got, see below. The process took real time 1:30.46.

data NPI_TEST (keep = good bad);
  set DBX.provider (keep = id_npi) end = last;
  if _n_ = 1 then do;
    bad = 0;
    good = 0;
  end;
  retain bad good;
  valid_npi = isvalidnpi(id_npi);
  if valid_npi = 0 then bad + 1;
  else if valid_npi = 1 then good + 1;
  if last;
run;

Chris Hemedinger on April 9, 2013 4:21 pm

Foster, I'm glad that you found it useful! I suppose if you found some bad ones, that can be a bit telling about the data you've got to work with...
- Foster on April 10, 2013 10:41 am
  
  Exactly, so I will be integrating it into my data QA process. Furthermore, I can now match the errant records back to their originator and identify the culprits ;o)

William Viergever on December 16, 2015 4:20 pm

way cool! - nice Christmas present; thanks
am literally in the midst of processing Medicaid paid claims for some 80 California hospitals right now -- request was made in July and i'm just getting the damn data -- and part of the delay was the State checking what all the hospitals requested ... will have to share this w/ one of their programmers
happy hump day!

C Paxton on March 21, 2016 6:13 pm

Many thanks for this. Ended up modifying for speed. Have to process several million NPIs regularly, and the prxmatch and looping were taxing. E.g. took an additional 16 seconds to process 120k records with the above; with what follows, that is cut down to 1.6 seconds. (This may not seem like much, but it dropped the ds I was processing from ~60 minutes back to 6 minutes.)
(this is not elegant, and not documented, but it is fast)
function isValidNpi(_npi $);
  length _chknpi $ 15 _npisum 8;
  /* check length for NPI - must be 10 and that all are digits w 1st in (1,2) */
  if length(trim(_npi)) ^= 10 or (anyalpha(_npi)) > 0 or subpad(_npi,1,1) not in ('1','2') then return(0);
  else do;
    /* Luhn's algorithm (also called modulus 10 or mod 10) */
    _npisum=24;
    _chknpi=compress(subpad(_npi,1,1)*2||subpad(_npi,2,1)||subpad(_npi,3,1)*2||
                     subpad(_npi,4,1)||subpad(_npi,5,1)*2||subpad(_npi,6,1)||
                     subpad(_npi,7,1)*2||subpad(_npi,8,1)||subpad(_npi,9,1)*2);
    do _npipp=1 to length(_chknpi);
      _npisum=sum(_npisum,subpad(_chknpi,_npipp,1));
    end;
    if strip(subpad(_npi,10,1))=strip(sum(ceil(_npisum/10)*10,-1*_npisum)) then return(1);
    else return(0);
  end;
endsub;
- Chris Hemedinger on March 22, 2016 8:03 am
  
  Glad it helped you -- and thanks for sharing your revision! BUT, I ran your version and found some different results. In my small test data, my version reports records 3 and 4 as valid NPIs (1013955343 and 1134108814), but your version of the function reports them as invalid.

C Paxton on March 22, 2016 7:35 pm

hmm. Just ran those npi's and w/ the following results (from the log):
4003 data _null_; length npi $ 10;
4004 npi='1003802901' ; valid_npi=isvalidnpi(npi); put npi valid_npi ;
4005 npi='1003864232' ; valid_npi=isvalidnpi(npi); put npi valid_npi ;
4006 npi='1013955343' ; valid_npi=isvalidnpi(npi); put npi valid_npi ;
4007 npi='1134108814' ; valid_npi=isvalidnpi(npi); put npi valid_npi ;
4008 npi='0000000000' ; valid_npi=isvalidnpi(npi); put npi valid_npi ;
4009 npi='1234567890' ; valid_npi=isvalidnpi(npi); put npi valid_npi ;
4010 npi='15280060' ; valid_npi=isvalidnpi(npi); put npi valid_npi ;
4011 npi='152' ; valid_npi=isvalidnpi(npi); put npi valid_npi ;
4012 npi='10038029O1' ; valid_npi=isvalidnpi(npi); put npi valid_npi ;
4013 run;
1003802901 1
1003864232 1
1013955343 1
1134108814 1
0000000000 0
1234567890 0
15280060 0
152 0
10038029O1 0
NOTE: DATA statement used (Total process time):
      real time 0.04 seconds
      cpu time 0.04 seconds
You sure you compiled the function as written?
- Chris Hemedinger on March 23, 2016 9:15 am
  
  I ran my test in 9.4m3 and got the wrong results with your function. But running it in 9.3, the results from your function were correct. I'll need to apply latest 9.4m3 hotfixes to see if there is a difference before pursuing it with R&D.
  - C Paxton on March 23, 2016 2:56 pm
    
    running 9.4M2. Different results are weird.
    
    One important thing to add to your code, though, is that the first character must be a 1 or 2 (see ), requirement #5.
    
    If you run '3003802907' through your code it comes back as valid. Should be invalid
    - Chris Hemedinger on March 23, 2016 2:57 pm
      
      Good addition, thanks!

C Paxton on March 23, 2016 4:26 pm

No problem. As someone who's been burned by Wikipedia errors before, I ALWAYS double check what they document. It's crowd sourced, but there's one in every crowd...

Seb on May 31, 2017 9:23 am

Thanks for your help for the Luhn algorithm, helped me a lot!
I have to check social security numbers that contain 2 check digits, one LUHN10 and one VERHOEFF. I haven't found a solution for the Verhoeff check digit in SAS yet. The number to check is YYYYMMDDXXXLV, with L the Luhn check digit based on the first 11 digits and V the Verhoeff check digit also based on the first 11 digits.

Could anyone help please?

Thanks
- Chris Hemedinger on May 31, 2017 10:00 am
  
  Seb, you might pose this puzzle to one of the discussion forums in the SAS Support Communities. I'd bet that some SAS programmers there would happily take up the challenge, especially if you can provide some examples of valid and invalid numbers.

Francisco Harris on March 15, 2021 2:38 pm

This SAS code was very helpful, but without resetting ChkSum, SAS would fail all credit card numbers following an invalid card number.
- Chris Hemedinger on March 15, 2021 2:54 pm
  
  My code does set ChkSum=0 at the start of the loop. Are you saying you found a problem? If so, let me know and I'll correct it.
