Convert a text-based measurement to a number in SAS

2

Since we added the new "Recommended by SAS" widget in the SAS Support Communities, I often find myself diverted to topics that I probably would not have found otherwise. This is how I landed on this question and solution from 2011 -- "How to convert 5ft. 4in. (Char) into inches (Num)". While the question was deftly answered by my friend (and SAS partner) Jan K. from the Netherlands, the topic inspired me to take it a step further here.

Jan began his response by throwing a little shade on the USA:

Short of moving to a country that has a decent metric system in place, I suggest using a regular expression.

On behalf of my nation I just want say that, for the record, we tried. But we did not get very far with our metrication, so we are stuck with the imperial system for most non-scientific endeavors.

Matching patterns with a regular expression

Regular expressions are a powerful method for finding specific patterns in text. The syntax of regular expressions can be a challenge for those getting started, but once you've solved a few pattern-recognition problems by using them, you'll never go back to your old methods.

Beginning with the solution offered by Jan, I extended this program to read in a "ft. in." measurement, convert to the component number values, express the total value in inches, and then convert the measurement to centimeters. I know that even with my changes, we can think of patterns that might not be matched. But allow me to describe the updates:

  • Long-time users of PRXPARSE in SAS will notice that I did not apply the common practice of wrapping PRXPARSE in an "IF _N_=1" condition. For several releases of SAS, it's been the case that when using a constant parse pattern the DATA step will compile the regular expression just once. Parsing the expression hundreds or thousands of times across each DATA step iteration would be a performance concern -- but since in our case the pattern never changes, SAS compiles it just once.
  • The PRXPOSN function returns the nth "capture buffer" from the pattern match. Capture buffers are identified in the pattern by parentheses -- you count each open parenthesis to arrive at the expected buffer. So the first buffer matches on the first sequence of digits: (\d*). The second buffer is the optional whitespace between ft. and in.: (\s*). Buffer 3 is the entire pattern for "n in.": ((\d?\.?\d?)in.). Finally, buffer 4 is an "inner" capture group of buffer 3, containing just the sequence of digits with optional decimal for inches: (\d?\.?\d?)
  • The PRXPOSN function returns the text value of the match, so we have to use the INPUT function to convert that to a SAS numeric value.
  • Finally, I added the calculations to convert to total inches, and then centimeters.

Here's my program, followed by the result:

data measure;
 length 
     original $ 25
     feet 8 inches 8 
     total_inches 8 total_cm 8;
 /* constant regex is parsed just once */
 re = prxparse('/(\d*)ft.(\s*)((\d?\.?\d?)in.)?/'); 
 input;
 original = _infile_;
 if prxmatch(re, original) then do;
  feet =   input ( prxposn(re, 1, original), best12.);
  inches = input ( prxposn(re, 4, original), best12.);
  if missing(inches) and not missing(feet) then inches=0;
 end;
 else 
   original = "[NO MATCH] " || original;
 total_inches = (feet*12) + inches;
 total_cm = total_inches * 2.54;
 drop re;
cards;
5ft. 4in.
4ft 0in.
6ft. 10in.
3ft.2in.
4ft.
6ft.     1.5in.
20ft. 11in.
25ft. 6.5in.
Soooooo big
;

Other tools to help with regular expressions

The Internet offers a plethora of online tools to help developers build and test regular expression syntax. Here's a screenshot from RegExr.com, which I used to test some aspects of my program.

Tools like these provide wonderful insight into the capture groups and regex directives that will influence the pattern matching. They are part tutorial, part workbench for regex crafters at all levels.

Many of these tools also include community contributions for matching common patterns. For example, many of us often need to match/parse data with e-mail addresses, phone numbers, and other tokens as data fields. Sites like RegExr.com include the syntax for many of these common patterns that you can copy and use in your SAS programs.

See Also

Share

About Author

Chris Hemedinger

Senior Manager, SAS Online Communities

+Chris Hemedinger is the manager of SAS Online Communities. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies.  He also hosts the SAS Tech Talk webcasts each year from SAS Global Forum, connecting viewers with smart people from SAS R&D and the impressive work that they do.

Related Posts

2 Comments

  1. Jan Klaverstijn on

    Thanks for the mention Chris! Good article. And I learned a new thing: you no longer need the "if _n_=1" construct any longer which will simplify my code from now on. It also shows that, to be a good coder, you need to be a life-long learner. Otherwise your knowledge will go stale without you realizing it. I am confronted with other peoples code on a daily basis and there is no shortage of examples where regular expressions would eliminate complex and lengthy sequences of other text functions. Spread the word!

    • Chris Hemedinger
      Chris Hemedinger on

      I agree -- once you begin seeing solutions in "regex" -- it's difficult to stop seeing the possibilities. Pro tip: always pepper your regex code with comments. Deciphering even a moderately complex pattern later is challenging on its own.

Leave A Reply

Back to Top