Last time I checked, there are well over 500 functions and call routines in SAS. I’ve taught SAS programming courses for 15 years, and I’ll admit that occasionally my students will ask me about a particular function that I have honestly never heard of. I remember the first time this happened, a student told me he thought the SPEDIS function was the greatest thing in SAS. As a new instructor, I was a bit embarrassed I had never heard of the SPEDIS function, so at a break I asked the other three local instructors, who probably had a combined 50+ years of SAS experience. I felt a little better when none of them had heard of it either!
With so many functions available, it is easy for a new programmer to get overwhelmed. I was asked to consolidate the LONG list into a more accessible list of favorites. After putting the request out to my fellow instructors world-wide, these are our collective favorites:
Category | Function | Description |
Character | CAT | Does not remove leading or trailing blanks, and returns a concatenated character string. |
CATS | Removes leading and trailing blanks, and returns a concatenated character string. | |
CATX | Removes leading and trailing blanks, inserts delimiters, and returns a concatenated character string. | |
COMPBL | Removes multiple blanks from a character string. | |
COMPRESS | Returns a character string with specified characters removed from the original string. | |
FIND | Searches for a specific substring of characters within a character string. | |
LEFT | Left-aligns a character string. | |
LENGTH | Returns the length of a non-blank character string, excluding trailing blanks, and returns 1 for a blank character string. | |
LOWCASE | Converts all letters in an argument to lowercase. | |
PROPCASE | Converts all words in an argument to proper case. | |
SCAN | Returns the nth word from a character string. | |
SUBSTR | Extracts a substring from an argument. | |
TRANWRD | Replaces all occurrences of a substring in a character string. | |
TRIM | Removes trailing blanks from a character string, and returns one blank if the string is missing. | |
UPCASE | Converts all letters in an argument to uppercase. | |
Date & Time | DATEPART | Extracts the date from a SAS datetime value. |
INTCK | Returns the number of interval boundaries of a given kind that lie between two dates, times, or datetime values. | |
INTNX | Increments a date, time, or datetime value by a given time interval, and returns a date, time, or datetime value. | |
MDY | Returns a SAS date value from month, day, and year values. | |
MONTH | Returns the month from a SAS date value. | |
QTR | Returns the quarter of the year from a SAS date value. | |
TODAY | Returns the current date as a numeric SAS date value. | |
WEEK | Returns the week-number value. | |
WEEKDAY | From a SAS date value, returns an integer that corresponds to the day of the week. | |
YEAR | Returns the year from a SAS date value. | |
YRDIF | Returns the difference in years between two dates according to specified day count conventions; returns a person’s age. | |
Descriptive Statistics | LARGEST | Returns the kth largest nonmissing value. |
MAX | Returns the largest value. | |
MEAN | Returns the arithmetic mean (average). | |
MEDIAN | Returns the median value. | |
MIN | Returns the smallest value. | |
N | Returns the number of nonmissing numeric values. | |
NMISS | Returns the number of missing numeric values. | |
SMALLEST | Returns the kth smallest nonmissing value. | |
STD | Returns the standard deviation of the nonmissing arguments. | |
SUM | Returns the sum of the nonmissing arguments. | |
Special | INPUT | Returns the value that is produced when SAS converts an expression by using the specified informat. (Used for converting character columns to numeric) |
PUT | Returns a value using a specified format. (Used for converting numeric columns to character) | |
Truncation | CEIL | Returns the smallest integer that is greater than or equal to the argument, fuzzed to avoid unexpected floating-point results. |
FLOOR | Returns the largest integer that is less than or equal to the argument, fuzzed to avoid unexpected floating-point results. | |
INT | Returns the integer value, fuzzed to avoid unexpected floating-point results. | |
ROUND | Rounds the first argument to the nearest multiple of the second argument, or to the nearest integer when the second argument is omitted. |
Are we missing any of your favorites?
46 Comments
I have to give another vote to STRIP and COALESCE(C) as well. Also, I'm a frequent user of NOTDIGIT for checking if it's safe to perform character-to-numeric conversion using PUT.
What about the LAG function? When understood and used properly (not inside a test) it makes cross-record comparison easy, especially when used in conjunction with the retain statement, as when one needs to collapse clumps of records while testing for a break in time between the end date of one and the start date of another.
David R.
STRIP = LEFT( TRIM ( ... ) ) less is more :)
My favourites are probably STRIP() and COALESCE(), which I use mostly in PROC SQL, but CHOOSEN() and CHOOSEC() are also useful in PROC SQL instead of the CASE WHEN ELSE END when you have sequential numeric choices.
So many cool functions I've never used!
Two of my favourites have already been mentioned - GEODIST and PRXMATCH.
One of my other favourites is SOUNDEX (similar to SPEDIS); however, I have ditched it for COMPGED. If you do a lot of text-based analytics, check out my post https://communities.sas.com/t5/SAS-Communities-Library/PROC-SQL-Continued-Basic-Text-Analytics-Using-Song-Titles/ta-p/241007 where I use SOUNDEX, and then update it with COMPGED.
Now I'm curious abut UUIDGEN - something to research tomorrow at work :-)
Chris
I find myself reading lots of oddball data into SAS from text, and the scan() function really comes in handy for parsing the data in various flexible ways.
Good one to refer. Thanks for sharing! Would have wished to see index() and strip() as well in the list.
Thanks! I agree I overlooked STRIP(), but I left out INDEX on purpose... Did you know the FIND function was introduced in SAS 9, and does exactly what INDEX does, but has 2 additional arguments that allow you to make the search case insensitive and select a start position? Along with the CAT functions, I consider it to be one of the great "new" additions.
Very good choice! :-)
Some of you already mentioned MISSING and COALESCE[C].
I want to add just one more: REVERSE
For example, You can get the last word of a string:
last_word=reverse(scan(reverse(), 1))
Sorry, I have forgotten the string... ;-)
last_word=reverse(scan(reverse(text), 1))
REVERSE is cool, but did you know you can use negative numbers with the SCAN function to count from the right? So SCAN(var, -1) will give you the last word. I love that trick!
I absolutely love the SCAN function, especially its macro version (%SCAN). Combined with an iterative step, I can extract elements in a macro variable that represent key words, such as client IDs or variable names.
Maybe I'll have to do a separate post for macro functions :)
Thanks Stacey for posting this blog on favorite SAS functions! I also like the additional SAS functions posted by users.
RESOLVE is so insanely great I wrote a whole paper on it years ago:
A Better SYSIN Than SYSIN: Instream Files on Any Platform
http://www2.sas.com/proceedings/sugi30/034-30.pdf
And thanks to PROC FCMP, you can now author your own favorite SAS function:
Gee! No, GTL! Visualizing Data With The SAS Graph Template Language (featuring my DIY "GetDeltas" function)
http://support.sas.com/resources/papers/proceedings13/286-2013.pdf
Thanks, Grandpa Ted! (you have to check out his papers to get that reference...)
all of the above are my favorite! and +500 that were not mentioned, b/c I am testing all of mva and tk functions :-).
Sounds like that will be PLENTY to keep you busy for a full career :)
No love for coallesce\coallesceC? So much utility, replaces any instance of 'if not missing X then X else if not missing Y then Y... etc'.
Wes - you beat me to it, I've always found coalesce/coalescec very useful.
Yes, I was thinking of voting for that. It is invaluable in complex SQL outer joins to get default processing right. Saves a ton of obscure DATA step code.
How about strip() which combines left and trim.
Very good point... I probably should have included that one! I always mention it in class.
You mention that CAT* functions, which make string processing much easier. Two other newer functions (I started with SAS 79, so they seem newer to me) that I use a lot now are IFC and IFN.
Can't say I was doing anything with SAS 79, but I do still think of the CAT functions as new! What a great addition...
I recall this great blog post that outlined how to do fuzzy matching and the spedis function which your readers may like http://blogs.sas.com/content/sgf/2015/01/27/how-to-perform-a-fuzzy-match-using-sas-functions/
I've always loved the intnx and intck functions for financial date calculations and the Perl regular expression for character manipulation. Handy tip sheet at https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf
Great links! Thanks for sharing.
If I'm allowed more than one I'd have to choose the PRX family of functions. The ability to use Perl Regular Expressions in SAS can be a huge help if you're doing a lot of complex text processing.
Absolutely... We have a new 1/2 day Live Web class called Take Your SAS Programming Skills to the Next Level where we cover PRX functions. It was pretty popular this year at SGF. Chris, sounds like you are already a PRX expert, but maybe other readers would like to learn more and this class would be the right fit! We should have public course dates scheduled soon.
regarding PRX functions,
I would suggest all interested in these have a look at how PRX can be "built-in" in a user-defined informat with PROC FORMAT. The first doc appeared in Rick Langston's paper at http://support.sas.com/resources/papers/proceedings12/245-2012.pdf
That prompts me also, to adapt the description for the "special" function INPUT(), I think it should read
"The most amazing facility to parse text"
>>>>>--------->> any conversion you could wish for!
There probably should be something similar for PUT()
Also PRX are favorites for me and PRXMatch in particular which allows for data driven programming.
Thanks to Chris Hemedinger, I learned of the UUIDGEN() function a few years back on his blog (http://blogs.sas.com/content/sasdummy/2012/10/19/creating-a-somewhat-unique-id-using-the-uuidgen-function/), and have been happily using it ever since. I'd guess it is one of the less-known functions. lexjansen.com only comes up with two papers that mention it! http://www.lexjansen.com/search/searchresults.php?q=uuidgen
I also have Chris to thank for LOTS of things I have learned! I must have missed the UUIDGEN function post, but just checked it out. Very cool!
My favorites are INTNX and SUBSTR.
Those 2 are definitely in my top 5. Thanks, Jesse!
I like PUT and INPUT; although I always get them mixed up and have to look up the documentation to make sure I'm using the correct one.
LOL! For at least a year I had a post-it on my monitor reminding me which function to use for which type of variable conversion... You're not alone :)
Hi Adraine,
I was taught to "use the alphabet" to remember which function to use for the different variable conversion (thanks Sam!)So, using the first letters of the following words; character, numeric, input & put, put them in alphabetic order:
Character
Input
Numeric
Put
Character
The functions are in the middle, of the variable types!
So input does character to numeric and put does numeric to character.
There are probably plenty of other ways to remember it but this stuck with me.
I just remember that "numeric" is shorter than "character", and PUT is shorter than INPUT. So if you're starting with a numeric to convert to character, it's the shorter one--PUT. And if you're starting with a character to convert to numeric, it's the longer one--INPUT.
My two favorites are Missing and Compress
Ron Cody
You would know! I'll give a plug for your great book, SAS Functions by Example. Thanks, Ron!
Yep, love that book. And all of Ron's book!
I had been using compress for years to remove a character or two. But when I took Ron's "SAS Functions by Example" course, I learned the true power of this amazing function! With modifiers, COMPRESS is the most powerful text cleanup function you can use without learning PERL regular expressions. Thanks, Ron, for teaching this Jedi a new trick!
Stacey,
Thanks for a nice blog. I can understand why you didn't include it, but I have always thought the GEODIST function was very cool. For anyone who doesn't know, it returns the geodetic distance in kilometers or miles between two latitude and longitude coordinates. Also the various ZIP functions (ZIPCITY, ZIPCITYDISTANCE, ZIPFIPS, ZIPNAME, ZIPNAMEL, ZIPSTATE) can be very useful. ZIPCITYDISTANCE even gives you the distance between two zip codes.
Susan
FYI ... ZIPCITYDISTANCE gets the lat/long of the zips used in the function from the SASHELP.ZIPCODE data set and uses the same method to calculate distance as the GEODIST function. One think to watch for is that ZIPCITYDISTANCE returns miles while the default for GEODIST (as already stated is kilometers). You can also get miles from GEODIST by using the M option in the function call.
I am a big fan of the ZIP functions! I know I've had several students light up when I mentioned the ZIPCITY function.