Adventures of a SAS detective and the fantastic FIND function

0

Did I trick you into seeing what this blog is about with its mysterious title? I am going to talk about how to use the FIND function to search text values.

The FIND function searches for substrings in character values. For example, you might want to extract all email addresses ending in .edu from a list of email addresses. If you are a slightly older SAS programmer like me, you may be more familiar with the INDEX function. If you use only two arguments in the FIND function, the first being the string you are searching and the second being the substring you are looking for, the FIND function is identical to the INDEX function. Both of these functions will searczh the string (first argument) for the substring (second argument) and return the position where the substring starts. If the substring is not found, the function returns a zero.

The newer FIND function has several advantages over the older INDEX function. These advantages are realized by the optional third and fourth arguments to the FIND function. These two arguments allow you to specify a starting position for the search and modifiers that allow you to ignore case. You can use either of these two arguments, or both, and the order doesn't matter! How is this possible? The value for the starting position is always a numeric value and the value for the modifier is always a character value. Thus, SAS can always figure out if a value is a starting position or a modifier.

Let's look at an example

Suppose you have a SAS data set called Emails, and each observation in the data set contains a name and an email address.

Here is a listing of the SAS data set Emails:

You want to select all observations where the variable Email_Address contains .edu (ignoring case).

The program below does just that:

*Searching for .edu;
data Education;
   set Emails;
   if find(Email_Address,'.edu','i') then output;
run;
title "Listing of Data Set Education";
proc print data=Education noobs;
run;

The 'i' modifier is an instruction to ignore case. In the listing of Education below, notice that all the .edu addresses are listed, regardless of case.

Not only is the FIND function more flexible than the older INDEX function, the ignore case modifier is really handy.

For more tips on writing code and how to get started in SAS Studio, check out my book, Learning SAS by Example: A Programmer’s Guide, Second Edition. You can also download a free book excerpt. To also learn more about SAS Press, check out the up-and-coming titles and receive exclusive discounts, make sure to subscribe to the SAS Books newsletter.

Share

About Author

Ron Cody

Private Consultant

Dr. Ron Cody was a Professor of Biostatistics at the Rutgers Robert Wood Johnson Medical School in New Jersey for 26 years. During his tenure at the medical school, he taught biostatistics to medical students as well as students in the Rutgers School of Public Health. While on the faculty, he authored or co-authored over a hundred papers in scientific journals. His first book, Applied Statistics and the SAS Programming Language, was first published by Prentice Hall in 1985 and is now in its fifth edition. Since then, he has published over a dozen books on SAS programming and statistical analysis using SAS. His latest book, A Gentle Introduction to Statistics Using SAS Studio was published this year. Ron has presented numerous papers at SAS Global forums, regional conferences, as well as local user groups. He is presently a contract instructor for SAS Institute and continues to write books on SAS and statistical topics.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top