The SAS language provides syntax that enables you to quickly specify a list of variables. SAS statements that accept variable lists include the KEEP and DROP statements, the ARRAY statement, and the OF operator for comma-separated arguments to some functions. You can also use variable lists in the VAR and MODEL statements of analytic procedures.
This article describes six ways to specify a list of variables in SAS. There is a section in the SAS documentation that describes how to construct lists, but this blog post provides more context and a cut-and-paste example for every syntax. This article demonstrates the following:
- Use the _NUMERIC_, _CHARACTER_, and _ALL_ keywords to specify variables of a certain type (numeric or character) or all types.
- Use a single hyphen (-) to specify a range of variables that have a common prefix and a sequential set of numerical suffixes.
- Use the colon operator (:) to specify a list of variables that begin with a common prefix.
- Use a double-hyphen (--) to specify a consecutive set of variables, regardless of type. You can also use a variation of this syntax to specify a consecutive set of variables of a certain type (numeric or character).
- Use the OF operator to specify variables in an array or in a function call.
- Use macro variables to specify variables that satisfy certain characteristics.
Some companies might discourage the use of variable lists in production code because automated lists can be volatile. If the number and names of variables in your data sets occasionally change, it is safer to manually list the variables that you are analyzing. However, for developing code and constructing examples, lists can be a huge time saver.
Use the _NUMERIC_, _CHARACTER_, and _ALL_ keywords
You can specify all numeric variables in a data set by using the _NUMERIC_ keyword. You can specify all character variables by using the _CHARACTER_ keyword. Many SAS procedures use a VAR statement to specify the variables to be analyzed. When you want to analyze all variables of a certain type, you can use these keywords, as follows:
/* compute descriptive statistics of all numeric variables */
proc means data=Sashelp.Heart nolabels;
   var _NUMERIC_;            /* _NUMERIC_ is the default */
run;

/* display the frequencies of all levels for all character variables */
proc freq data=Sashelp.Heart;
   tables _CHARACTER_;       /* _ALL_ is the default */
run;
One of my favorite SAS programming tricks is to use these keywords in a KEEP or DROP statement (or data set option). For example, the following statements create a new data set that contains all numeric variables and two character variables from the Sashelp.Heart data:
data HeartNumeric;
   set Sashelp.Heart(keep=_NUMERIC_            /* all numeric variables */
                          Sex Smoking_Status); /* two character variables */
run;
An example of using the _ALL_ keyword is shown in the section that discusses the OF operator.
Use a hyphen to specify numerical suffixes
In many situations, variables are named with a common prefix and numerical suffix. For example, financial data might have variables that are named Sales2008, Sales2009, ..., Sales2017. In simulation studies, variables often have names such as X1, X2, ..., X50. The hyphen enables you to specify the first and last variable in a list. The first example can be specified as Sales2008-Sales2017. The second example is X1-X50.
The following DATA step creates 10 variables, including the variables x1-x6. Notice that the data set variables are not in alphanumeric order. That is okay. The syntax x1-x6 will select the six variables x1, x2, x3, x4, x5, and x6 regardless of their physical order in the data. The call to PROC REG uses the six variables in a linear regression:
data A;
   retain Y x1 x3 Z x6 x5 x2 W x4 R 0;   /* create 10 variables and one observation. Initialize to 0 */
run;

proc reg data=A plots=none;
   model Y = x1-x6;
run;
The parameter estimates from PROC REG are displayed in the order that you specify in the MODEL statement. However, if you use the SET statement in a DATA step, the variables appear in the original order unless you intentionally reorder the variables:
data B;
   set A(keep=x1-x6);
run;
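One common way to reorder variables intentionally is to list them in a RETAIN statement before the SET statement. The following is a minimal sketch rather than part of the original example. The data set name C is hypothetical, and the first step re-creates the data set A (with an explicit initial value of 0) so that the block runs on its own:

```sas
/* re-create the example data: 10 numeric variables, one observation */
data A;
   retain Y x1 x3 Z x6 x5 x2 W x4 R 0;
run;

/* C is a hypothetical name. Because RETAIN appears before SET, the names
   x1-x6 are added to the program data vector first, in that order */
data C;
   retain x1-x6;
   set A(keep=x1-x6);
run;

/* list the variables in their stored order to confirm the result */
proc contents data=C order=varnum;
run;
```

Variables that are read by a SET statement are retained automatically, so the RETAIN statement here should change only the order of the variables, not their values.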
Use the colon operator to specify a prefix
If you want to use variables that have a common prefix but have a variety of suffixes, you can use the colon operator (:), which is a wildcard character that matches any name that begins with a specified prefix. For example, the following DATA step creates a data set that contains eight variables, including five variables that begin with the prefix 'Sales'. The subsequent DATA step drops the variables that begin with the prefix 'Sales':
data A;
   retain Sales17 Y Sales16 Z SalesRegion Sales_new Sales1 R 0;   /* 1 obs. Initialize to 0 */
run;

data B;
   set A(drop= Sales: );   /* drop all variables that begin with 'Sales' */
run;
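The prefix list can also be combined with the OF operator in a function call. The following sketch is an illustration only: the data set SalesTotals, its variables, and the data values are made up for this example.

```sas
/* SalesTotals and its values are hypothetical illustration data */
data SalesTotals;
   input Region $ Sales2015 Sales2016 Sales2017;
   /* Sales: matches every variable defined so far whose name begins
      with 'Sales'. TotalSales begins with 'Total', so it is not matched */
   TotalSales = sum(of Sales:);
   datalines;
East 10 20 30
West 5 . 15
;

proc print data=SalesTotals;
run;
```

Because the SUM function ignores missing values, TotalSales is 60 for the first observation and 20 for the second.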
Use a double-hyphen to specify consecutive variables
The previous sections showed lists that match variables by type, by numerical suffix, or by prefix. For those lists, you get the same set of variables regardless of how the variables are ordered in the data set. In contrast, you can use a double-hyphen (--) to specify a consecutive set of variables. The variables you get depend on the order of the variables in the data set, as in the following example:
data A;
   retain Y 0 x3 2 C1 'A' C2 'BC' Z 3 W 4 C4 'D' C5 'EF';   /* Initialize eight variables */
run;

data B;
   set A(keep=x3--C4);
run;
In this example, the data set B contains the variables x3, C1, C2, Z, W, and C4. If you use the double-hyphen to specify a list, be sure that you know the order of the variables and that this order is never going to change. If the order of the variables changes, your program will behave differently.
You can also specify all variables of a certain type within a range of variables. The syntax Y-numeric-Z specifies all numeric variables between Y and Z in the data set. The syntax Y-character-Z specifies all character variables between Y and Z. For example, the following call to PROC CONTENTS displays the variables (in order) in the Sashelp.Heart data. The call to PROC LOGISTIC specifies all the numeric variables between (and including) the AgeCHDdiag variable and the Smoking variable:
proc contents data=Sashelp.Heart order=varnum;
run;

proc logistic data=Sashelp.Heart;
   model status = AgeCHDdiag-numeric-Smoking;
   ods select ParameterEstimates;
run;
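To see the typed ranges on a small data set, the following sketch reuses the eight-variable layout from the double-hyphen example. The output data set names NumOnly and CharOnly are hypothetical.

```sas
/* variable order: Y x3 C1 C2 Z W C4 C5 */
data A;
   retain Y 0 x3 2 C1 'A' C2 'BC' Z 3 W 4 C4 'D' C5 'EF';
run;

/* numeric variables from Y through W: Y, x3, Z, W */
data NumOnly;
   set A(keep=Y-numeric-W);
run;

/* character variables from C1 through C5: C1, C2, C4, C5 */
data CharOnly;
   set A(keep=C1-character-C5);
run;

proc contents data=NumOnly  order=varnum; run;
proc contents data=CharOnly order=varnum; run;
```

As with the double-hyphen, the result depends on the order of the variables in the data set, so the same caution applies.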
Arrays and the OF operator
You can use variable lists to define an array in a SAS DATA step. For example, the following program creates a numerical array named X and a character array named C. The program finds the maximum value in each row and puts that value into the variable named rowMaxNum. The program also creates a variable named Str that contains the concatenation of the character values for each row:
data Arrays;
   set sashelp.Class;
   array X {*} _NUMERIC_;      /* X[1] is 1st var, X[2] is 2nd var, etc */
   array C {*} _CHARACTER_;    /* C[1] is 1st var, C[2] is 2nd var, etc */
   /* use the OF operator to pass values in array to functions */
   rowMaxNum = max(of x[*]);   /* find the max value in this array (row) */
   length Str $30;
   call catx(' ', Str, of C[*]);   /* concatenate the strings in this array (row) */
   keep rowMaxNum Str;
run;

proc print data=Arrays(obs=4);
run;
You can use the OF operator directly in functions without creating an array. For example, the following program uses the _ALL_ keyword to output the "complete cases" for the Sashelp.Heart data. The program drops any observation that has a missing value for any variable:
data CompleteCases;
   set Sashelp.Heart;
   if cmiss(of _ALL_)=0;   /* output only complete cases for all vars */
run;
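The OF operator also accepts the hyphen, double-hyphen, and colon lists from the earlier sections. The following sketch uses a made-up data set named Scores to show why the OF keyword matters when you pass a numbered range to a function:

```sas
/* Scores and its values are hypothetical illustration data */
data Scores;
   input x1-x3;
   Total = sum(of x1-x3);   /* OF expands the list: x1 + x2 + x3 */
   Diff  = sum(x1-x3);      /* no OF: a single argument, the difference x1 - x3 */
   datalines;
10 20 30
;

proc print data=Scores;
run;
```

For this observation, Total is 60 and Diff is -20. Omitting OF does not produce an error, so this mistake is easy to overlook.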
Use macro variables to specify a list
The previous sections demonstrate how you can use syntax to specify a list of variables to SAS statements. In contrast, this section describes a technique rather than syntax. It is sometimes the case that the names of variables are in a column in a data set. There might be other columns in the data set that contain characteristics or statistics for the variables. For example, the following call to PROC MEANS creates an output data set (called MissingValues) that contains columns named Variable and NMiss.
proc means data=Sashelp.Heart nolabels NMISS stackodsoutput;
   var _NUMERIC_;
   ods output Summary = MissingValues;
run;

proc print;
run;
Suppose you want to keep or drop those variables that have one or more missing values. The following PROC SQL call creates a macro variable (called MissingVarList) that contains a space-separated list of all variables that have at least one missing value. This technique has many applications and is very powerful.
/* Use PROC SQL to create a macro variable (MissingVarList) that contains
   the list of variables that have a property such as missing values */
proc sql noprint;
   select Variable into :MissingVarList separated by ' '
   from MissingValues
   where NMiss > 0;
quit;
%put &=MissingVarList;

MISSINGVARLIST=AgeCHDdiag Height Weight MRW Smoking AgeAtDeath Cholesterol

You can now use the macro variable in a KEEP, DROP, VAR, or MODEL statement, or in a data set option such as KEEP=&MissingVarList.
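As a sketch of how the macro variable might be used, the following program repeats the PROC MEANS and PROC SQL steps from this section so that it runs on its own, then uses the list in a KEEP= option, a DROP= option, and a VAR statement. The data set names HasMissing and NoMissing are hypothetical.

```sas
/* repeat the steps from this section so that the block runs on its own */
proc means data=Sashelp.Heart nolabels NMISS stackodsoutput;
   var _NUMERIC_;
   ods output Summary = MissingValues;
run;

proc sql noprint;
   select Variable into :MissingVarList separated by ' '
   from MissingValues
   where NMiss > 0;
quit;

/* HasMissing and NoMissing are hypothetical data set names */
data HasMissing;                              /* only the variables in the list */
   set Sashelp.Heart(keep=&MissingVarList);
run;

data NoMissing;                               /* everything except the list */
   set Sashelp.Heart(drop=&MissingVarList);
run;

/* count the nonmissing and missing values for the variables in the list */
proc means data=Sashelp.Heart N NMISS;
   var &MissingVarList;
run;
```

If no variable satisfies the WHERE clause, the SELECT statement returns no rows and the macro variable is not created, so code that depends on it should handle that case.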
Summary
This article shows six ways to specify a list of variables to SAS statements and functions. The SAS syntax provides keywords (_NUMERIC_, _CHARACTER_, and _ALL_) and operators (hyphen, colon, and double-hyphen) to make it easy to specify a list of variables. You can use the syntax in conjunction with the OF operator to pass a variable list to some SAS functions. Lastly, if the names of variables are stored in a column in a data set, you can use the full power of PROC SQL to create a macro variable that contains variables that satisfy certain criteria.
Do you use shorthand syntax to specify lists of variables? Why or why not? Leave a comment.
13 Comments
Great post Rick with excellent examples! I love using these techniques in SAS code and will use the colon prefix operator for dropping temporary variables as you've shown. I tend to create the temporary variables with an underscore so that I can easily drop them using the syntax, drop _: ;
Your post reminds me of Mark Jordan's blog post on Variable Lists by Text Pattern which your readers may also like to see https://blogs.sas.com/content/sastraining/2017/08/29/sas-variable-lists-by-pattern/
One thing I have not seen and figured out on my own is that passing string of text into a macro (variable heading) is the same as doing a let statement and passing that variable name of a string.
%let varname = one two three;
%macroname(&varname);
and
%macroname(one two three);
both work.
But the second one is less typing!
Hi. Here's another idea for creating a macro variable with name of variables with at least one missing value. It's handy if you also want to look at CHARACTER variables.
proc freq data=sashelp.heart nlevels;
tables _all_ / noprint;
ods output nlevels=misstables (where=(nmisslevels ne 0));
run;
proc sql noprint;
select tablevar into :missingvarlist separated by ' ' from misstables;
quit;
I learned long ago to never use these short cuts for variable names. Why? When you are trying to debug programs being able to search on each variable name is critical to tracking down problems. If a problem var does not show up, you can't find it, and good luck with the issues. I also endeavor to always keep a leading and trailing space at each variable name and operator. Search commands vary from editor to editor. And I do use several different editors or other programs to edit SAS code. A variable name with a comma could be considered different from a variable name space comma. Also when trying to read programs written by others, those invisible variables make it much harder to follow the program logic. The extra typing, cutting and pasting time is more than offset during the debugging or future editing phase.
Another tip, I always use UPPER CASE for SAS COMMANDS, FUNCTIONS, and OPERATORS. And lower case or CamelFont for variable names. Using all lower case makes it harder to read programs and follow the logic. Many SAS procs use CamelFont to split column headers to make your output much more readable. Many programmers use or read programs written in many different languages where keywords vary. Using this method makes it easier to read and understand programs in languages you aren't familiar with.
The colon modifier is great for specifying variables with the same prefix:
drop foo_: ;
Perhaps one day we will see this syntax for specifying variable with the same suffix?
drop :_bar ;
Seems like an obvious omission to me.
(I know the workarounds...I just wish the above worked)
Hi, thanks for the nice, well explained article, I have learnt new tricks on the array.
Good info, thank you.
SAS is such a disappointment. You have to do all these unintuitive coding gymnastics to simply say a variable is a character.
Thanks for writing. To declare a character variable that holds, say, 20 characters, simply put a dollar sign ($) after the name on the LENGTH statement, such as
length MyCharVar $20; /* allocate space for 20 chars */
You can declare multiple character variables by listing several names:
length A B C D $20;
You can declare multiple variables that have a common prefix by using
length C1-C10 $20;
For help with elementary SAS programming, you can ask questions at the SAS Support Communities: communities.sas.com
Thanks, Rick! Is it possible to declare all character variables to be 20 characters? length _CHARACTER_ $20; doesn't seem to work.
Yes, but the syntax depends on how the data set is created. Post sample data and code to the SAS Support Communities.