How to detect SAS data sets that contain (or do not contain) character variables

1

A SAS programmer posted an interesting question on a SAS discussion forum. The programmer wanted to iterate over hundreds of SAS data sets, read in all the character variables, and then do some analysis. However, not every data set contains character variables, and SAS complains when you ask it to read the character variables in a data set that contains only numeric variables.

The programmer wanted to use PROC IML to solve the problem, but the issue also occurs in the SAS DATA step. The following program creates three data sets. Two of them (AllChar and Mixed) contain at least one character variable. The third data set (AllNum) does not contain any character variables. For the third data set, an error occurs if you try to use the KEEP=_CHARACTER_ data set option, as shown in the following example:

data AllNum;
   x=1; y=2; z=3;
run;
data AllChar;
   A='ABC'; B='XYZW';
run;
data Mixed;
   name='Joe'; sex='M'; Height=1.8; Weight=81; treatment='Placebo'; 
run;
 
/* try to use DROP=_CHARACTER_ to exclude numeric variables */
data KeepTheChar;
   set AllNum(keep=_CHARACTER_); /* ERROR when no character variables in the data set */
run;
   ERROR: The variable _CHARACTER_ in the DROP, KEEP, or RENAME list has never been referenced.

The same problem occurs in PROC IML if you try to read character variables when none exist:

proc iml;
use AllNum;
   read all var _CHAR_ into X; /* ERROR when no character variables in the data set */
close;
   ERROR: No character variables in the data set.

There are at least two ways to handle this situation:

  1. In both Base SAS and SAS/IML, you can use dictionary tables to determine in advance which data sets contain at least one character variable. You can then read only those data set.
  2. In SAS/IML, you can read all variables into a table, then extract the character variables into a matrix for further processing.

Of course, the same ideas apply if you want to read only numeric variables and you encounter a data set that does not contain any numeric variables.

Use DICTIONARY tables to find information about your SAS session

If you have ever been to a SAS conference, you know that DICTIONARY tables are a favorite topic for SAS programmers. DICTIONARY tables are read-only tables that provide information about the state of the SAS session, including libraries, data sets, variables, and system options. You can access them directly by using PROC SQL. If you want to access the information in the DATA steps or other procedures (like PROC IML), you can use special data views in SASHELP. In particular, the Sashelp.VColumn view provides information about variables in SAS data set and is often used to find data sets that contain certain variable names. (See the references at the end of this article for more information about DICTIONARY tables.)

The following SAS/IML program uses the Sashelp.VColumn to find out which data sets contain at least one character variable:

proc iml;
/* Solution 1: Use dictionary table sashelp.vcolumn */
/* Find data sets in WORK that have AT LEAST ONE character variable */
use sashelp.vcolumn(where=(libname="WORK" & memtype='DATA' & type='char'); /* read only CHAR variables */
   read all var {memname name};  /* memname=data set name; name=name of character variable */
close;
 
/* loop over data sets. If a set contains at least one character variable, process it */
dsName = {'AllNum' 'AllChar' 'Mixed'};       /* names of potential data sets */
do i = 1 to ncol(dsName);
   idx = loc(memname = upcase(dsName[i]));   /* is data set on the has-character-variable list? */
   /* for demo, print whether data set has character variables */
   msg = "The data set " + (dsName[i]) + " contains " + 
          char(ncol(idx)) + " character variables.";
   print msg;
 
   if ncol(idx)>0 then do;            /* the data set exists and has character vars */
      charVars = name[idx];           /* get the names of the character vars */
      use (dsName[i]);                /* open the data set for reading */
      read all var charVars into X;   /* read character variables (always succeeds) */
      close;
      /* process the data */
   end;
end;

The output shows that you can use the DICTIONARY tables to determine which data sets have at least one character variable. You can then use the USE/READ statements in PROC IML to read the character variables and process the data however you wish. As mentioned previously, this technique can also be used in PROC SQL and the DATA step.

Use SAS/IML tables to find character variables

The previous section is very efficient because only character variables are ever read into SAS/IML matrices. However, there might be situations when you want to process character variables (if they exist) and then later process numerical variables (if they exist). Although a SAS/IML matrix contains only one data type (either all numeric or all character), you can read mixed-type data into a SAS/IML table, which supports both numeric and character variables. You can then use the TableIsVarNumeric function to generate a binary indicator variable that tells you which variables in the data are numeric and which are character, as follows:

/* Solution 2: Read all data into a table. Use the TableIsVarNumeric function to determine
   which variables are numeric and which are character. */
dsName = {'AllNum' 'AllChar' 'Mixed'};             /* names of potential data sets */
do i = 1 to ncol(dsName);                          /* for each data set... */
   T = TableCreateFromDataset("WORK", dsName[i]);  /* read all variables into a table */
   numerInd = TableIsVarNumeric(T);                /* binary indicator vector for numeric vars */
   charInd = ^numerInd;                            /* binary indicator vector for character vars */
   numCharVars = sum(charInd);                     /* count of character variables in this data set */
   msg = "The data set " + (dsName[i]) + " contains " + 
         char(numCharVars) + " character variables.";
   print msg;
   if numCharVars > 0 then do;
      X = TableGetVarData(T, loc(charInd));        /* extract the character variables into X */
      /* process the data */
   end;
   /* optionally process the numeric data */
   numNumerVars = sum(numerInd);                   /* count of numeric variables in this data set */
   /* etc */
end;

The output is identical to the output in the previous section.

Summary

In summary, this article discusses a programmer who wants to iterate over many SAS data sets and process only character variables. However, some of the data sets do not have any character variables! This article shows two methods for dealing with this situation: DICTIONARY tables (available through Sashelp views) or SAS/IML tables. The first method is also available in Base SAS.

Of course, you can also use this trick to read all numeric variables when some of the data sets might not have any numeric variable. I've previously written about how to read all numeric variables into a SAS/IML matrix by using the _ALL_ keyword. If the data set contains both numeric and character variables, then only the numeric variables are read.

References

The following resources provide more information about DICTIONARY tables in SAS:

Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

1 Comment

  1. Nice. You can turn off that error by using DKRICOND=nowarn. This means another possible approach is to use PROC contents, e.g.:

    options dkricond=nowarn ;
    proc contents data=AllNum(keep=_character_) out=MyVarList(keep=name);
    run ;

    That will create work.MyVarList, with one record per variable that survived the KEEP. In this case, you get a 0 record dataset. On SAS severs with lots and lots of libraries defined by default, sometimes the dictionary tables get slow to process, and PROC CONTENTS can be faster.

Leave A Reply

Back to Top