Select ODS tables by using wildcards and regular expressions in SAS

2

You might know that you can use the ODS SELECT statement to display only some of the tables and graphs that are created by a SAS procedure. But did you know that you can use a WHERE clause on the ODS SELECT statement to display tables that match a pattern? This article shows how to use wildcards, regular expressions, and pattern matching to select ODS tables in SAS.

ODS SELECT: Filter the output from SAS procedures

A SAS procedure might produce a dozen or more tables. You might be interested in displaying a subset of those tables. Recall that you can use the ODS TRACE ON statement to obtain a list of all the tables and graphs that a procedure creates. You can then use the ODS SELECT or the ODS EXCLUDE statement to control which tables and graphs are displayed.

Here's an example from the SAS/STAT documentation. The following PROC LOGISTIC call creates 27 tables and graphs, most of which are related to ROC curves. The ODS TRACE ON statement displays the names of each output object in the SAS log:

data roc;
   input alb tp totscore popind @@;
   totscore = 10 - totscore;
   datalines;
3.0 5.8 10 0   3.2 6.3  5 1   3.9 6.8  3 1   2.8 4.8  6 0
3.2 5.8  3 1   0.9 4.0  5 0   2.5 5.7  8 0   1.6 5.6  5 1
3.8 5.7  5 1   3.7 6.7  6 1   3.2 5.4  4 1   3.8 6.6  6 1
4.1 6.6  5 1   3.6 5.7  5 1   4.3 7.0  4 1   3.6 6.7  4 0
2.3 4.4  6 1   4.2 7.6  4 0   4.0 6.6  6 0   3.5 5.8  6 1
3.8 6.8  7 1   3.0 4.7  8 0   4.5 7.4  5 1   3.7 7.4  5 1
3.1 6.6  6 1   4.1 8.2  6 1   4.3 7.0  5 1   4.3 6.5  4 1
3.2 5.1  5 1   2.6 4.7  6 1   3.3 6.8  6 0   1.7 4.0  7 0
3.7 6.1  5 1   3.3 6.3  7 1   4.2 7.7  6 1   3.5 6.2  5 1
2.9 5.7  9 0   2.1 4.8  7 1   2.8 6.2  8 0   4.0 7.0  7 1
3.3 5.7  6 1   3.7 6.9  5 1   3.6 6.6  5 1
;
 
ods graphics on;
ods trace on;
proc logistic data=roc;
   model popind(event='0') = alb tp totscore / nofit;
   roc 'Albumin' alb;
   roc 'K-G Score' totscore;
   roc 'Total Protein' tp;
   roccontrast reference('K-G Score') / estimate e;
run;

The SAS log displays the names of the tables and graphs. A portion of log is shown below:

Output Added:
-------------
Name:       OddsRatios
Label:      Odds Ratios
Template:   Stat.Logistic.OddsRatios
Path:       Logistic.ROC3.OddsRatios
-------------

Output Added:
-------------
Name:       ROCCurve
Label:      ROC Curve
Template:   Stat.Logistic.Graphics.ROC
Path:       Logistic.ROC3.ROCCurve
-------------

Output Added:
-------------
Name:       ROCOverlay
Label:      ROC Curves
Template:   Stat.Logistic.Graphics.ROCOverlay
Path:       Logistic.ROCComparisons.ROCOverlay
-------------

Output Added:
-------------
Name:       ROCAssociation
Label:      ROC Association Statistics
Template:   Stat.Logistic.ROCAssociation
Path:       Logistic.ROCComparisons.ROCAssociation
-------------

Output Added:
-------------
Name:       ROCContrastCoeff
Label:      ROC Contrast Coefficients
Template:   Stat.Logistic.ROCContrastCoeff
Path:       Logistic.ROCComparisons.ROCContrastCoeff
-------------

Only a few of the 27 ODS objects are shown here. Notice that each ODS object has four properties: a name, a label, a template, and a path. Most of the time, the name is used on the ODS SELECT statement to filter the output. For example, if you want to display only the ROC curves and the overlay of the ROC curves, you can put the following statement prior to the RUN statement in the procedure:

   ods select ROCCurve ROCOverlay;     /* specify the names literally */

Use a WHERE clause in the ODS SELECT statement

Often the ODS objects that you want to display are related to each other. In the LOGISTIC example, you might want to display all the information about ROC curves. Fortunately, the SAS developers often use a common prefix or suffix, such as 'ROC', in the names of the ODS objects. That means that you can display all ROC-related tables and graphs be selecting the ODS objects whose name (or path) contains 'ROC' as a substring.

You can use the WHERE clause to select ODS objects whose name (or label or path) matches a particular pattern. The object's name is available in a special variable named _NAME_. Similarly, the object's label and path are available in variables named _LABEL_ and _PATH_, respectively. You cannot match patterns in the template string; there is no _TEMPLATE_ variable.

In SAS, the following operators and functions are useful for matching strings:

  • The CONTAINS keyword matches strings that contains a specified substring. The question mark (?) is an equivalent way to specify the CONTAINS operator.
  • The LIKE keyword matches strings to a pattern. The underscore (_) is a wildcard that matches any character. The percent sign (%) is a wildcard that matches one or more characters.
  • The "begins with" operators (=: and in:) match strings that begin with a certain pattern or set of patterns, respectively.
  • SAS functions such as FIND, INDEX, and SUBSTR can be used to match patterns.
  • The SAS PRXMATCH function, which enables you to use Perl regular expressions to match patterns.

For example, the following statements select ODS tables and graphs from the previous PROC LOGISTIC call. You can put one of these statements before the RUN statement in the procedure:

   /* use any one of the following statements inside the PROC LOGISTIC call */
   ods select where=(_name_ =: 'ROC');                 /* name starts with 'ROC' */
   ods select where=(_name_ like 'ROC%');              /* name starts with 'ROC' */
   ods select where=(_path_ ? 'ROC');                  /* path contains 'ROC' */
   ods select where=(_label_ ? 'ROC');                 /* label contains 'ROC' */
   ods select where=(_name_ in: ('Odds', 'ROC'));      /* name starts with 'Odds' or 'ROC' */
   ods select where=(substr(_name_,4,8)='Contrast');   /* name has subtring 'Contrast' at position 4 */

For additional examples of using pattern matching to select ODS objects, see Warren Kuhfeld's graphics-focused blog post and the section of the SAS/STAT User's Guide that discusses selecting ODS graphics.

Use PRXMATCH to match regular expressions

Although the CONTAIN and LIKE operators are often sufficient for selecting a table, SAS provides the powerful PRXMATCH function for more complex pattern-matching tasks. The PRXMATCH function uses Perl regular expressions to match strings. SAS provides a Perl Regular Expression "cheat sheet" that summarizes the syntax and commons search queries for the PRXMATCH function.

You can put any of the following statements inside the PROC LOGISTIC call:

   /* use any one of the following PRXMATCH expressions inside the PROC LOGISTIC call */
   ods select where=(prxmatch('/ROC/',       _name_)); /* name contains 'ROC' anywhere */
   ods select where=(prxmatch('/^ROC/',      _name_)); /* name starts with 'ROC' */
   ods select where=(prxmatch('/Odds|^ROC/', _name_)); /* name contains 'Odds' anywhere or 'ROC' at the beginning */
   ods select where=(prxmatch('/ROC/',      _name_)=0);     /* name does NOT contain 'ROC' anywhere */
   ods select where=(prxmatch('/Logistic\.ROC2/', _path_)); /* escape special wildcard character '.' */

In summary, the WHERE= option on the ODS SELECT (and ODS EXCLUDE) statement is quite powerful. Many SAS programmers know how to list the names of tables and graphs on the ODS SELECT statement to display only a subset of the output. However, the WHERE= option enables you to use wildcards and regular expressions to select objects whose names or paths match a certain pattern. This can be a quick and efficient way to select tables that are related to each other and share a common prefix or suffix in their name.

Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

2 Comments

  1. Peter Lancashire on

    Great reminder. Not to be confused with the WHERE dataset option in the ODS OUTPUT statement, which selects rows of data, not output objects.

  2. Pingback: How to interpret graphs in a principal component analysis - The DO Loop

Leave A Reply

Back to Top