Building an SQL subquery in SAS Enterprise Guide

Many SAS Enterprise Guide users practically live in the Query Builder. For those who understand their data tables, the Query Builder provides a tremendous amount of flexibility to pull and manipulate data. The Query Builder produces SQL programs behind the scenes, which translates well for database-centric work.

Sometimes a complex query requires a two-pass process. For example, suppose that you want to create a table of CARS that contains only those models with a higher-than-average price. First you need a query to calculate the average price across all cars. Then you need a second query to filter the records with a price that's higher than that result.

A SAS Enterprise Guide user might build something like this:

This method has obvious downsides: it requires two passes through the data, and two sets of results are brought into your SAS session. If the source table is in a database, this can be especially inefficient.

SAS Enterprise Guide 5.1 supports the idea of subqueries, a concept familiar to SQL programmers. With a subquery (sometimes called "subselect" or "inner query"), you can express the entire operation in one SQL step. That's one trip to the database, and only one set of results brought back to SAS.

I'll briefly describe how this works. For comprehensive documentation and examples about the subquery feature in SAS Enterprise Guide 5.1 and later, read Michael Burke's excellent conference paper on Finding Your Inner Query with SAS Enterprise Guide.

The subquery feature relies on the ability to define query templates (also new in SAS Enterprise Guide 5.1), which are query definitions that you can reuse as building blocks for other query steps. (Query templates are useful on their own, as a way to store a query within your task menus so that you can use it across different projects.)

Using the subquery feature

First, use the Query Builder to define the "inner query". In my example, that query contains just one item, the calculated average of MSRP.

Instead of running the query, use the drop-down arrow on the Run button to select Create Template. The Create Template window allows you to assign a name to this definition, and it's automatically added to your Tasks->Task Templates menu. Note that you can create the query template without even running this part of the query. That can be a time saver if you're a Query Builder Ninja with high confidence in your result.

Next, design the outer query with a new Query Builder task. Select all of the fields that you want to include in the result. In the Filter tab, add a new filter for MSRP. Instead of comparing MSRP to a constant or other expression, you'll select from a list of available Subquery templates.

In your Filter tab, you'll see a partial view of the inner SELECT expression.

When you complete your selections and have the other query options just the way that you want them, click Run to execute the query.

If you examine the SAS program that the task creates, you'll see the inner SELECT clause:

PROC SQL;
   CREATE TABLE WORK.expensive AS 
   SELECT t1.Make, 
          t1.Model, 
          t1.MSRP, 
      FROM SASHELP.CARS t1
      WHERE t1.MSRP >= (
         SELECT /* AVG_of_MSRP */
           (AVG(t1.MSRP)) AS AVG_of_MSRP
             FROM SASHELP.CARS t1
      );
QUIT;

You can also use subqueries as a way to filter/compare a column to a list of values. In that case, your inner query simply needs to have a single-column list of results (instead of a single number, as in my example). Read Michael's subquery paper for all of the possibilities and limitations. Here's an executive summary from Michael's paper:

Here are the types of subqueries that you can create using query-based templates:

  • Subqueries that return a single value
  • Subqueries that return multiple values (multiple rows of a single field)
  • Subqueries that appear as part of a filter of the raw data (on the WHERE clause)
  • Subqueries that appear as part of a filter on the grouped data (on the HAVING clause)
  • Subqueries that appear as part of a recode condition as part of recoding a column in a computed column (on the SELECT clause)

Here are the types of subqueries that you cannot create using query-based templates:

  • Subqueries that form a derived table (subqueries that appear on the FROM clause)
  • Subqueries that refer to columns on the outer query (correlated subqueries)

More than one way to skin this data

There are many other ways to accomplish this simple example in SAS. For example, you may be able to use the HAVING clause feature in the Query Builder, or use SQL joins and special join conditions to filter a table based on another result set. But the ability to create a subquery has been long-requested from SAS Enterprise Guide users who understand databases. I hope you'll agree that it's an important addition to your query and reporting toolbox.

tags: SAS Enterprise Guide, sql, subquery

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <p> <pre lang="" line="" escaped=""> <q cite=""> <strike> <strong>