Can you actually get something for nothing? With PROC SQL’s subquery and remerging features, yes, you can.
Often there is a need to add group descriptive statistics such as group counts, minimum and maximum values for further by-group processing. Instead of first creating the group count, minimum or maximum values and then merging the summarized dataset to the original dataset, why not take advantage of PROC SQL to complete two steps in one? With PROC SQL’s subquery and summary functions by the group variable, you can easily remerge the new group descriptive statistics back to the original dataset.
By selecting all variables from the original data set and applying a LEFT JOIN, we are preserving the original dataset. In the subquery, an internal temporary dataset is created based on selecting both the linking variable and the new group descriptive statistics variable. The linking variable uniquely identifies records. A summary function is applied to the linking variable with the GROUP BY clause. Conditions can be applied as needed, for example, to exclude missing values.
The advantages of applying subqueries within remerge techniques is that the original dataset does not change in data content or record sort order. Without subqueries, however, the GROUP BY clause is applied with the wildcard ‘*’ which selects all variables. In general, this approach should be avoided since it may incorrectly cause values to be summarized or records to be rearranged. This method may be more appropriate for selected limited number of variables in the original dataset.
proc sql; create table class2 as select a.*, b.sexn /* add group count variable */ from sashelp.class as a left join /* keep all records */ (select name, count(name) as sexn /* create group count variable */ from sashelp.class where sex > ' ' /* apply condition as needed */ group by sex /* group by variable */ ) as b on a.name=b.name; /* link by detail variable */ quit;
As can be seen, this technique can be duplicated for multiple group descriptive statistics. In the second example, several variables are added - SEX, AGEN and AGEMX. The first subquery is grouped by SEX and the second subquery is grouped by AGE. Without the need for a second DATA step or creating a second dataset, you can now quickly add group descriptive statistics and move forward to the next by-group processing step.
proc sql; create table class3 as select a.*, b.sexn, c.agen, c.agemx /* add group count variables */ from sashelp.class as a left join (select name, count(name) as sexn /* first subquery */ from sashelp.class where sex > ' ' group by sex ) as b on a.name=b.name left join /* second subquery */ (select name, count(age) as agen, max(age) as agemx /* Count & Max */ from sashelp.class where age > . group by age ) as c on a.name=c.name ; quit;
These code examples are provided as is, without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.