In SAS, the reserved keyword _NULL_ specifies a SAS data set that has no observations and no variables. When you specify _NULL_ as the name of an output data set, the output is not written. The _NULL_ data set is often used when you want to execute DATA step code that displays a result, defines a macro variable, writes a text file, or makes calls to the EXECUTE subroutine. In those cases, you are interested in the "side effect" of the DATA step and rarely want to write a data set to disk. This article presents six ways to use the _NULL_ data set. Because the _NULL_ keyword is used, no data set is created on disk.
#1. Use SAS as a giant calculator
You can compute a quantity in a DATA _NULL_ step and then use the PUT statement to output the answer to the SAS log. For example, the following DATA step evaluates the normal density function at x-0.5 when μ=1 and σ=2. The computation is performed twice: first using the built-in PDF function and again by using the formula for the normal density function. The SAS log shows that the answer is 0.193 in both cases.
data _NULL_; mu = 1; sigma = 2; x = 0.5; pdf = pdf("Normal", x, mu, sigma); y = exp(-(x-mu)**2 / (2*sigma**2)) / sqrt(2*constant('pi')*sigma**2); put (pdf y) (=5.3); run; |
pdf=0.193 y=0.193 |
#2. Display characteristics of a data set
You can use a null DATA step to display characteristics of a data set. For example, the following DATA step uses the PUT statement to display the number of numeric and character variables in the Sashelp.Class data set. No data set is created.
data _NULL_; set Sashelp.Class; array char[*] $ _CHAR_; array num[*] _NUMERIC_; nCharVar = dim(char); nNumerVar = dim(num); put "Sashelp.Class: " nCharVar= nNumerVar= ; stop; /* stop processing after first observation */ run; |
Sashelp.Class: nCharVar=2 nNumerVar=3 |
You can also store these values in a macro variable, as shown in the next section.
#3. Create a macro variable from a value in a data set
You can use the SYMPUT or SYMPUTX subroutines to create a SAS macro variable from a value in a SAS data set. For example, suppose you run a SAS procedure that computes some statistic in a table. Sometimes the procedure supports an option to create an output data that contains the statistic. Other times you might need to use the ODS OUTPUT statement to write the table to a SAS data set. Regardless of how the statistic gets in a data set, you can use a DATA _NULL_ step to read the data set and store the value as a macro variable.
The following statements illustrate this technique. PROC MEANS creates a table called Summary, which contains the means of all numerical variables in the Sashelp.Class data. The ODS OUTPUT statement writes the Summary table to a SAS data set called Means. The DATA _NULL_ step finds the row for the Height variable and creates a macro variable called MeanHeight that contains the statistic. You can use that macro variable in subsequent steps of your analysis.
proc means data=Sashelp.Class mean stackods; ods output Summary = Means; run; data _NULL_; set Means; /* use PROC CONTENTS to determine the columns are named Variable and Mean */ if Variable="Height" then call symputx("MeanHeight", Mean); run; %put &=MeanHeight; |
MEANHEIGHT=62.336842105 |
For a second example, see the article "What is a factoid in SAS," which shows how to perform the same technique with a factoid table.
#4. Create macro variable from a computational result
Sometimes there is no procedure that computes the quantity that you want, or you prefer to compute the quantity yourself. The following DATA _NULL_ step counts the number of complete cases for the numerical variables in the Sashelp.Heart data. It then displays the number of complete cases and the percent of complete cases in the data. You can obtain the same results if you use PROC MI and look at the MissPattern table.
data _NULL_; set Sashelp.Heart end=eof nobs=nobs; NumCompleteCases + (nmiss(of _NUMERIC_) = 0); /* increment if all variables are nonmissing */ if eof then do; /* when all observations have been read ... */ PctComplete = NumCompleteCases / nobs; /* ... find the percentage */ put NumCompleteCases= PctComplete= PERCENT7.1; end; run; |
NumCompleteCases=864 PctComplete=16.6% |
#5. Edit a text file or ODS template "on the fly"
This is a favorite technique of Warren Kuhfeld, who is a master of writing a DATA _NULL_ step that modifies an ODS template. In fact, this technique is at the heart of the %MODSTYLE macro and the SAS macros that modify the Kaplan-Meier survival plot.
Although I am not as proficient as Warren, I wrote a blog post that introduces this template modification technique. The DATA _NULL_ step is used to modify an ODS template. It then uses CALL EXECUTE to run PROC TEMPLATE to compile the modified template.
#6. A debugging tool
All the previous tips use _NULL_ as the name of a data set that is not written to disk. It is a curious fact that you can use the _NULL_ data set in almost every SAS statement that expects a data set name!
For example, you can read from the _NULL_ data set. Although reading zero observations is not always useful, one application is to check the syntax of your SAS code. Another application is to check whether a procedure is installed on your system. For example, you can run the statements PROC ARIMA data=_NULL_; quit; to check whether you have access to the ARIMA procedure.
A third application is to use _NULL_ to suppress debugging output. During the development and debugging phase of your development, you might want to use PROC PRINT, PROC CONTENTS, and PROC MEANS to ensure that your program is working as intended. However, too much output can be a distraction, so sometimes I direct the debugging output to the _NULL_ data set where, of course, it magically vanishes! For example, the following DATA step subsets the Sashelp.Cars data. I might be unsure as to whether I created the subset correctly. If so, I can use PROC CONTENTS and PROC MEANS to display information about the subset, as follows:
data Cars; set Sashelp.Cars(keep=Type _NUMERIC_); if Type in ('Sedan', 'Sports', 'SUV', 'Truck'); /* subsetting IF statement */ run; /* FOR DEBUGGING ONLY */ %let DebugName = Cars; /* use _NULL_ to turn off debugging output */ proc contents data=&DebugName short; run; proc means data=&DebugName N Min Max; run; |
If I don't want to this output (but I want the option to see it again later), I can modify the DebugName macro (%let DebugName = _NULL_;) so that the CONTENTS and MEANS procedures do not produce any output. If I do that and rerun the program, the program does not create any debugging output. However, I can easily restore the debugging output whenever I want.
Summary
In summary, the _NULL_ data set name is a valuable tool for SAS programmers. You can perform computations, create macro variables, and manipulate text files without creating a data set on disk. Although I didn't cover it in this article, you can use DATA _NULL_ in conjunction with ODS for creating customized tables and reports.
What is your favorite application of using the _NULL_ data set? Leave a comment.
14 Comments
For data driven reporting, in which character values may have different lengths, I use data _null_ to determine variable lengths and use symput to create a macro variable of the variable length for use in a subsequent data step’s length statement. This is handy when constructing personalized report cards, for example, to prevent trailing blanks in output for a person with the name Sid when the name field has a length of 15. Love Data _null_!
Thanks for sharing, Louise. I know you've written papers and presented talks about SAS reporting. Do you have a favorite paper that demonstrates the power of DATA _NULL_ in the context of reporting?
Great post, Rick!
I also use DATA _NULL_ for programmatically writing various external files. For example, in this blog post Let SAS write batch scripts for you, I use DATA _NULL_ for writing out Unix/Windows script files; in this blog post series Google maps, I use DATA _NULL_ to write out HTML files dynamically embedding in them JavaScript code responsible for displaying Google maps in the web pages.
I use _NULL_ to generate macro code to read an Excel sheet based on a set of required columns and types and those actually present in the sheet. This avoids some of the problems with empty cells causing SAS to read a numeric column as character.
Great post, Rick, and thanks for the plug! DATA _NULL_ has many uses. One of my favorites uses is in the early processing in macros. Macros can be used to generate statement fragments or statements with some step. That is NOT my application. When a macro is used to generate a full step or a series of steps, you might want to do some preprocessing. For example, you might want to check the integrity of the arguments before doing anything else. This is where DATA _NULL_ comes in. It is much easier to use DATA _NULL_, SYMGET, DATA step statements, and CALL SYMPUTX rather than write a series of %IF, %ELSE, %LET, and other macro statements. If you are going to process values in the macro language over which you have no control, you must quote them to avoid undesirable effects. The code can get complicated. That is never an issue with SYMGET, DATA step statements, and CALL SYMPUTX. A good heuristic--and I am certainly not the first person to say this--is to use the macro language as little as possible. Instead, process and check the values of macro variables in a DATA _NULL_ step.
Yes, I have learned this through sometimes bitter experience. I encountered one person who avoided all macro processing by using a data step to write code to a file and then calling it with %include. Debugging is simpler. Of course you can queue it with execute.
@Rick: A 7th use for data _null_ is when using proc sort to ONLY output records that DON't have duplicates. e.g.:
data have;
set sashelp.class;
id=_n_;
output;
if name ne 'John' then output;
run;
proc sort data=have out=_null_ nouniquekey noequals UNIQUEOUT=want;
by id;
run;
There likely are other similar uses related to procs, but the above example is the first time I'd ever seen it used that way.
Thanks, Art. And since Art has publically "outed" himself, I will mention that he was the person who began a SAS-L thread that inspired me to write this article. Art's example, and the many participants who replied, "I did not know that!" convinced me that this feature deserved to be advertised more widely.
Use a data set to generate CALL Execute statements in data _null_.
PUTting values into obscure formats required as input to various types of hardware or software (printer mark up codes for example in the days of SAS5)
The following code using _null_ to get marco variable &sqlobs without print the result .
It is from Arthur Carpenter .
proc sql;
create table _null_ as
select * from sashelp.class
where sex='F';
quit;
%put the number of female is &sqlobs;
An obscure and non-production way to use DATA _NULL_ is through what are called OUTPUT DATA step views. I did a paper on this topic back in 2011. I have not tested the feature for some time but *assume" it is still present in SAS:
https://www.lexjansen.com/wuss/2011/coders/Papers_Billings_T_73653.pdf
DATA _NULL_ steps don't -by themselves - create output SAS data sets. However hash objects can be used in _NULL_ steps to create output data sets.
These are great, and I once considered "how would you use data _null_?" as an hiring question to test SAS proficiency.
I use _NULL_ to call a web service that emails me a notification that my long-running SAS program is done. It's just four, short lines of SAS code that passes the name of the program to a special URL. While the functionality is limited, it avoids the hassles associated with MAPI/SMTP.
Great!
Can any one help me get the count of repeated characters in the given String. For eg string = 'REPEATED'.