A well-formed WHERE statement or subsetting IF can narrow down the output of your SAS DATA step. The SAS log does a good job of telling you how many records were processed by the action. For example, let's look at this simple DATA step with my "poor man's random sample", implemented with the RANUNI function:
15 data cars; 16 set sashelp.cars; 17 if ranuni(0) < 0.85; 18 run; NOTE: There were 428 observations read from the data set SASHELP.CARS. NOTE: The data set WORK.CARS has 364 observations and 15 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.00 seconds
The output will also be approximately 85% of the original data, but not exactly. It's random, so the result varies each time that I run it. What if I need to know exactly how many records were processed?
In SAS 9.3, I can now pick that up from the SYSNOBS automatic macro variable. For example:
%put how many = &SYSNOBS; |
Yields:
20 %put how many = &SYSNOBS; how many = 364
This new feature in 9.3 was added as a direct result of a customer suggestion. Keep 'em coming!
18 Comments
Thanks to the SAS developers for this. I found three sections in various programs that I no longer need. Do you have any code you can throw away due to &sysnobs?
Interesting and useful. However, the code doesn't seem to match the description given in the documentation:
"Contains the number of observations read from the last data set that was closed by the previous procedure or DATA step."
Contains the number of observations READ. Since the subsetting if statement reads ALL the observations, but subsets how many it outputs, this isn't a correct description.
I don't have SAS in front of me to test out your code, but assuming it's correct, the documentation needs to be reworded.
Stephen, thanks for the comment.
I think in this contrived example of mine, the last data set is WORK.CARS, which was stuffed with 364 observations. And that's what the SYSNOBS number refers to. But I agree that the doc could be clearer, and an example would be helpful. Perhaps this makes it clearer:
But if it was made too obvious, we wouldn't need SAS blogs or sites like sas-resources.com, would we?
Ha! Thanks for the link :)
I agree, your example is good and shows exactly what &SYSNOBS is doing. It is the documentation that is ambiguous.
Thanks again! -s
Interesting. I agree, looks like it's storing the number of records in &syslast. And when you output mutliple datasets in a single step, it looks like &syslast is whichever dataset was listed last on the output dataset. Log below.
I tend to stay away from &syslast and _last_, in preference to explicitly naming datasets, but this is still a nice tip to know about. Thx!
Thanks Chris,
Was really hoping &sysnobs would return the number of records that survived the where clause when called after a PROC, but seems to just return the number of records in the input dataset.
43 proc print data=sashelp.shoes;
44 where Region='Canada';
45 run;
NOTE: There were 37 observations read from the data set SASHELP.SHOES.
WHERE Region='Canada';
46 %put &sysnobs;
395
So to get the "right" answer (37), feels like I still need to make another pass of the data. Bummer.
--Q.
sounds useful &sysNobs
Please can you clear up why sql produces different values in SYSNOBS for the same select clause
select * from sashelp.class where ranuni(1)> .9
depending whether the destination is
a table(sysnobs=output count = sqlobs)
or a list(when sysnobs = the input count before the where is applied)
Here is a clip from my saslog:
29 proc sql; 30 create table cl as 31 select * from sashelp.class where ranuni(1)> .9 ; NOTE: Table WORK.CL created, with 5 rows and 5 columns. 32 %put &sysnobs; 5 33 /*create table cl as*/ 34 select * from sashelp.class where ranuni(1)> .9 ; 35 %put &sysnobs; 19 36 quit; NOTE: PROCEDURE SQL
Peter,
Always you with the trick questions! I'm going to guess that it has to do with the internals of how PROC SQL processes the SELECT without a CREATE TABLE. There is some inner magic perhaps creating an internal view of the original data, but it exists only long enough to produce an ODS output.
For more precise results, try this:
so VIEWs have a different effect when compiled.
Aother issue occurs with select into :names
partial log
SYSNOBS shows 19 but only two names were created
Seems that ranuni(1) in SQL does not always start at the same seed in SQL
Hi,Hemedinger!
I am Jacob from China ,and I am a SAS programmer.
I need you help.
This is the problem:
But the results of dataset m03 is not what I want.
I need this:
I have to deal it with double set, but I can not find the way or some materials, would you help me?
Thank you!
Jacob,
This is an interesting programming problem. You can post such questions to the forums at communities.sas.com, and I'll bet that you receive an answer fairly quickly, with a good explanation. Note that before you can post, you must create a free sas.com profile to log in.
Your question is probably best suited to the DATA step forum.
Chris
Hi Chris. Any idea if SYSNOBS will work for non-SAS data, such as DBMS (e.g. Oracle, DB2, Teradata) tables?
Yes, Marty -- I think so. I tried with an Excel file using the PCFILES engine:
And got:
Wow! About time! I always wondered why there was a SQL version but not for anything else. Unfortunately, I'm going to continue to use the %NOBS macro program, because it works on any SAS install. That is, until no one is using < 9.3.
Hi Chris
Yes you should just continue to use your own %nobs macro for this because you can do thing like this (ie can apply where clause):
%nobs(dset=sashelp.class(where=(sex='M')));
So I should not bother with &SYSNOBS
John
As SAS documentation says "Note: If the number of observations for the data set was not calculated by the previous procedure or DATA step, the value of SYSNOBS is set to -1.",
What is mean by this ??
I am thinking that "If Previous Procedure or Datastep failed to execute due to some errors,that time value of SYSOBS is set to -1.",
am i correct ??
thanks,
Sanjeev.K
Sanjeev, actually I think that the current doc is a bit misleading. This will be fixed soon, but the real behavior is "Contains the number of observations in the last data set created by a procedure
or DATA step."
So if a step doesn't create a data set, then the value is unchanged from its previous setting. My speculation: if a step creates a data set with an unknown number of records (such as a database view), then the value might be -1 (which is sort of the SAS convention for "uncounted number of records").
Pingback: How many records are in that PROC SQL result? - The SAS Dummy