When developing SAS® data sets, program code and/or applications, efficiency is not always given the attention it deserves, particularly in the early phases of development. Since data sizes and system performance can affect a program and/or an application’s behavior, SAS users may want to access information about a data set’s content and size. To access, for example, how much disk space a data set is using, users can make a few calculations and/or learn how to access metadata content to obtain the information to this important information. This tip explores a few ways to determine, or estimate, the size of a data set – a question many users are curious about when discussing SAS performance and tuning techniques. You'll find more tips like this one in my book, PROC SQL: Beyond the Basics Using SAS, Second Edition.
Using PROC SQL and DICTIONARY.TABLES
The SAS System collects valuable information (known as “metadata”) about all known SAS libraries, data sets (tables), catalogs, indexes, macros, system options, views and a collection of other “read-only” tables called Dictionary tables and SASHELP views. One specific Dictionary table, TABLES, and its SASHELP view equivalent, VTABLE, contains details about a SAS session’s data sets. In the following PROC SQL code, the specification of a PROC SQL SELECT-clause is illustrated to access the contents of four columns found in the TABLES Dictionary table, specifically LIBNAME, MEMNAME, MEMTYPE and FILESIZE to display the size of the CARS data set.
PROC SQL and Dictionary.TABLES:
PROC SQL ; TITLE ‘Filesize for CARS Data Set’ ; SELECT LIBNAME, MEMNAME, FILESIZE FORMAT=SIZEKMG., FILESIZE FORMAT=SIZEK. FROM DICTIONARY.TABLES WHERE LIBNAME = ‘SASHELP’ AND MEMNAME = ‘CARS’ AND MEMTYPE = ‘DATA’ ; QUIT ;
As shown in the results, above, the CARS data set filesize is 192KB. Note: When the SIZEKMG. format is specified in a format= option, SAS determines whether to apply KB for kilobytes, MB for megabytes, or GB for gigabytes; and divides the numeric filesize value by one of the following values:
Using PROC PRINT and SASHELP.VTABLE
In the next example, the specification of a PROC PRINT is illustrated to access the contents of three columns found in the VTABLE SASHELP view, specifically LIBNAME, MEMNAME and FILESIZE to display the size of the CARS data set.
PROC PRINT and SASHELP.VTABLE
PROC PRINT DATA=SASHELP.VTABLE NOOBS ; VAR LIBNAME MEMNAME FILESIZE ; WHERE LIBNAME = ‘SASHELP’ AND MEMNAME = ‘CARS’ ; FORMAT FILESIZE SIZEKMG. ; TITLE ‘Filesize for SASHELP.CARS Data Set’ ; RUN ;
Using DATA _NULL_, SASHELP.VEXTFL and CALL SYMPUTX
In the final example, a DATA _NULL_ is illustrated to access the contents of the VEXTFL SASHELP view with a FILENAME statement. An assignment statement is specified to calculate the FILESIZE value for the size of the CARS data set. The CALL SYMPUTX left justifies and trims the trailing blanks from the numeric FILESIZE value of 196608.
filename myfile 'C:\Program Files\SAS9.4\SASFoundation\9.4\\CORE\SASHELP\Cars.sas7bdat' ; DATA _NULL_ ; SET SASHELP.VEXTFL (WHERE=(FILEREF=’MYFILE’)) ; /* Calculate the Filesize in MB */ FILESIZE = FILESIZE / (1024 ** 2) ; CALL SYMPUTX (‘FILESIZE’,FILESIZE) ; RUN ;
Want more SAS tips? Check out these useful tips and tricks from several other subject matter experts.
When I tried to use Filesize to calculate the size of the compressed datasets on a library(not SASHELP) using dictionary.tables I got an error in logs stating variable Filesize does not exist !!
Can you please help me on this?
As usual Paul has a new and very useful stuff to offer to SAS community. I am SAS user for over 20 years but I didn't know for SIZEKMG format. Shame on me.
Thank you Paul,!!!!. This is very handy.
Great tip, thank you, Kirk. Interestingly, the filesize column in sashelp.vextfl does not have units of measure specified, not even in the label. I assume it is Bytes. Correct?
Thank you. This is very handy. It suggests that with the correctly constructed LIBNAME statement that concatenates a series of paths, one could construct a schema report for all the model inputs SAS knows about back to the system of record inputs for not just file size, but attributes like datatype, names, formats, informats and so on.
That would be handy for everyone.
Very useful tips, thanks for sharing. As far as I can remember, SIZEKMG format was pre-production back in 9.1 then went into production status with 9.2 onwards. It's only limitation (I know) is - by definition - that it's capped to (ca) 1 Terabyte (exactly1.99999...x(2^^39) rounded to 1024Gb ~ 1 Tb). Now that data sometimes outsize this limit, sometimes custom "SIZEKMGT" Picture formats must be re-defined ...
The SAS-supplied format is so useful! I will make sure to incorporate this in my work. This will also help in "checksum" diagnostics looking at files across server instances. Thanks!
I never saw the SIZEKMG format before. Very useful tip. Thanks.