Combine and conquer with SAS

13

Combine and conquer with SASThe ancient political maxim “divide and conquer” (Lat. “dīvide et īmpera”) has been used for millennia in politics, sociology, and psychology, mainly to manipulate people. In the last two centuries, however, this principle has also been adopted by computer science. It is widely used in computer programming, primarily in computer algorithms such as binary search, quick sort, Fast Fourier Transform (FFT), parallel processing algorithms – to name a few.

A typical Divide and Conquer algorithm consists of the following steps:

  1. Divide – break the given problem into smaller pieces (sub-problems) of the same type;
  2. Conquer – solve these smaller pieces (sub-problems);
  3. Combine – aggregate the results of the solved sub-problems to deduce the result of the initial problem.

However, in this blog post we’ll reverse the “divide and conquer” principle and go backwards from step three to step two, to combine and then conquer.

In our day-to-day programming activities, we may not need to divide a problem in order to solve it. In many cases, we deal with small pieces of a puzzle (language elements) provided by the SAS programming language, and we solve some larger problems by combining those elements.

Let’s consider some examples.

Combining SAS Formats

Suppose you have a variable x in a SAS table on which you need to report with the following requirement. Depending on the variable value, you will need to apply different SAS formats:

Define format using other formats in SAS

In other words, you need to display percentages that are greater than 10% as whole numbers, percentages between 1% and 10% as numbers with one decimal point, and percentages that are less than 1% as numbers with two decimal points.

You may get away with proc format’s picture statement with a proper multiplier – but good luck with that. Fortunately, there is a much better way of solving this problem. Did you know that you can use already existing formats as labels in proc format’s value statement? In other words, we can combine several SAS formats while defining another SAS format.

The above problem is easily solved by defining a new custom format as a combination of different PERCENTw.d formats as follows:

proc format;
   value pctmod
      -0.01 <-< 0.01 = [percent10.2] 
      -0.1 - -0.01, 0.01 - 0.1 = [percent10.1]
      low -< -0.1, 0.1 <- high = [percent10.0]
      ;
run;

Here is what this format does:

Define SAS format using other formats

You can combine any other existing formats as needed, including mixing and matching SAS-supplied formats and user-defined formats. You can even use existing formats within the PICTURE statement of the PROC FORMAT. For example, the following custom format will format dollar values between 0 and 1 as cents (e.g. 75¢) while dollar values greater than 1 as dollars (e.g. $9,340):

proc format;
   picture dolcent
	  0 <-< 1 = '99¢' (mult=100)
	  1 - high = [dollar32.0]
	  ;
run;

Here is what this format does:

Using SAS formats defining other formats

When using an existing format as a label, make sure to enclose the format name with dot in square brackets, e.g. [date9.], and do not enclose the name of the existing format in single or double quotes.

Using an existing format in the format definition can be construed as format nesting. A nested level of one means that you are creating the format A with the format B as a formatted value. However, if you also create format B with the format C as a formatted value, then you have nested level of two. Avoid nesting formats with a level higher than one, as the resource requirements may increase dramatically with each additional level.

Combining SAS Libraries

Suppose you have defined several SAS data libraries, such as

libname apples 'c:\projects\garden\apples';
libname oranges 'c:\projects\garden\oranges';
libname tomatoes 'c:\projects\garden\tomatoes';

You can then logically combine (concatenate) them into a single library:

libname combolib (apples oranges tomatoes);

or

libname combolib ('c:\projects\garden\apples' 'c:\projects\garden\oranges' 'c:\projects\garden\tomatoes');

Yes, you can combine apples with oranges, and even with tomatoes, and that will enable you to access the SAS data sets in several libraries with one libref.

As you can see, you can combine two or more libraries by their librefs, physical names, or even a combination of librefs and physical names, such as:

libname combolib (apples oranges 'c:\projects\garden\tomatoes');

SAS will even allow using commas between the concatenated libraries:

libname combolib (apples, oranges, tomatoes);
libname combolib (apples, oranges, 'c:\projects\garden\tomatoes');

It is important to know that while SAS has no problem combining multiple libraries that contain same-named files, when you reference such a file, the concatenated libraries are searched in order and the first occurrence of the specified file is used.

This proves to be extremely useful when you have multiple versions of the same tables that you promote between different environments during a system development cycle. For example, you might have a SAS data table named SALES in either some or all of the development, testing, and production environments. You can define librefs for each environment as well as a libref for the combined environment:

libname devllib 'c:\projects\proj_x\devl';
libname testlib 'c:\projects\proj_x\test';
libname prodlib 'c:\projects\proj_x\prod';
 libname lastlib (prodlib testlib devllib);

Then when you reference your table as lastlib.SALES SAS will return the SALES table from production environment if it exists there; if not then from testing environment if it exists there; if not then from development environment it exists there. That logical hierarchy means that SAS will return the latest and greatest version of that SALES table. If you want that table from a specific environment, then you reference it using a specific libref: prodlib.SALES, testlib.SALES, or devllib.SALES.

Combining SAS Format Libraries

You can combine SAS format libraries to achieve similar version prioritization as in the above case of SAS data libraries using FMTSEARCH= System Option:

options fmtsearch=(prodlib testlib devllib);

This implies that your SAS formats are stored in the catalogs named FORMATS in the above specified libraries: prodlib.FORMATS, testlib.FORMATS, devllib.FORMATS.

One must remember that if not specified, WORK and LIBRARY libraries by default take precedence in the order in which SAS formats are searched, making the above options statement effectively look like:

options fmtsearch=(work library prodlib testlib devllib);

If WORK and LIBRARY libraries are specified in the fmtsearch= then the default is not applied and the search order is as specified.

Your turn

An unlimited space below is reserved for your contribution to the fascinating topic of combine and conquer with SAS. This is your chance to contribute to the list of coding techniques that you use to combine and conquer solving your SAS coding problems.

 

Share

About Author

Leonid Batkhan

Leonid Batkhan is a long-time SAS consultant and blogger. Currently, he is a Lead Applications Developer at F.N.B. Corporation. He holds a Ph.D. in Computer Science and Automatic Control Systems and has been a SAS user for more than 25 years. From 1995 to 2021 he worked as a Data Management and Business Intelligence consultant at SAS Institute. During his career, Leonid has successfully implemented dozens of SAS applications and projects in various industries. All posts by Leonid Batkhan >>>

13 Comments

  1. Leonid,

    I really enjoyed this post and learned something new--"combining several SAS formats while defining another SAS format."

    Thanks a 1,000,000; I look forward to your next blog post!

    ----MMMMIIIIKKKKEEEE

    • Leonid Batkhan

      Thank you, Mike, for your inspiring feedback. Indeed, this is a cool feature. You can even combine several user-defined formats into one format. For example, you can combine gender and marital status into one format. Try this code out:

      proc format;
      	value $gender
      	'1' = 'Female'
      	'2' = 'Male'
      	;
      	value $marital
      	'M' = 'Married'
      	'D' = 'Divorced'
      	'S' = 'Single'
      	'W' = 'Widowed'
      	;
      	value $demogra
      	'1','2' = [$gender.]
      	other = [$marital.]
      	;
      run;
       
      data _null_;
      	do i='1','2','M','D','S','W';
      		put i= $demogra.;
      	end;
      run;
  2. Tom Gaughan on

    I definitely enjoyed the description of using proc format in this manner:

    proc format;
       value pctmod
          -0.01 < -< 0.01 = [percent10.2] 
          -0.1 - -0.01, 0.01 - 0.1 = [percent10.1]
          low -< -0.1, 0.1 <- high = [percent10.0]
          ;
    run;

    One thing, the sign for cents is a suffix, not a prefix:

    proc format;
       picture dolcent
               0 < - 1 = '99' (prefix='¢' mult=100)
               1 - high = [dollar32.0]
               ;
    run;
    • Leonid Batkhan

      Thank you, Tom for your feedback. You are absolutely correct, and I just went ahead and tweaked the code so now it looks like

      proc format;
         picture dolcent
      	  0 <-< 1 = '99¢' (mult=100)
      	  1 - high = [dollar32.0]
      	  ;
      run;

      and places ¢ as a suffix rather than prefix.

      Now we know how to do it both ways ☺.

  3. Jerry Voight on

    If you're running out of space in your SASWORK directory and have other directories or mount points to put work files and don't want to ask you System Administrator for help then try combining multiple work directories in your saswork option.

    1. Create a text file that contains the directories or mount points where you would like to direct SASWORK locations. (called /sas/workconfig)

    2. Your text file should look something like this. Choose a method for SAS to pick which directory it should use first. (Random or Space)
    /tmp1
    /tmp2
    /var/tmp
    /sas/saswork
    method=space

    3. In your sasv9_usermods.cfg file refer work to the text file created above.
    -work /sas/workconfig

    P.S. A tomato is a fruit so don't even feel bad for combining it with apples and oranges. ☺

    • Leonid Batkhan

      Thank you Jerry, for your contribution to the "combine and conquer" cause. Combining system resources is like a teamwork - increases capacity and reduces stress on its "components". As for tomato being a fruit, I would combine it with apples and oranges in a heartbeat even it were not ☺.

    • Leonid Batkhan

      Thank you, Matt. I did not think of it as "agile", but you are correct - combining data libraries and format libraries allow us to automate incremental SAS applications development, which is the definition of agile software development.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top