Random number seeds: Only the first seed matters!

The other day I encountered the following SAS DATA step for generating three normally distributed variables. Study it, and see if you can discover what is unnecessary (and misleading!) about this program:

data points;
drop i;
do i=1 to 10;
   x=rannor(34343);
   y=rannor(12345);
   z=rannor(54321);
   output;
end;
run;

The program creates the POINTS data set. The data set contains three variables, each containing random numbers from the standard normal distribution. I'm guessing that the author of the program thinks that using rannor(12345) to define the y variable makes y independent from the x variable, which is defined by rannor(34343).

Sorry, but that is not correct.

The x, y, and z variables are, indeed, independent samples from a normal distribution, but that fact does not depend on using different seeds in the RANNOR function. In fact, in this DATA step, all random number seeds except the first one are completely ignored! Don't believe me? Run the following DATA step, which changes every seed except the first, and then use PROC COMPARE to compare the two data sets:

data points2;
drop i;
/* change all random number seeds except the first */
x=rannor(34343); y=rannor(1); z=rannor(2); output;
do i=2 to 10;
   x=rannor(10+i);
   y=rannor(100+i);
   z=rannor(1000+i);
   output;
end;
run;
 
proc compare base=points compare=points2;
run;

                           The COMPARE Procedure
                Comparison of WORK.POINTS with WORK.POINTS2
                               (Method=EXACT)
                               
NOTE: No unequal values were found. All values compared are exactly equal.

All values compared are exactly equal. Every observation, every variable, down to the last bit. But except for the first observation of the x variable, the second DATA step uses completely different random number seeds! How can the POINTS2 data set be identical to the POINTS data set?

As I explained in a previous post on random number seeds in SAS, the random number seed for a DATA step (or SAS/IML program) is set by the first call that specifies a seed. SAS ignores subsequent seeds within the same DATA step or PROC step. In my previous post, I used the newer (and better) STREAMINIT subroutine and the RAND function instead of the older RANNOR function, but the fact remains that the first random number seed determines the random number stream for the entire DATA step. That is, only the first call to the STREAMINIT subroutine is important, as shown in the following example:

data points3;
drop i;
do i=1 to 10;
   call streaminit(123);    /* this call is used to set the seed */
   x = rand("Normal");
   call streaminit(54321);  /* this call is ignored */
   y = rand("uniform");
   output;
end;
run;

The program looks like it is using different streams for the normal and uniform variates, but it is not. The first call to the STREAMINIT subroutine sets the seed; later calls in the same DATA step are ignored. Thus, it would be better to move call streaminit(123) to the top of the program, outside of the loop. The rearranged program generates the same set of pseudorandom numbers.
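The suggested rearrangement might look like the following sketch. The data set name POINTS4 is a hypothetical name chosen for illustration; everything else is taken from the POINTS3 program. If the reasoning above holds, PROC COMPARE should find no differences between POINTS3 and POINTS4.

```sas
data points4;              /* POINTS4 is a hypothetical name */
drop i;
call streaminit(123);      /* set the seed once, before the loop */
do i=1 to 10;
   x = rand("Normal");
   y = rand("Uniform");
   output;
end;
run;

/* check the claim: compare with the POINTS3 data set created earlier */
proc compare base=points3 compare=points4;
run;
```

Setting the seed once at the top makes it obvious to the reader that a single stream drives every variable in the DATA step.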

For further details, see the SAS documentation, which shows an example similar to mine in which three data sets (imaginatively named A, B, and C) contain the same pseudorandom numbers.

Now that I've ranted against using different random number seeds, I will reveal that the DATA step at the beginning of my post is from an example in the SAS Knowledge Base! Yes, even experienced SAS programmers are sometimes confused by the subtleties of random number streams. There is nothing wrong with a program that uses multiple seeds, but such a program makes the reader think that all those seeds are actually doing something. They’re not.
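One way to make the intent explicit is to use a single seed value everywhere. The following is only a sketch: the macro variable SEED and the data set name POINTS_SINGLE are hypothetical names introduced for illustration. Because only the first seed matters, this program should reproduce the POINTS data set, which the PROC COMPARE step can be used to check.

```sas
%let seed = 34343;    /* hypothetical macro variable: the one seed that matters */

data points_single;   /* POINTS_SINGLE is a hypothetical name */
drop i;
do i=1 to 10;
   x=rannor(&seed);   /* the first call sets the stream for the DATA step */
   y=rannor(&seed);   /* later seed arguments are ignored */
   z=rannor(&seed);
   output;
end;
run;

/* check the claim: compare with the original POINTS data set */
proc compare base=points compare=points_single;
run;
```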

Are you someone who uses different random number seeds for each variable in the same DATA step or PROC IML program? If so, you can safely stop. Multiple seeds do not make your random variables any more "random." Only the first seed matters.

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

6 Comments

  1. I seem to recall that separate independent seeds and hence "independent" random series of numbers could be called by using the "call" version of the random number functions. I had some code at one time that demonstrated that as true ... but I don't have it easily available right now.

  2. My code also demonstrates that the first random number seed determines the random number stream in PROC IML when using the RANDGEN subroutine.

    proc iml;
    ntot=j(20,10,.);
    fill=j(20,1,.);
    seed=1824;
    	do i=1 to 10;
    		call randseed(seed);
    		call randgen(fill,'normal',0,1);
    		ntot[,i]=fill;
    	end;
    print ntot;
    quit;
     
    proc iml;
    ntot1=j(20,10,.);
    fill=j(20,1,.);
    seed1=1822;
    	do i=1 to 10;
    		seed1=seed1+2*i;
    		call randseed(seed1);
    		call randgen(fill,'normal',0,1);
    		ntot1[,i]=fill;
    	end;
    print ntot1;
    quit;
