I received the following query regarding the RAND function in Base SAS:
In SAS, is specifying 0 as a random number seed the same as not specifying a seed at all?
The question concerns initializing the SAS random number stream by using the internal system clock. You can do this explicitly by calling the STREAMINIT routine with 0 as an argument. However, if you call the RAND function and have never called the STREAMINIT routine, then the RAND function implicitly calls the STREAMINIT routine with an argument of 0.
The answer, therefore, is "Yes, they produce the same result" in the sense that in both cases the internal clock time is used to initialize the random number stream.
However, if you want to learn a tiny technical difference, read on.
The two approaches are very slightly different with regard to WHEN the random number seed is set. Consider the following two DATA steps:
data A;
   /* random number stream initialized when the next stmt is executed */
   call streaminit(0);
   /* do a computation that takes an hour ... */
   x = rand("Uniform");
run;

data B;
   /* do a computation that takes an hour ... */
   /* random number stream initialized when the next stmt is executed */
   x = rand("Uniform");
run;
In the first DATA step, the random number stream is initialized when the CALL STREAMINIT statement executes, which is the first statement of the program. If the DATA step starts exactly at midnight, the random number stream is initialized a few milliseconds after midnight. In contrast, in the second DATA step, the random number stream is not initialized until the RAND function is executed. If the DATA step starts exactly at midnight, but it runs for an hour before the RAND function executes, then the random number stream is initialized around 1:00 am.
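If you want to see the timing difference for yourself, here is a small sketch. It is only an illustration: the two-second CALL SLEEP stands in for the hour-long computation, and the variable t_init is a hypothetical name that records roughly when each stream is initialized. It does not display the seed itself.

```sas
data A;
   call streaminit(0);       /* stream is seeded from the clock here        */
   t_init = datetime();      /* approximate time of initialization          */
   call sleep(2, 1);         /* stand-in for a long computation (2 seconds) */
   x = rand("Uniform");
   format t_init datetime22.3;
run;

data B;
   call sleep(2, 1);         /* same stand-in delay, but BEFORE any seeding */
   t_init = datetime();      /* approximate time just before implicit init  */
   x = rand("Uniform");      /* stream is seeded from the clock here        */
   format t_init datetime22.3;
run;

proc print data=A; run;
proc print data=B; run;
```

In data set A, t_init should fall at the start of the step. In data set B, it should fall about two seconds after the step starts.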
Since both random number streams are equally random, in practice it probably doesn't matter which scheme you use. I like to explicitly call the STREAMINIT routine because then it is obvious to everyone who reads the program that it will produce different numbers every time it runs.
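To make the convention concrete, here is a minimal sketch that contrasts a seed of 0 with a positive seed. The data set names and the seed value 12345 are arbitrary choices for illustration.

```sas
/* Seed 0: the stream is initialized from the system clock, */
/* so rerunning this step is expected to give new values.   */
data Different;
   call streaminit(0);
   do i = 1 to 3;
      x = rand("Uniform");
      output;
   end;
run;

/* Positive seed (12345 is arbitrary): rerunning this step */
/* reproduces the same three values.                       */
data Same;
   call streaminit(12345);
   do i = 1 to 3;
      x = rand("Uniform");
      output;
   end;
run;

proc print data=Different; run;
proc print data=Same; run;
```

If you submit the program twice, the values in Same should match across runs, whereas the values in Different should not.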
1 Comment
This reminds me of something that I have often wondered about. Unlike the DATA steps above, PROC OPTEX reports in the log the random seed that has been used. I assume this is also derived in the same way from the internal clock. The random seed reported is nearly always a nine-digit integer that ends in either 001 or 000 (I am using SAS on a Windows 7 machine). Should I worry about this non-randomness of the last three digits? I suspect the number of digits from the clock is limited and a multiplication takes place to put the seed on a larger scale, but it would be interesting to know the mechanics of this.