# How to generate random numbers in SAS

In SAS, you can generate a set of random numbers that are uniformly distributed by using the RAND function in the DATA step or by using the RANDGEN subroutine in SAS/IML software. (These same functions also generate samples from other common distributions such as binomial and normal.) The syntax is simple. The following DATA step creates a data set that contains 10 random uniform numbers in the range [0,1]:

```
data A;
call streaminit(123);       /* set random number seed */
do i = 1 to 10;
   u = rand("Uniform");     /* u ~ U[0,1] */
   output;
end;
run;
```

The syntax for the SAS/IML program is similar, except that you can avoid the loop (vectorize) by allocating a vector and then filling all elements by using a single call to RANDGEN:

```
proc iml;
call randseed(123);         /* set random number seed */
u = j(10,1);                /* allocate */
call randgen(u, "Uniform"); /* u ~ U[0,1] */
```

### Random uniform on the interval [a,b]

If you want to generate random numbers on the interval [a,b], you have to scale and translate the values that are produced by RAND and RANDGEN. The width of the interval [a,b] is b-a, so the following statements produce random values in the interval [a,b]:

```
a = -1; b = 1;     /* example values */
x = a + (b-a)*u;
```

The same expression is valid in the DATA step and the SAS/IML language.

### Random integers

You can use the FLOOR or CEIL functions to transform (continuous) random values into (discrete) random integers. In statistical programming, it is common to generate random integers in the range 1 to Max for some value of Max, because you can use those values as observation numbers (indices) to sample from data. The following statements generate random integers in the range 1 to 10:

```
Max = 10;
k = ceil( Max*u );   /* uniform integer in 1..Max */
```

If you want random integers between 0 and Max or between Min and Max, the FLOOR function is more convenient:

```
Min = 5;
n = floor( (1+Max)*u );             /* uniform integer in 0..Max */
m = Min + floor( (1+Max-Min)*u );   /* uniform integer in Min..Max */
```

Again, the same expressions are valid in the DATA step and the SAS/IML language.

### Putting it all together

The following DATA step demonstrates all the ideas in this blog post and generates 1,000 random uniform values with various properties:

```
%let NObs = 1000;
data Unif(keep=u x k n m);
call streaminit(123);
a = -1; b = 1;
Min = 5; Max = 10;
do i = 1 to &NObs;
   u = rand("Uniform");              /* U[0,1] */
   x = a + (b-a)*u;                  /* U[a,b] */
   k = ceil( Max*u );                /* uniform integer in 1..Max */
   n = floor( (1+Max)*u );           /* uniform integer in 0..Max */
   m = Min + floor((1+Max-Min)*u);   /* uniform integer in Min..Max */
   output;
end;
run;
```

You can use the UNIVARIATE and FREQ procedures to see how closely the statistics of the sample match the characteristics of the populations. The PROC UNIVARIATE output is not shown, but the histograms show that the sample data for the u and x variables are, indeed, uniformly distributed on [0,1] and [-1,1], respectively. The PROC FREQ output shows that the k, n, and m variables contain integers that are uniformly distributed within their respective ranges. Only the output for the m variable is shown.

```
proc univariate data=Unif;
var u x;
histogram u / endpoints=0 to 1 by 0.05;
histogram x / endpoints=-1 to 1 by 0.1;
run;

proc freq data=Unif;
tables k n m / chisq;
run;
```

1. MS
Posted August 24, 2011 at 7:39 am | Permalink

I wonder what kind of algorithm SAS uses when generating random numbers.

2. Wei Chen
Posted August 24, 2011 at 10:22 am | Permalink

I use RANUNI. Is there a difference?

• Posted August 24, 2011 at 10:28 am | Permalink

Yes, they are different. RANUNI, RANNOR, etc., are functions that use an older random number generator. Their statistical properties are not as good as the newer RAND function. ("Newer" means it's only been in SAS since the mid-1990s!) For small data sets and simple demo examples, it doesn’t matter which function you use. However, if you are doing serious Monte Carlo simulations and generating millions of random numbers, then the better statistical properties of the RAND function become important.

• Tom
Posted August 30, 2011 at 7:01 am | Permalink

Hi Rick,

I am using VNORMAL for Monte Carlo simulations at the moment. Do you know which random number generator this function is using? I can't find anything in the SAS manual. Would it be better to use RANDNORMAL? I know RANDNORMAL allows you to use RANDSEED and VNORMAL doesn't but I didn't deem this very important.

3. MM
Posted February 7, 2012 at 11:01 am | Permalink

Hi Rick,

Thank you for the interesting and very helpful writings on SAS random numbers.

I ran 36 instances of a SAS program in parallel on a cluster. I provided a unique seed to each running instance. Every instance generated 4,000,000 (four million) random numbers using RANUNI, so a total of 144,000,000 (= 36 * 4 million) random numbers were needed across all instances. After all instances had completed, I noticed that about 2.8% of the random numbers (generated in all instances) were duplicated, even though the instances used unique seeds.

When I used STREAMINIT and then RAND("UNIFORM") to generate the random numbers, about 4% of the random numbers (generated in all instances) were duplicated.

• Posted February 7, 2012 at 11:38 am | Permalink

If you haven't yet read my post Random Number Streams in SAS: How do they work?, be sure to read it.

In general, you shouldn't confuse INDEPENDENCE with UNIQUENESS. Random number generators try to achieve independence. There is nothing intrinsically wrong with getting a repeated value, just like there is nothing wrong with rolling a die and getting the same value multiple times. It happens often, and it doesn't mean that the die is unfair.

I caution against using RANUNI for large samples. RANUNI only provides 2 billion possible values. If you generate 144m obs in RANUNI, you shouldn't be surprised to get a repeated value. This is the famous Birthday Matching Problem, which I blogged about in the form of matching initials at a meeting.
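As a rough back-of-the-envelope check of that claim, the following DATA step sketches the expected share of repeated values. It assumes an idealized model in which each of the 144 million draws is independent and uniform over 2**31 - 1 equally likely values; the variable names are illustrative, and the model is a simplification of how separate RANUNI streams actually behave.

```sas
/* Idealized birthday-problem estimate: NDraws independent draws from
   NVals equally likely values (an assumption, not RANUNI's internals) */
data _null_;
   NVals  = 2**31 - 1;      /* assumed count of possible values (about 2 billion) */
   NDraws = 144e6;          /* 36 instances x 4 million draws */
   /* expected number of distinct values, using (1 - 1/N)**n ~ exp(-n/N) */
   ExpDistinct = NVals * (1 - exp(-NDraws/NVals));
   ExpRepeats  = NDraws - ExpDistinct;        /* draws that repeat an earlier value */
   PctRepeats  = 100 * ExpRepeats / NDraws;
   put ExpDistinct= comma16. ExpRepeats= comma16. PctRepeats= 6.2;
run;
```

Under these assumptions the estimate comes out at roughly 3% repeated values, which is the same order of magnitude as the 2.8% reported in the question.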

I am curious: how are you determining that values are duplicated? PROC FREQ? PROC SORT with the NODUP option?
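For reference, here is a minimal sketch of one way to count repeated values inside SAS, using PROC SORT with the NODUPKEY option. The data set names (R, Uniq) and the sample size are illustrative assumptions.

```sas
/* generate some random values to check (illustrative size) */
data R;
   call streaminit(123);
   do i = 1 to 100000;
      u = rand("Uniform");
      output;
   end;
run;

/* keep one observation per distinct value of u; the SAS log reports
   how many observations with duplicate key values were deleted */
proc sort data=R out=Uniq nodupkey;
   by u;
run;
```

Comparing the number of observations in R and Uniq gives the number of repeated values.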

• MM
Posted February 8, 2012 at 9:44 am | Permalink

Hi Rick,

Please note that when I use only one seed and generate 144m random numbers, I do not see any duplicates.

Here is how I determine whether values are duplicated:
1. Assume that every generated random number is written on a separate line of a file, for instance rand_nums.lst.
2. The cat (Linux) command, piped to wc, counts the total number of lines in the file. For instance:
cat rand_nums.lst | wc -l
3. The sort (Linux) command, piped to wc, counts the number of unique lines. For instance:
sort -nu rand_nums.lst | wc -l
4. If these numbers are the same, then the generated random numbers are unique.

• Posted February 8, 2012 at 10:23 am | Permalink

When you say there are no duplicates when you use one seed, is this for RANUNI, RAND, or both?

• MM
Posted February 8, 2012 at 1:30 pm | Permalink

So far I have tried only RANUNI.

4. Rachel
Posted April 9, 2012 at 4:21 pm | Permalink

How could I randomly generate a uniformly distributed variable, for example RAN, which always falls between 0 and 2?

• Posted April 10, 2012 at 6:36 am | Permalink

See the section, "Random uniform on the interval [a,b]" at the top of the page. For you, a=0 and b=2 so
ran = 2*u;

5. Stefan Boldsen
Posted June 21, 2012 at 6:55 am | Permalink

Hi Rick Wicklin,

I am a Danish master's student. I am currently struggling with a simulation for my master's thesis. The purpose of the simulation is to verify whether industrial merger waves exist in Europe.

I need to randomly generate x uniformly distributed numbers ('pseudo'-M&A's) between 1 and 120 (JanYear1, FebYear1...DecYear10) for every identified M&A-active industry (48 industries). And I need to repeat this step 1000 times. x is the observed number of M&A's in the industry under investigation.
- based on this blog post, I now think I know how to conduct the simulation in SAS.

My hurdle is that after the simulation process, I need to identify the volume of the highest 24-month concentration for each of the 1000 draws. Can you help me here, Rick? I need the 24-month concentrations to conclude whether an industrial merger wave exists for a given industry: if in 99% of the draws the highest 24-month concentration is lower than the actual (observed) peak concentration, there is significant evidence for the existence of a 2-year merger wave within the given industry in that decade.

I really, really hope that you can help me.

6. Jerald Nathan
Posted September 4, 2012 at 4:33 am | Permalink

Hi Rick,

The above code does not generate unique random numbers if I set the number of observations to 400000 with min=10000000 and max=99999999. Basically, I need to generate unique random numbers with 8 digits.

any alternatives?

• Posted September 4, 2012 at 5:59 am | Permalink

Random numbers are not necessarily unique. Consider rolling a six-sided die two times. About 1/6 of the time the random number 1-6 will be repeated! To get uniqueness you want to "sample without replacement" from the list of numbers that you want. You can use the METHOD=SRS method in PROC SURVEYSELECT to select samples without replacement. In PROC IML, you can use the SAMPLE function.
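A minimal sketch of both approaches follows. It draws 5 distinct integers from 1..20 as a small stand-in for a larger range; the data set names (Ints, Sample) and the seed are illustrative assumptions.

```sas
/* population: the integers 1..20 (small stand-in for a larger range) */
data Ints;
   do id = 1 to 20;
      output;
   end;
run;

/* simple random sample WITHOUT replacement: 5 distinct integers */
proc surveyselect data=Ints out=Sample method=srs sampsize=5 seed=123;
run;

proc print data=Sample noobs;
run;

/* the same idea in SAS/IML: "WOR" requests sampling without replacement */
proc iml;
call randseed(123);
s = sample(1:20, 5, "WOR");
print s;
quit;
```

Because both methods sample without replacement, no value can appear twice in the result.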

• Mickey Mancenido
Posted September 23, 2012 at 1:59 am | Permalink

Hi Rick! Is it possible to use the RAND() function inside PROC IML? I tried doing that and it seems to work; I'm just concerned if it produces the same result as the RANDGEN subroutine. I've been reading some comments that inside PROC IML, the RANDGEN subroutine should be used. However, I don't need to generate one stream of random numbers every iteration. I need to generate just one random number, and the parameter of the distribution (say the binomial sample size) varies from iteration to iteration, so you can see my dilemma about using RANDGEN.

• Posted September 23, 2012 at 2:17 pm | Permalink

I do not see your dilemma about using RANDGEN. You can generate 1 sample as efficiently with RANDGEN as with RAND. However, to answer your question: yes, you can call RAND from PROC IML. Furthermore, you can pass a vector of parameters to RAND and get out a vector of binomial sample sizes.
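As a minimal sketch of the one-value-per-iteration case, the following SAS/IML program calls RANDGEN inside a loop with a binomial sample size that changes on every iteration. The sample sizes and the probability 0.5 are illustrative assumptions.

```sas
proc iml;
call randseed(123);
size = {5, 10, 20};              /* hypothetical sample sizes, one per iteration */
x = j(nrow(size), 1);            /* allocate space for the results */
y = j(1, 1);                     /* 1 x 1 matrix that receives each draw */
do i = 1 to nrow(size);
   call randgen(y, "Binomial", 0.5, size[i]);   /* one draw from Binom(p=0.5, n=size[i]) */
   x[i] = y;
end;
print size x;
quit;
```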

7. safa
Posted January 6, 2013 at 4:30 am | Permalink

Hi..,
How can I generate 5 samples, each of size seven, by using SAS?

• Posted January 6, 2013 at 9:11 pm | Permalink

From what distribution? Uniform? Discrete uniform? Normal? I've written more than 30 articles on simulation, so you can find lots of examples by clicking on the "Simulation and Sampling" link in the right-hand sidebar. In particular, look at the DATA step in the second set of code in this article: http://blogs.sas.com/content/iml/2012/07/18/simulation-in-sas-the-slow-way-or-the-by-way/ It shows a DATA step with two nested loops. Make the outer loop go to 5 and the inner loop go to 7.

8. Joshua Abel
Posted February 14, 2013 at 1:59 pm | Permalink

Hello,

I was wondering if it's possible to do a similar exercise, but pulling a VECTOR of 2 bivariate normal variables? If I know the means and the variance-covariance matrix of my variables, can SAS randomly draw from the joint distribution?

Thanks!

Josh

9. Posted July 19, 2013 at 2:10 pm | Permalink

Can random number generation/simulation be used in Proc OPTMODEL ? i know it is supported in proc model to do the simulation, but could not find anything for Proc Optmodel.

10. jhen
Posted February 17, 2014 at 2:40 am | Permalink

How can I select randomly when my data is from 2003-2013 but I want to select only from 2003-2012? What syntax do I need to use?

• Posted February 17, 2014 at 6:23 am | Permalink

Sounds like you want to subset the data by using a WHERE clause
WHERE YEAR>=2003 AND YEAR<=2012;
Then
m = 2003 + floor((1+2012-2003)*u); /* uniform integer in 2003..2012 */

11. Ben
Posted March 17, 2014 at 12:58 pm | Permalink

Hi Rick,

Sorry to revive an old thread, but I was wondering what your thoughts were (and why it wasn't mentioned) on using ROUND() around the a+(b-a)*u formula for random integers in [a,b]? I originally used FLOOR()/CEIL() in my code, but lately (especially when I have a small interval, such as [1,5]) I have switched to ROUND() since FLOOR()/CEIL() bias away from b/a, respectively. I know that traditional discrete uniform distribution says that random draws of each value in an interval of K values should tend towards a 1/K distribution, but I don't believe the FLOOR()/CEIL() functions provide this.

Thanks for all of the great knowledge that you share,
Ben

• Posted March 17, 2014 at 1:45 pm | Permalink

Maybe I am misunderstanding what you are proposing. I didn't put ROUND around the a+(b-a)*u formula because the resulting integers are not uniformly distributed. For example, if I want uniform integers in the range {1,2,3,4,5}, it is incorrect to write the following:

a = 1; b = 5;
u = j(10000,1); /* allocate */
call randgen(u, "Uniform"); /* u ~ U[0,1] */
p = round(a + (b-a)*u); /* NOT uniformly distributed! */
call tabulate(value, freq, p); /* compute empirical distribution */
print (freq/10000)[c=(char(value)) f=percent7.4];

The code shows that the chance of a 1 or 5 is only 12.5% each, whereas the chance of 2, 3, or 4 is 25% each.
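For comparison, here is a sketch of the FLOOR-based formula from the article applied to the same range. It reuses the sample size from the code above; the seed is an illustrative assumption.

```sas
proc iml;
call randseed(123);
a = 1; b = 5;
u = j(10000,1);                        /* allocate */
call randgen(u, "Uniform");            /* u ~ U[0,1] */
k = a + floor( (1+b-a)*u );            /* uniform integer in a..b */
call tabulate(value, freq, k);         /* compute empirical distribution */
print (freq/10000)[c=(char(value)) f=percent7.2];
quit;
```

Each of the five values corresponds to a subinterval of u with width 1/5, so the empirical proportions should all be close to 20%.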

12. stan
Posted May 13, 2014 at 4:28 pm | Permalink

Rick: is there a way to generate random numbers with a specified (with known (geo)mean and (geo)sd) lognormal distribution in SAS? Many thanks for the blog.

• Posted May 13, 2014 at 4:50 pm | Permalink

If I understand you, use the RAND("Normal", mu, sigma) function to generate X ~ N(mu, sigma). The variable Y = exp(X) is lognormally distributed with parameters mu and sigma.

• stan
Posted May 14, 2014 at 3:56 pm | Permalink

Implementing the formulas for mu and sigma from the "Notation" section of the "Log-normal distribution" entry on the Wikipedia (http://en.wikipedia.org/wiki/Log-normal_distribution)
and with your suggestion the following syntax:

``````
DATA qwerty;
DO a = 1 TO 1000000;
mean = 81.2243980;
sd   = 15.6962440;
mu    = (LOG((mean**2)/(SQRT((sd**2)+(mean**2)))));
sigma = (SQRT(LOG(1+((sd**2)/(mean**2)))));
nr    = RAND('NORMAL', mu, sigma);
lgnrm = CONSTANT('E')**nr;
OUTPUT; END; RUN;
PROC PRINT DATA = qwerty (OBS = 10); var lgnrm; run;
PROC UNIVARIATE DATA = qwerty;
HISTOGRAM lgnrm / LOGNORMAL (THETA = EST SIGMA = EST
ZETA = EST) ENDPOINTS = 0 TO 100 BY 1.0;
VAR lgnrm;
ODS SELECT Moments HISTOGRAM; RUN;
``````

produced a data set with MEAN = 81.2260929 and SD = 15.6965921,
which is very close to the desired values :) !

Please correct me if I have a mistake.

Thank you again for the blog --- very informative and practically useful.

P.S.
Initially I meant what I was suggested
(http://stackoverflow.com/a/23635776/1009306)
and unexpectedly what was written about by yourself
(http://blogs.sas.com/content/iml/2013/07/22/the-inverse-cdf-method/)

Does the iCDF approach give the same results? What are the benefits of using it?

• Posted May 14, 2014 at 4:21 pm | Permalink

Looks good, although I'd use lgnrm = exp(nr) in the DATA step and set THETA=0 in the HISTOGRAM stmt.
If you have more questions, please post to the SAS Support Communities. There are about 20 subcommunities there, such as SAS Statistical Procedures.
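The two suggestions above can be sketched as follows. This is a shortened variant of the program in the question, not a drop-in replacement: the data set name LogN, the seed, and the smaller sample size are illustrative assumptions.

```sas
data LogN(keep=lgnrm);
   call streaminit(123);
   mean = 81.2243980;                           /* target mean of the lognormal variable */
   sd   = 15.6962440;                           /* target standard deviation */
   mu    = log( mean**2 / sqrt(sd**2 + mean**2) );
   sigma = sqrt( log(1 + sd**2/mean**2) );
   do i = 1 to 100000;
      lgnrm = exp( rand("Normal", mu, sigma) ); /* EXP instead of CONSTANT('E')**nr */
      output;
   end;
run;

proc univariate data=LogN;
   var lgnrm;
   histogram lgnrm / lognormal(theta=0 sigma=est zeta=est);  /* fix threshold at 0 */
   ods select Moments Histogram;
run;
```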

The advantage of the iCDF method is that it always works. However, it tends to be slower than direct transformation methods, such as the one used here.

• stan
Posted May 15, 2014 at 12:59 pm | Permalink

Thank you for your suggestions and quick replies :).

13. Posted October 16, 2014 at 5:19 am | Permalink

Thanks. I used this today for a demonstration I was working on of uniform and normal distributions.

14. sepideh
Posted January 12, 2015 at 2:58 pm | Permalink

Code to generate random numbers between 0 and 1 continuous distribution c programming language

15. hira ballabh
Posted March 25, 2015 at 4:10 pm | Permalink

Hi
I have a dataset for one district. The district has 16 Mandals, and each Mandal has three types of school (Govt., Private, NGO), with 3500 schools. I want a sample of one Govt., one Private, and one NGO school from each Mandal, so the total sample size is 48. Each time we run the program the sample should be different, not the same. Could you help me take such a sample using SAS?

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, statistical graphics, statistical simulation, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.