There are many techniques for generating random variates from a specified probability distribution such as the normal, exponential, or gamma distribution. However, one technique stands out because of its generality and simplicity: the inverse CDF sampling technique. If you know the cumulative distribution function (CDF) of a probability distribution, then you can always generate a random sample from that distribution. The inverse CDF technique uses the fact that a continuous, strictly increasing CDF, F, is a one-to-one mapping of the domain of the CDF onto the interval (0,1). Therefore, if U is a uniform random variable on (0,1), then $X = F^{-1}(U)$ has the distribution F.
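A one-line argument shows why this works. Assume that F is continuous and strictly increasing, so that $F^{-1}$ is an ordinary inverse, and use the fact that $P(U \le u) = u$ for $0 < u < 1$:

$$
P(X \le x) = P\bigl(F^{-1}(U) \le x\bigr) = P\bigl(U \le F(x)\bigr) = F(x)
$$

The middle equality holds because F is increasing, so applying F to both sides of the inequality preserves it.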
This article is taken from Chapter 7 of my book Simulating Data with SAS.
To illustrate the inverse CDF sampling technique (also called the inverse transformation algorithm), consider sampling from a standard exponential distribution. The standard exponential distribution has probability density $f(x) = e^{-x}$ for $x \ge 0$, and therefore the cumulative distribution is the integral of the density: $F(x) = 1 - e^{-x}$. This function can be explicitly inverted by solving for x in the equation $F(x) = u$. The inverse CDF is $x = -\log(1-u)$.
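Spelling out the algebra for $0 < u < 1$, where log is the natural logarithm:

$$
u = 1 - e^{-x} \;\Longrightarrow\; e^{-x} = 1 - u \;\Longrightarrow\; x = -\log(1-u)
$$

Because $1-U$ is also uniform on (0,1), the simpler formula $x = -\log(u)$ produces the same distribution; the $1-u$ form is the one that matches the inverse CDF exactly.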
The following DATA step generates random values from the exponential distribution by generating random uniform values from U(0,1) and applying the inverse CDF of the exponential distribution. (Of course, the simpler way is to use x = RAND("Expo")!) The UNIVARIATE procedure is used to check that the data follow an exponential distribution. This example comes from Ross (2006, Fourth Edition).
```sas
/* Example of using the inverse CDF algorithm to generate variates
   from the exponential distribution */
data Exp(keep=x);
call streaminit(1234);
do i = 1 to 1000;             /* sample size = 1000 */
   u = rand("Uniform");       /* statistically equivalent to */
   x = -log(1-u);             /*    x = rand("Expo");        */
   output;
end;
run;

ods select histogram;
proc univariate data=Exp;
   histogram x / exponential(sigma=1) endpoints=0 to 10 by 0.5;
run;
```
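Besides the histogram, you can compare sample statistics. The following sketch draws 1,000 values with the explicit inverse CDF and 1,000 values with RAND("Expo"), then uses PROC MEANS to summarize each method. The data set name Compare and the variable Method are illustrative names that are not part of the original program. For the standard exponential distribution the mean and standard deviation both equal 1, so both rows of the output should report estimates near 1.

```sas
/* Sketch: compare the inverse CDF sample with a sample from RAND("Expo").
   Compare and Method are illustrative (hypothetical) names. */
data Compare(keep=Method x);
length Method $10;
call streaminit(1234);
do i = 1 to 1000;                     /* 1000 values per method */
   Method = "InverseCDF";
   x = -log(1 - rand("Uniform"));     /* inverse CDF of the exponential */
   output;
   Method = "RAND";
   x = rand("Expo");                  /* built-in exponential generator */
   output;
end;
run;

/* Population mean and standard deviation are both 1 */
proc means data=Compare N mean std;
   class Method;
   var x;
run;
```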
In SAS, the QUANTILE function implements the inverse CDF function. That means that you can use the QUANTILE function to generate random variates. For example, the following statement is an equivalent way to use the inverse CDF method to generate exponential random variates:
```sas
u = rand("Uniform");
x = quantile("Expo", u);
```
Although powerful, the inverse CDF method can be computationally expensive unless you have an explicit formula for the inverse CDF. The QUANTILE function is general, but for many distributions it has to numerically solve for the root of the equation $F(x) = u$.
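To see the QUANTILE function applied to a distribution whose inverse CDF lacks a simple closed form, the following sketch samples from a gamma distribution with shape parameter 2 (an arbitrary illustrative choice; the scale defaults to 1). Whether QUANTILE uses root finding for this particular distribution is an assumption; the point is only that the same two statements work for other continuous distributions that QUANTILE supports.

```sas
/* Sketch: inverse CDF sampling from Gamma(shape=2, scale=1).
   The shape value 2 is an illustrative choice. */
data Gamma(keep=x);
call streaminit(1234);
do i = 1 to 1000;                 /* sample size = 1000 */
   u = rand("Uniform");           /* statistically equivalent to */
   x = quantile("Gamma", u, 2);   /*    x = rand("Gamma", 2);    */
   output;
end;
run;

/* Overlay the Gamma(2, 1) density on the histogram */
ods select histogram;
proc univariate data=Gamma;
   histogram x / gamma(alpha=2 sigma=1);
run;
```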
The inverse CDF technique is particularly useful when you want to generate data from a truncated distribution. For a distribution F, if you generate uniform random variates on the interval [F(a), F(b)] and then apply the inverse CDF, the resulting values follow the F distribution truncated to [a, b]. For example, to simulate a variate from the truncated normal distribution on [-1.5, 2], use the following statements:
```sas
/* Inverse CDF algorithm for truncated normal distribution on [a,b] */
data TruncNormal(keep=x);
Fa = cdf("Normal", -1.5);     /* for a = -1.5 */
Fb = cdf("Normal", 2.0);      /* for b = 2.0 */
call streaminit(1234);
do i = 1 to 1000;             /* sample size = 1000 */
   v = Fa + (Fb-Fa)*rand("Uniform");  /* V ~ U(F(a), F(b)) */
   x = quantile("Normal", v);         /* truncated normal on [a,b] */
   output;
end;
run;

ods select histogram;
proc univariate data=TruncNormal;
   histogram x / endpoints=-1.5 to 2.0 by 0.25;
run;
```
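The truncation trick works because, for $a \le x \le b$, the CDF of the truncated distribution is $(F(x) - F(a))/(F(b) - F(a))$, and a uniform variate on $[F(a), F(b)]$ satisfies $P(V \le F(x)) = (F(x) - F(a))/(F(b) - F(a))$. When the inverse CDF has a closed form, you do not need QUANTILE at all. The following sketch applies the explicit exponential inverse from earlier to the standard exponential distribution truncated to [0.5, 3]; the endpoints and the data set name TruncExp are illustrative choices.

```sas
/* Sketch: inverse CDF algorithm for the standard exponential distribution
   truncated to [a,b]. The endpoints a=0.5 and b=3 are illustrative. */
data TruncExp(keep=x);
a = 0.5;  b = 3;
Fa = 1 - exp(-a);             /* F(a) for F(x) = 1 - exp(-x) */
Fb = 1 - exp(-b);             /* F(b) */
call streaminit(1234);
do i = 1 to 1000;             /* sample size = 1000 */
   v = Fa + (Fb-Fa)*rand("Uniform");  /* V ~ U(F(a), F(b)) */
   x = -log(1-v);                     /* explicit inverse CDF */
   output;
end;
run;

/* Every value must lie inside [a,b] */
proc means data=TruncExp N min max mean;
   var x;
run;
```

Because RAND("Uniform") never returns exactly 0 or 1 and the inverse CDF is increasing, the minimum and maximum reported by PROC MEANS fall inside [0.5, 3].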