The inverse CDF method for simulating from a distribution

20

There are many techniques for generating random variates from a specified probability distribution such as the normal, exponential, or gamma distribution. However, one technique stands out because of its generality and simplicity: the inverse CDF sampling technique. If you know the cumulative distribution function (CDF) of a probability distribution, then you can always generate a random sample from that distribution. The inverse CDF technique for generating a random sample uses the fact that a continuous CDF, F, is a one-to-one mapping of the domain of the CDF into the interval (0,1). Therefore, if U is a uniform random variable on (0,1), then X = F–1(U) has the distribution F.

This article is taken from Chapter 7 of my book Simulating Data with SAS.

To illustrate the inverse CDF sampling technique (also called the inverse transformation algorithm), consider sampling from a standard exponential distribution. The exponential distribution has probability density f(x) = e–x, x ≥ 0, and therefore the cumulative distribution is the integral of the density: F(x) = 1 – e–x. This function can be explicitly inverted by solving for x in the equation F(x) = u. The inverse CDF is x = –log(1–u).

The following DATA step generates random values from the exponential distribution by generating random uniform values from U(0,1) and applying the inverse CDF of the exponential distribution. (Of course, the simpler way is to use x = RAND("Expo")!) The UNIVARIATE procedure is used to check that the data follow an exponential distribution. This example comes from Ross (2006, Fourth Edition).

/* Example of using the inverse CDF algorithm to generate variates
   from the exponential distribution */
data Exp(keep=x);
call streaminit(1234);
do i = 1 to 1000;                      /* sample size = 1000 */
   u = rand("Uniform");                /* statistically equivalent to */
   x = -log(1-u);                      /*     x = rand("Expo");       */
   output;
end;
run;
 
ods select histogram;
proc univariate data=Exp;
   histogram x / exponential(sigma=1) endpoints=0 to 10 by 0.5;
run;

In SAS, the QUANTILE function implements the inverse CDF function. That means that you can use the QUANTILE function to generate random variates. For example, the following statement is an equivalent way to use the inverse CDF method to generate exponential random variates:

   u = rand("Uniform"); 
   x = quantile("Expo", u);

Although powerful, this inverse CDF method can be computationally expensive unless you have a formula for the inverse CDF. In SAS the QUANTILE function implements the inverse CDF function, but for many distributions it has to numerically solve for the root of the equation F(x) = u.

The inverse CDF technique is particularly useful when you want to generate data from a truncated distribution. For a distribution F, if you generate uniform random variates on the interval [F(a), F(b)] and then apply the inverse CDF, the resulting values follow the F distribution truncated to [a, b]. For example, to simulate a variate from the truncated normal distribution on [–1.5, 2], use the following statements:

/* Inverse CDF algorithm for truncated normal distribution on [a,b] */
data TruncNormal(keep=x);
Fa = cdf("Normal", -1.5);              /* for a = -1.5 */
Fb = cdf("Normal",  2.0);              /* for b =  2.0 */
call streaminit(1234);
do i = 1 to 1000;                      /* sample size = 1000 */   
   v = Fa + (Fb-Fa)*rand("Uniform");   /* V ~ U(F(a), F(b))  */
   x = quantile("Normal", v);          /* truncated normal on [a,b] */
   output;
end;
run;
 
ods select histogram;
proc univariate data=TruncNormal;
   histogram x / endpoints=-1.5 to 2.0 by 0.25;
run;
Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

20 Comments

  1. Numerical Solution for the inverse transform method
    This question is Not Answered.(Mark as assumed answered)
    Dede Atem
    Novice
    Dede Atem Dec 30, 2013 8:53 AM
    Consider the following .
    C= know value
    U is Uniform (0,1)
    F(x)=1-S(x) is well known
    a=known value
    k=known value
    ht-= is known.
    The only unknown is X.

    I wish to write a SAS code that find X such that the right hand sight is equal the left hand side numerically.
    I can not make X the subject but can find a numerical solution.

    C*U=F(x) *exp(-(k-a*X)**2 - (k - a**2 * X) exp(-(k -a*X)**2/2t)*F(x)- ht
    16 Views Tags: none (add)

    • Rick Wicklin

      As stated on Wikipedia, a Nakagami random variable is just the square root of a gamma random variable. The CDF is given explicitly in terms of the incomplete gamma function, so use the CDF('GAMMA',...) function in SAS for the CDF and the QUANTILE('GAMMA',...) for the icdf.

  2. Indranil Ghosh on

    Hi Rick,

    How can one use the inverse cdf method to generate random samples from an unknown probability distribution, whose cdf is not invertible? For example, I have valid one dimensional density which has the following cdf:
    F(x)=(exp(theta*(1-exp(-(alpha*x)^(beta))))-1)*[1+ lambda-lambda*[(exp(theta*(1-exp(-(alpha*x)^(beta))))-1)]/((exp(theta)-1))].

    Let me know how to tackle this one.
    Thanks,
    Indranik

  3. Pingback: The Lambert W function in SAS - The DO Loop

  4. hi Rick, thanks for sharing.
    Suppose I have propensity score for a bunch of patients, and i have the ECDF of the PScore. if I want to draw a group of patients with similar ECDF from control patients, how can i sample based on a continuous CDF?

    • Rick Wicklin

      The generalized normal is defined in terms of the incomplete gamma function, which is a scaled version of the gamma distribution. Therefore you can invert the generalized normal CDF by using the quantile function of the gamma distribution. I suggest you do the inversion twice: once for y greater than mu and again for y less than mu. That eliminates the absolute value and the SIGN function.

  5. hi rick, how would one use the integral transform method, without numerical inversion of for example the Gamma distribution?

    • Rick Wicklin

      Sorry, but I do not understand your question. I suggest you post your question at the SAS Support Communities. If the information in this article is relevant, link to it in your question. In your question, you should explain what you mean by "the integral transform method, without numerical inversion."

  6. Suppose you are tasked with simulating a process where the inter-arrival times are not exponentially distributed, but Gamma(2, λ) under the fixed-count scheme, say 25 events, subject to the constraint that you must use the integral transform method of the Gamma distribution.

    Im working using R.

    The code below is what i used for an exponential distribution:
    1 R> n = 25
    2 R> lambda = 10
    3 R> u = runif(n,0,1)
    4 R> t = -1/lambda*log(1-u)

    How would you modify it for a gamma distribution simulation.

    Thanks

  7. Pingback: Tips to simulate binary and categorical variables - The DO Loop

Leave A Reply

Back to Top