Simulate lognormal data with specified mean and variance

[Figure: histogram of the simulated lognormal data with a fitted lognormal curve]

In my book Simulating Data with SAS, I specify how to generate lognormal data with a shape and scale parameter. The method is simple: you use the RAND function to generate X ~ N(μ, σ), then compute Y = exp(X). The random variable Y is lognormally distributed with parameters μ and σ. This is the standard definition, but notice that the parameters are specified as the mean and standard deviation of X = log(Y).

Recently, a SAS customer asked me an interesting question. What if you know the mean and variance of Y, rather than log(Y)? Can you still simulate lognormal data from a distribution with that mean and variance?

Mathematically, the question is this: if m and v are the mean and variance, respectively, of a lognormally distributed variable Y, can you compute the usual parameters for log(Y)? The answer is yes. In terms of μ and σ, the mean of Y is m = exp(μ + σ²/2) and the variance is v = (exp(σ²) - 1) exp(2μ + σ²). You can invert these formulas to get μ and σ as functions of m and v. Wikipedia includes these formulas in its article on the lognormal distribution, as follows:

μ = ln( m / sqrt(1 + v/m²) ),     σ² = ln(1 + v/m²)

Let's rewrite the expression inside the logarithm. If you let φ = sqrt(v + m²), then the formulas are more simply written as
μ = ln(m² / φ),     σ² = ln(φ² / m²)
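
For readers who want the intermediate steps, the inversion can be sketched as follows. It uses only the symbols already defined (m, v, μ, σ, and φ = sqrt(v + m²)) and relies on the fact that the variance can be written as a multiple of m²:

```latex
\begin{align*}
m &= e^{\mu + \sigma^2/2}, \qquad
v = \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2}
  = \left(e^{\sigma^2} - 1\right) m^2 \\
\Rightarrow\quad e^{\sigma^2} &= 1 + \frac{v}{m^2} = \frac{\phi^2}{m^2},
\qquad \sigma^2 = \ln\frac{\phi^2}{m^2} \\
\mu &= \ln m - \frac{\sigma^2}{2}
     = \ln m - \ln\frac{\phi}{m}
     = \ln\frac{m^2}{\phi}
\end{align*}
```
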
Consequently, you can specify the mean and the variance of the lognormal distribution of Y and derive the corresponding (usual) parameters for the underlying normal distribution of log(Y), as follows:

data convert;
m = 80; v = 225;                /* mean and variance of Y */
phi = sqrt(v + m**2);
mu    = log(m**2/phi);          /* mean of log(Y)    */
sigma = sqrt(log(phi**2/m**2)); /* std dev of log(Y) */
run;
 
proc print noobs; run;
[Table: PROC PRINT output listing m, v, phi, mu, and sigma; mu ≈ 4.36475 and sigma ≈ 0.18588]
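
One way to gain confidence in the conversion is a round-trip check: convert (m, v) to (μ, σ), then plug μ and σ back into the forward formulas for the mean and variance of Y. The following is a minimal sketch; the data set name roundtrip and the variable names m0 and v0 are illustrative choices, not part of the original program. If the conversion is correct, m and v should reproduce m0 and v0 up to floating-point rounding.

```sas
data roundtrip;
m0 = 80; v0 = 225;                  /* target mean and variance of Y */
phi   = sqrt(v0 + m0**2);
mu    = log(m0**2/phi);             /* mean of log(Y)    */
sigma = sqrt(log(phi**2/m0**2));    /* std dev of log(Y) */
/* forward formulas: should recover m0 and v0 */
m = exp(mu + sigma**2/2);
v = (exp(sigma**2) - 1) * exp(2*mu + sigma**2);
run;

proc print data=roundtrip noobs; run;
```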

For completeness, let's simulate data from a lognormal distribution with a mean of 80 and a variance of 225 (that is, a standard deviation of 15). The previous computation enables you to find the parameters for the underlying normal distribution (μ and σ) and then exponentiate the simulated data:

data lognormal;
call streaminit(1);
keep x y;
m = 80; v = 225;      /* specify mean and variance of Y */
phi = sqrt(v + m**2);
mu    = log(m**2/phi);
sigma = sqrt(log(phi**2/m**2));
do i = 1 to 100000;
   x = rand('Normal', mu, sigma);
   y = exp(x);
   output;
end;
run;

You can use the UNIVARIATE procedure to verify that the program was implemented correctly. The simulated data should have a sample mean that is close to 80 and a sample standard deviation that is close to 15. Furthermore, the LOGNORMAL option on the HISTOGRAM statement enables you to fit a lognormal distribution to the data. The fit should be good and the parameter estimates should be close to the parameter values μ = 4.36475 and σ = 0.18588 (except that PROC UNIVARIATE uses the Greek letter zeta instead of mu):

ods select Moments Histogram ParameterEstimates;
proc univariate data=lognormal;
   var y;
   histogram y / lognormal(zeta=EST sigma=EST);
run;

The histogram with fitted lognormal curve is shown at the top of this article. The mean of the simulated data is very close to 80 and the sample standard deviation is close to 15.

[Table: PROC UNIVARIATE moments and fitted lognormal parameter estimates for the simulated data]
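
The same recipe should carry over to SAS/IML when you want a matrix of lognormal values rather than a data set. The sketch below assumes a 1000 x 100 matrix and a seed of 12345, both arbitrary choices. It fills the matrix with normal variates by using the RANDGEN subroutine, exponentiates, and then prints the sample mean and standard deviation of all elements, which should be near 80 and 15.

```sas
proc iml;
call randseed(12345);
m = 80; v = 225;                      /* mean and variance of Y */
phi   = sqrt(v + m##2);               /* ## is the elementwise power operator */
mu    = log(m##2 / phi);
sigma = sqrt(log(phi##2 / m##2));

x = j(1000, 100);                     /* allocate a 1000 x 100 matrix */
call randgen(x, "Normal", mu, sigma); /* fill with N(mu, sigma) variates */
y = exp(x);                           /* matrix of lognormal variates */

yvec = colvec(y);                     /* flatten to check the moments */
sampleMean = mean(yvec);
sampleStd  = sqrt(var(yvec));
print sampleMean sampleStd;
quit;
```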

My thanks to the SAS customer who asked this question—and researched most of the solution! It is a question that I had not previously considered.

Is this a good way to simulate lognormal data? It depends. If you have data and you want to simulate lognormal data that "looks just like it," I suggest that you run PROC UNIVARIATE on the real data and produce the maximum likelihood estimates for the lognormal parameters μ and σ. You can then use those estimates to simulate more data. However, sometimes the original data are not available. You might have only summary statistics that appear in some journal or textbook. In that case the approach in this article enables you to map the descriptive statistics of the original data to the lognormal parameters μ and σ so that you can simulate the unavailable data.
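
When the real data are available, the fit-then-simulate workflow might look like the following sketch. It swaps in a simpler technique than the PROC UNIVARIATE fit: for a lognormal model with a zero threshold, the maximum likelihood estimates of μ and σ are the mean and the divisor-n standard deviation of log(Y), so PROC MEANS with VARDEF=N applied to the log-transformed data is enough. The data set names Have, LogHave, MLE, and simulated are hypothetical, and Have is generated here only as a stand-in so that the example runs on its own.

```sas
/* Stand-in for the real data; replace Have with your own data set */
data Have;
call streaminit(123);
do i = 1 to 500;
   y = exp(rand('Normal', 4.4, 0.2));
   output;
end;
keep y;
run;

data LogHave;
set Have;
logY = log(y);               /* requires y > 0 */
run;

/* VARDEF=N requests the divisor-n (maximum likelihood) std dev */
proc means data=LogHave noprint vardef=n;
   var logY;
   output out=MLE mean=mu std=sigma;
run;

/* Simulate new lognormal data from the estimated parameters */
data simulated;
call streaminit(1);
set MLE(keep=mu sigma);
do i = 1 to 1000;
   y = exp(rand('Normal', mu, sigma));
   output;
end;
keep y;
run;
```

If the data might have a nonzero threshold, fitting with PROC UNIVARIATE as described above is the safer route.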


About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

21 Comments

  1. Pingback: Geometry, sensitivity, and parameters of the lognormal distribution - The DO Loop

  2. Amany Hassan on

    Hi Rick,
    Is it possible to generate a matrix of random numbers from a lognormal distribution with a specific mean and variance directly, without using the normal distribution as you did in the above program? How can we do that?
    I appreciate your help.
    Best Regards, Amany

      • Amany Hassan on

        I want to generate a 5 x 5 matrix from the lognormal distribution with specified values of mean and sigma, so I wrote the following program:

        proc iml;
        mu=1;
        sigma=0.04996;/*sigma value*/
        n = 5; m = 5;
        x2 = j(n,m);
        call randseed (12345);
        x2 = j(n,m); /* allocate (n x m) matrix*/
        call randgen(x2,'LOGN', mu, sigma);
        print x2;
        quit;
        run;

        However, I got the following message in the log. I do not know what is wrong!!
        call randgen(x2,'LOGN', mu, sigma);
        ERROR: (execution) Incorrect number of arguments.
        I appreciate your help.
        Amany

  3. Amany Hassan on

    Hello Rick,
    I generated data from Weibull dist. (a,b) for 1000 times using do loop and the MLEs were calculated for each group of data as it is shown in SAS help.
    The run was done but I got the following warnings and notes:

    WARNING: Finishing a module while inside a DO group.
    NOTE: Module F_WEIB2 defined.
    WARNING: Finishing a module while inside a DO group.
    NOTE: Module G_WEIB2 defined.

    Does it mean that there is something wrong?
    Is it allowed to use the modules for calculating MLEs of Weibull dist. inside do loop?

    I appreciate your help,
    Best Regards.

  4. Dear Rick,
    thank you for this post.
    I am currently working on my master thesis in Epidemiology. I want to simulate data, which closely resembles my real-world example. For that I chose the approach you mentioned above.
    In your book "Simulating Data with SAS", the distribution table for the rand() function shows that it is possible to specify shape and scale parameters for lognormal and gamma distributions. The SAS help indicates that it is only possible to specify the shape and not scale. When entering both parameters into the rand(parm1, parm2) function, I receive an error: " One parameter must be specified with the RAND function and the gamma distribution.".
    Is there a way around this? I would like to stick with the rand() function, as you generally suggest, and not go back to the rangam()/rannor() routines.

    Thank you very much!

    With kind regards,
    Tim

    • Rick Wicklin

      It appears that some documentation did not get updated. Thank you for bringing that to my attention.

      If you do not have support for the two-parameter Gamma distribution, you must be running an old version of SAS. You should mention this to your system administrators and encourage them to upgrade to SAS 9.4.

      On p. 110, I show how to add location and scale parameters to distributions that do not support them. The Gamma and Lognormal families are included in that discussion.

    • Rick Wicklin

      zk does not have a value. It is a random variable, which means that it has a probability of becoming a value. For example, it has a 68% chance of being between the values exp(-8) and exp(8). Its most likely value is 1.

      • Hello,

        To complete the question from Tran Quang Thi, I wonder what are the parameters mu and sigma knowing that m = 0. In this case, using the formula for mu we have a log(0) which is undefined... ?

        • Rick Wicklin

          Your question doesn't make sense. The mean of a lognormally distributed variable, Y, is always greater than zero because Y = exp(X), where X is normally distributed. The range of the exponential function is Y > 0.

  5. Hi Rick

    Just wondered if below can be done in Excel?

    x = rand('Normal', mu, sigma)

    I am trying to generate a lognormal distribution for initial gas flow rate with known mean and SD and unknown data set and am not very familiar with other tools.

    Thanks,
    Ali

  6. Hello guys,

    I'm looking for someone who can give me possible code for this exercise.
    Thanks in advance

    The following observations are data from four mutual funds. The data are delimited by spaces between the variables:
    ...

  7. Pingback: Simulate the use of personal checks in the US - The DO Loop
