In my book Simulating Data with SAS, I specify how to generate lognormal data with a shape and scale parameter. The method is simple: you use the RAND function to generate X ~ N(μ, σ), then compute Y = exp(X). The random variable Y is lognormally distributed with parameters μ and σ. This is the standard definition, but notice that the parameters are specified as the mean and standard deviation of X = log(Y).
Recently, a SAS customer asked me an interesting question. What if you know the mean and variance of Y, rather than log(Y)? Can you still simulate lognormal data from a distribution with that mean and variance?
Mathematically, the situation is that if m and v are the mean and variance, respectively, of a lognormally distributed variable Y, can you compute the usual parameters for log(Y)? The answer is yes. In terms of μ and σ, the mean of Y is m = exp(μ + σ2/2) and the variance is v = (exp(σ2) -1) exp(2μ + σ2). You can invert these formulas to get μ and σ as functions of m and v. Wikipedia includes these formulas in its article on the lognormal distribution, as follows:
Let's rewrite the expression inside the logarithm. If you let φ = sqrt(v + m2), then the formulas are more simply written as
μ = ln(m2 / φ), σ2 = ln(φ2 / m2 )
Consequently, you can specify the mean and the variance of the lognormal distribution of Y and derive the corresponding (usual) parameters for the underlying normal distribution of log(Y), as follows:
data convert; m = 80; v = 225; /* mean and variance of Y */ phi = sqrt(v + m**2); mu = log(m**2/phi); /* mean of log(Y) */ sigma = sqrt(log(phi**2/m**2)); /* std dev of log(Y) */ run; proc print noobs; run;
For completeness, let's simulate data from a lognormal distribution with a mean of 80 and a variance of 225 (that is, a standard deviation of 15). The previous computation enables you to find the parameters for the underlying normal distribution (μ and σ) and then exponentiate the simulated data:
data lognormal; call streaminit(1); keep x y; m = 80; v = 225; /* specify mean and variance of Y */ phi = sqrt(v + m**2); mu = log(m**2/phi); sigma = sqrt(log(phi**2/m**2)); do i = 1 to 100000; x = rand('Normal', mu, sigma); y = exp(x); output; end; run;
You can use the UNIVARIATE procedure to verify that the program was implemented correctly. The simulated data should have a sample mean that is close to 80 and a sample standard deviation that is close to 15. Furthermore, the LOGNORMAL option on the HISTOGRAM statement enables you to fit a lognormal distribution to the data. The fit should be good and the parameter estimates should be close to the parameter values μ = 4.36475 and σ = 0.18588 (except that PROC UNIVARIATE uses the Greek letter zeta instead of mu):
ods select Moments Histogram ParameterEstimates; proc univariate data=lognormal; var y; histogram y / lognormal(zeta=EST sigma=EST); run;
The histogram with fitted lognormal curve is shown at the top of this article. The mean of the simulated data is very close to 80 and the sample standard deviation is close to 15.
My thanks to the SAS customer who asked this question—and researched most of the solution! It is a question that I had not previously considered.
Is this a good way to simulate lognormal data? It depends. If you have data and you want to simulate lognormal data that "looks just like it," I suggest that you run PROC UNIVARIATE on the real data and produce the maximum likelihood parameter estimates for the lognormal parameters μ and σ. You can then use those MLE estimates to simulate more data. However, sometimes the original data is not available. You might have only summary statistics that appear in some journal or textbook. In that case the approach in this article enables you to map the descriptive statistics of the original data to the lognormal parameters μ and σ so that you can simulate the unavailable data.