In several previous articles, I've shown how to use SAS to fit models to data by using maximum likelihood estimation (MLE). However, I have not previously shown how to obtain standard errors for the estimates. This article combines two previous articles to show how to obtain MLE estimates and the


A previous article shows how to use Monte Carlo simulation to approximate the sampling distribution of the sample mean and sample median. When x ~ N(0,1) are normal data, the sample mean is also normal, and there are simple formulas for the expected value and the standard error of the

An elementary course in statistics often includes a discussion of the sampling distribution of a statistic. The canonical example is the sampling distribution of the sample mean. For samples of size n that are drawn from a normal distribution (X ~ N(μ, σ)), the sample mean is normally distributed as
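The canonical result is that the sample mean of n draws from N(μ, σ) has mean μ and standard deviation σ/√n. A quick Monte Carlo check makes this concrete. This is a Python sketch rather than the SAS code the articles use, and the values of μ, σ, n, and the number of replicates are arbitrary assumptions.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, num_samples, seed=1):
    """Draw num_samples samples of size n from N(mu, sigma); return the sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(num_samples)]

mu, sigma, n = 10.0, 2.0, 25
means = simulate_sample_means(mu, sigma, n, num_samples=5000)

# The Monte Carlo estimates should be close to mu and to sigma / sqrt(n) = 0.4
print("mean of sample means:", statistics.fmean(means))
print("std dev of sample means:", statistics.stdev(means))
```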

A previous article discusses the birthday problem and its generalizations. The classic birthday problem asks, "In a room that contains N people, what is the probability that two or more people share a birthday?" The probability is much higher than you might think. For example, in a room that contains

The birthday-matching problem (also called the birthday paradox or simply the birthday problem) is a classic problem in probability. Simply stated, the birthday-matching problem asks, "If there are N people in a room, what is the chance that two of them have the same birthday?" The problem is sometimes called
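Under the usual simplifying assumptions (365 equally likely birthdays, no leap day, independent people), the probability of at least one match is 1 minus the probability that all N birthdays are distinct. The following Python sketch computes it directly; the function name is illustrative.

```python
def prob_shared_birthday(n_people, days=365):
    """Probability that at least two of n_people share a birthday, assuming
    birthdays are independent and uniform over `days` days."""
    p_all_distinct = 1.0
    for k in range(n_people):
        # The (k+1)th person must avoid the k birthdays already taken
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

for n in (10, 22, 23, 50):
    print(n, round(prob_shared_birthday(n), 4))
```

Under these assumptions, 23 is the smallest group size for which the probability exceeds 1/2.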

Recently I wrote about a numerical analysis problem: the accurate computation of log(1+x) when x is close to 0. A naive computation of log(1+x) loses accuracy if you call the LOG function, which is why the SAS language provides the built-in LOG1PX function for this computation. In addition, I showed that you

SAS supports a special function for the accurate evaluation of log(1+x) when x is near 0. The LOG1PX function is useful because a naive computation of log(1+x) loses accuracy when x is near 0. This article demonstrates two general approximation techniques that are often used in numerical analysis: the Taylor
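The loss of accuracy, and how a short Taylor polynomial avoids it, can be seen in a few lines. In this Python sketch, `math.log1p` plays the role of SAS's LOG1PX, and the three-term polynomial x - x²/2 + x³/3 is an illustrative choice that is only accurate for small |x|.

```python
import math

def log1p_taylor(x):
    """Three-term Taylor approximation of log(1+x) about x = 0 (error is O(x**4))."""
    return x - x**2 / 2 + x**3 / 3

x = 1e-10
naive = math.log(1 + x)     # 1 + x is rounded before LOG sees it, so digits are lost
accurate = math.log1p(x)    # library function designed for small x
taylor = log1p_taylor(x)

for name, value in [("naive", naive), ("log1p", accurate), ("taylor", taylor)]:
    rel_err = abs(value - accurate) / accurate
    print(f"{name:7s} {value:.17g}  rel. error vs log1p = {rel_err:.1e}")
```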

The documentation for Python's SciPy package provides a table that concisely summarizes functions that are associated with continuous probability distributions. This article provides a similar table for SAS functions. For more information on the CDF, PDF, quantile, and random-variate functions, see "Four essential functions for statistical programmers." SAS functions for
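For comparison, Python's standard library exposes the same four operations for the normal distribution through `statistics.NormalDist`. In SAS, the corresponding calls are the PDF, CDF, QUANTILE, and RAND functions with the "Normal" distribution name. The sketch below shows only the Python side.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)    # standard normal distribution

density = dist.pdf(0.0)             # PDF: height of the density curve at x = 0
prob = dist.cdf(1.96)               # CDF: P(X <= 1.96)
quantile = dist.inv_cdf(0.975)      # quantile: x such that P(X <= x) = 0.975
draws = dist.samples(5, seed=123)   # random variates: five draws from N(0, 1)

print(density, prob, quantile, draws)
```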

A previous article shows ways to perform efficient BY-group processing in the SAS IML language. BY-group processing is a SAS-ism for what other languages call group processing or subgroup processing. The main idea is that the data set contains several discrete variables such as sex, race, education level, and so

One thing I have learned about rank-based statistics over the years is "Be careful of tied values!" On multiple occasions, I have been asked, "Why doesn't the SAS result for [NAME] statistic agree with my hand calculation?" The answer is sometimes because of the way that tied values are handled.

Many useful matrices in applied math and statistics have a banded structure. Examples include diagonal matrices, tridiagonal matrices, banded matrices, and Toeplitz matrices. An example of an unsymmetric Toeplitz matrix is shown to the right. Notice that the matrix is constant along each diagonal, including sub- and superdiagonals. Recently, I
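Because element (i, j) of a Toeplitz matrix depends only on i - j, the matrix is determined by its first column and first row. The hypothetical helper below (a Python sketch, not a SAS IML function) builds a possibly unsymmetric Toeplitz matrix from those two vectors.

```python
def toeplitz(first_col, first_row):
    """Build a (possibly unsymmetric) Toeplitz matrix as a list of lists.

    Entries on or below the main diagonal come from first_col; entries above
    it come from first_row. The two vectors must agree in their first element.
    """
    if first_col[0] != first_row[0]:
        raise ValueError("first_col[0] and first_row[0] must agree")
    m, n = len(first_col), len(first_row)
    return [[first_col[i - j] if i >= j else first_row[j - i] for j in range(n)]
            for i in range(m)]

T = toeplitz([1, 2, 3, 4], [1, 5, 6, 7])
for row in T:
    print(row)
```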

The other day I was trying to numerically integrate the function f(x) = sin(x)/x on the domain [0,∞). The graph of this function is shown to the right. In SAS, you can use the QUAD subroutine in SAS IML software to perform numerical integration. Some numerical integrators have difficulty computing

Did you know that you can embed one graph inside another by using PROC SGPLOT in SAS? A typical example is shown to the right. The large graph shows kernel density estimates for the distribution of the Cholesterol variable among male and female patients in a heart study. The small

I don't often use the SG annotation facility in SAS for adding annotations to statistical graphics, but when I do, I enjoy the convenience of the SG annotation macros. I can never remember the details of the SG annotation commands, but I know that the SG annotation macros will create

Many SAS procedures support a BY statement that enables you to perform an analysis for each unique value of a BY-group variable. The SAS IML language does not support a BY statement, but you can program a loop that iterates over all BY groups. You can emulate BY-group processing by
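One straightforward way to emulate a BY statement, assumed here, is to find the unique levels of the BY variable and loop over them, extracting and analyzing each subset. (In SAS IML, the UNIQUE and LOC functions are commonly used for these two steps.) The Python sketch below computes a mean per group; the data and function name are hypothetical.

```python
def by_group_means(groups, values):
    """Emulate BY-group processing: loop over the unique group levels
    (in sorted order, like a sorted BY variable) and analyze each subset."""
    results = {}
    for level in sorted(set(groups)):
        # Extract the observations that belong to the current BY group
        subset = [v for g, v in zip(groups, values) if g == level]
        results[level] = sum(subset) / len(subset)
    return results

sex = ["F", "M", "F", "M", "M"]
height = [62.0, 70.0, 64.0, 68.0, 72.0]
print(by_group_means(sex, height))
```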

There are many ways to model a set of raw data by using a continuous probability distribution. It can be challenging, however, to choose the distribution that best models the data. Are the data normal? Lognormal? Is there a theoretical reason to prefer one distribution over another? The SAS has

Does anyone write paper checks anymore? According to researchers at the Federal Reserve Bank of Atlanta (Greene, et al., 2020), the use of paper checks has declined 63% among US consumers since the year 2000. The researchers surveyed more than 3,000 consumers in 2017-2018 and discovered that only 7% of

I have previously written about how to efficiently generate points uniformly at random inside a sphere (often called a ball by mathematicians). The method uses a mathematical fact from multivariate statistics: If X is drawn from the uncorrelated multivariate normal distribution in dimension d, then S = r*X / ||X|| has
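Normalizing a standard multivariate normal vector gives a uniformly distributed direction, so r*X/||X|| lies uniformly on the sphere of radius r. To fill the interior uniformly, the radius must also be rescaled. The fraction of a d-dimensional ball's volume within radius t is (t/r)^d, so one common approach multiplies by U^(1/d) for U ~ Uniform(0,1). The Python sketch below illustrates both steps; it is not the SAS code from the article.

```python
import math
import random

def random_point_in_ball(d, r, rng):
    """Return a point uniformly distributed inside the d-dimensional ball of radius r."""
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]     # uncorrelated MVN(0, I) vector
    norm = math.sqrt(sum(xi * xi for xi in x))
    on_sphere = [r * xi / norm for xi in x]          # uniform on the sphere of radius r
    scale = rng.random() ** (1.0 / d)                # radial rescaling for uniform volume
    return [scale * si for si in on_sphere]

rng = random.Random(2024)
pts = [random_point_in_ball(3, 2.0, rng) for _ in range(1000)]
```

As a sanity check, about (1/2)^3 = 12.5% of the points should fall inside the ball of radius 1.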

A previous article shows how to use the MODELAVERAGE statement in PROC GLMSELECT in SAS to perform a basic bootstrap analysis of the regression coefficients and fit statistics. A colleague asked whether PROC GLMSELECT can construct bootstrap confidence intervals for the predicted mean in a regression model, as described in

I've written many articles about bootstrapping in SAS, including several about bootstrapping in regression models. Many of the articles use a very general bootstrap method that can bootstrap almost any statistic that SAS can compute. The method uses PROC SURVEYSELECT to generate B bootstrap samples from the data, uses the
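The general method follows a resample, compute, summarize pattern. In the Python sketch below, `random.choices` stands in for PROC SURVEYSELECT to draw the B samples with replacement. The data, the choice of the median as the statistic, and B = 2000 are illustrative assumptions.

```python
import random
import statistics

def bootstrap_distribution(data, statistic, B=1000, seed=1):
    """Return B bootstrap replicates of `statistic`, each computed on a sample
    of size len(data) drawn with replacement from data."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(B)]

data = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1]
B = 2000
reps = sorted(bootstrap_distribution(data, statistics.median, B=B))

se = statistics.stdev(reps)         # bootstrap estimate of the standard error
k = int(0.025 * B)                  # number of replicates trimmed from each tail
ci = (reps[k], reps[B - k - 1])     # approximate 95% percentile interval
print(se, ci)
```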

It has been more than a decade since SAS 9.3 changed the default ODS destination from the old LISTING destination to more modern destinations such as HTML. One of the advantages of modern output destinations is support for Unicode symbols, superscripts, subscripts, and for formatting text by using boldface, italics,

In ordinary least squares regression, there is an explicit formula for the confidence limits of the predicted mean. That is, for any observed value of the explanatory variables, you can create a 95% confidence interval (CI) for the predicted response. This formula assumes that the model is correctly specified and
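For reference, the standard textbook form of that formula is shown below, written in the X'X notation used elsewhere on this page. Here X is the n×p design matrix (including the intercept column), **x**₀ is the vector of explanatory values (with a leading 1) at which to predict, and s² is the mean squared error. The second line is the familiar special case of a single explanatory variable.

```latex
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\; n-p}\; s\,\sqrt{\mathbf{x}_0' (X'X)^{-1} \mathbf{x}_0},
\qquad s^2 = \frac{\mathrm{SSE}}{n-p}

% One explanatory variable, predicting at x = x_0 (p = 2):
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\; n-2}\; s\,\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```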

A SAS programmer wanted to use PROC SGPLOT in SAS to visualize a regression model. The programmer wanted to visualize confidence limits for the predicted mean at certain values of the explanatory variable. This article shows two options for adding confidence limits to a scatter plot. You can use a

The acceptance-rejection method (sometimes called rejection sampling) is a method that enables you to generate a random sample from an arbitrary distribution by using only the probability density function (PDF). This is in contrast to the inverse CDF method, which uses the cumulative distribution function (CDF) to generate a random
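A minimal Python sketch of the method, under illustrative assumptions: the target is the Beta(2, 2) density f(x) = 6x(1 - x) on [0, 1], the proposal is Uniform(0, 1) with density g(y) = 1, and c = 1.5 is the maximum of f, so f(y) ≤ c·g(y) everywhere. A candidate y is accepted with probability f(y) / (c·g(y)), so about 1/c = 2/3 of the candidates are kept.

```python
import random

def beta22_pdf(x):
    """Target density: Beta(2, 2), f(x) = 6 x (1 - x) on [0, 1]; its maximum is 1.5."""
    return 6.0 * x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0

def accept_reject(target_pdf, c, n, rng):
    """Draw n variates from target_pdf by using a Uniform(0, 1) proposal.

    Requires target_pdf(y) <= c * g(y) for all y, where g(y) = 1 on [0, 1].
    """
    sample = []
    while len(sample) < n:
        y = rng.random()              # candidate drawn from the proposal
        u = rng.random()              # independent uniform for the acceptance test
        if u * c <= target_pdf(y):    # accept with probability f(y) / c
            sample.append(y)
    return sample

rng = random.Random(42)
x = accept_reject(beta22_pdf, c=1.5, n=5000, rng=rng)
```

The Beta(2, 2) distribution has mean 0.5 and variance 0.05, which gives a quick way to check the sample.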

There are dozens of common probability distributions for a continuous univariate random variable. Familiar examples include the normal, exponential, uniform, gamma, and beta distributions. Where did these distributions come from? Well, some mathematician needed a model for a stochastic process and wrote down the equation for the distribution, typically by

Let X be any rectangular matrix. What is the trace of the crossproducts matrix, X'*X? Interestingly, you do not need to form the crossproducts matrix to compute the answer! It turns out that tr(X'*X) equals the sum of the squared elements of X. Theorem: For any matrix, X, the trace
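A short derivation, for an n×p matrix X with elements x_{ij} (the dimensions n and p are notation introduced here), followed by a small worked example:

```latex
(X'X)_{jk} = \sum_{i=1}^{n} x_{ij}\, x_{ik}
\quad\Longrightarrow\quad
\operatorname{tr}(X'X) = \sum_{j=1}^{p} (X'X)_{jj} = \sum_{j=1}^{p} \sum_{i=1}^{n} x_{ij}^{2}

% Example:
X = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix},
\quad
X'X = \begin{pmatrix} 35 & 44 \\ 44 & 56 \end{pmatrix},
\quad
\operatorname{tr}(X'X) = 35 + 56 = 91 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2
```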

In a previous article, I discussed the Wilcoxon signed rank test, which is a nonparametric test for the location of the median. The Wikipedia article about the signed rank test mentions a variation of the test due to Pratt (1959). Whereas the standard Wilcoxon test excludes values that equal μ0

Wilcoxon's signed rank test is a popular nonparametric alternative to a paired t test. In a paired t test, you analyze measurements for subjects before and after some treatment or intervention. You analyze the difference in the measurements for each subject, and test whether the mean difference is significantly different
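The core of the signed rank statistic fits in a few lines. The Python sketch below computes W+, the sum of the ranks of |d| over the positive differences. It drops zero differences (the standard Wilcoxon convention; Pratt's 1959 variant handles them differently) and gives tied values the average of their ranks. The function names and data are illustrative, and SAS procedures may report a centered version of the statistic instead of W+.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_statistic(before, after, mu0=0.0):
    """Wilcoxon W+ for paired data, where d = after - before - mu0.
    Zero differences are dropped before ranking."""
    d = [a - b - mu0 for b, a in zip(before, after)]
    d = [di for di in d if di != 0]
    ranks = average_ranks([abs(di) for di in d])
    return sum(r for r, di in zip(ranks, d) if di > 0)

before = [10, 12, 9, 11, 8]
after = [11.5, 11.5, 11, 10, 11]
print(signed_rank_statistic(before, after))
```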

A previous article discusses standardized coefficients in linear regression models and shows how to compute standardized regression coefficients in SAS by using the STB option on the MODEL statement in PROC REG. It also discusses how to interpret a standardized regression coefficient. Recently, a SAS user wanted to know how
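A standardized coefficient rescales a raw slope b into standard-deviation units, b·s_x/s_y. For a model with a single explanatory variable, this equals the Pearson correlation between x and y. The Python sketch below illustrates the rescaling on made-up data; it is not the PROC REG implementation.

```python
import statistics

def standardized_slope(x, y):
    """OLS slope for y ~ x, rescaled to standard-deviation units: b * s_x / s_y."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx                                   # raw (unstandardized) slope
    return b * statistics.stdev(x) / statistics.stdev(y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta = standardized_slope(x, y)
print(beta)
```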

A previous article shows an example of a Markov chain model and computes the probability that the system ends up in a terminal state (called an absorbing state). As explained previously, you can often compute exact probabilities for questions about Markov chains. Nevertheless, it can be useful to know how
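Simulation offers a check on exact absorption probabilities: run many walks from the starting state and record where each one is absorbed. The Python sketch below uses a hypothetical chain, a fair random walk on states 0 through 4 that is absorbed at 0 or 4. For this chain, the exact probability of absorption at state 4 when starting from state 1 is 1/4.

```python
import random

def simulate_absorption(P, start, absorbing, num_walks, rng):
    """Estimate absorption probabilities of a Markov chain by simulation.

    P is a row-stochastic transition matrix (list of lists), start is the
    initial state, and absorbing is the set of absorbing states. Returns a dict
    that maps each absorbing state to the fraction of walks that end there.
    """
    counts = {s: 0 for s in absorbing}
    states = list(range(len(P)))
    for _ in range(num_walks):
        s = start
        while s not in absorbing:
            s = rng.choices(states, weights=P[s])[0]   # take one step of the chain
        counts[s] += 1
    return {s: c / num_walks for s, c in counts.items()}

# Hypothetical chain: fair random walk on 0..4 with absorbing endpoints
P = [[1, 0, 0, 0, 0],
     [0.5, 0, 0.5, 0, 0],
     [0, 0.5, 0, 0.5, 0],
     [0, 0, 0.5, 0, 0.5],
     [0, 0, 0, 0, 1]]
est = simulate_absorption(P, start=1, absorbing={0, 4}, num_walks=20000,
                          rng=random.Random(7))
print(est)
```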