When I was writing Simulating Data with SAS (Wicklin, 2013), I read a lot of introductory textbooks about Monte Carlo simulation. One of my favorites is Sheldon Ross's book Simulation. (I read the 4th Edition (2006); the 5th Edition was published in 2013.) I love that the book brings together

# Author

I've previously shown how to use Monte Carlo simulation to estimate probabilities and areas. I illustrated the Monte Carlo method by estimating π ≈ 3.14159... by generating points uniformly at random in a unit square and computing the proportion of those points that were inside the unit circle. The previous
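To make the idea concrete, here is a minimal DATA step sketch of the π estimate described above. It assumes the points are drawn from the unit square [0,1] × [0,1], so the proportion of points that satisfy x² + y² ≤ 1 estimates π/4. The sample size, the seed, and the data set and variable names are arbitrary choices for illustration and are not taken from the original article.

```sas
/* Minimal sketch: Monte Carlo estimate of pi.
   N and the seed are arbitrary (assumed) values. */
data MCPi;
   call streaminit(12345);            /* set the random number seed */
   N = 100000;                        /* number of random points */
   do i = 1 to N;
      x = rand("Uniform");            /* x ~ U(0,1) */
      y = rand("Uniform");            /* y ~ U(0,1) */
      inCircle + (x**2 + y**2 <= 1);  /* count points inside the unit circle */
   end;
   piEst = 4 * inCircle / N;          /* proportion estimates pi/4 */
   keep N piEst;
run;

proc print data=MCPi noobs;
run;
```

Because the estimate is a sample proportion, it changes with the seed, and its accuracy improves slowly as N grows.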

It isn't easy to draw the graph of a function when you don't know what the graph looks like. To draw the graph by using a computer, you need to know the domain of the function for the graph: the minimum value (xMin) and the maximum value (xMax) for plotting

A colleague was struggling to compute a right-tail probability for a distribution. Recall that the cumulative distribution function (CDF) is defined as a left-tail probability. For a continuous random variable, X, with density function f, the CDF at the value x is F(x) = Pr(X ≤ x) = ∫
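As a small illustration of the distinction between left-tail and right-tail probabilities, the following sketch computes Pr(X > x) in two ways: by subtracting the CDF from 1, and by calling the SDF (survival distribution) function directly. The standard normal distribution and the evaluation points are assumptions chosen for illustration.

```sas
/* Minimal sketch: right-tail probability for a standard normal variable.
   The distribution and the x values are assumed for illustration. */
data RightTail;
   do x = 1.96, 5, 10;
      leftTail   = cdf("Normal", x);   /* Pr(X <= x) */
      rightTail1 = 1 - leftTail;       /* Pr(X > x) by subtraction */
      rightTail2 = sdf("Normal", x);   /* Pr(X > x) computed directly */
      output;
   end;
run;

proc print data=RightTail noobs;
   format rightTail1 rightTail2 e12.;
run;
```

The two computations agree for moderate values of x, but the subtraction can lose precision far in the tail, which is one reason to prefer a direct survival function when one is available.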

A SAS programmer wanted to create a panel that contained two of the graphs side-by-side. The graphs were created by using calls to two different SAS procedures. This article shows how to select the graphs and arrange them side-by-side by using the ODS LAYOUT GRIDDED statement. The end of the

I previously wrote about partial leverage plots for regression diagnostics and why they are useful. You can generate a partial leverage plot in SAS by using the PLOTS=PARTIALPLOT option in PROC REG. One useful property of partial leverage plots is the ability to graphically represent the null hypothesis that a

Many people know that you can use "WHERE processing" in SAS to filter observations. A typical use is to process only observations that match some criterion. For example, the following WHERE statement processes only observations for male patients who have high blood pressure: WHERE Sex='Male' & Systolic > 140; In
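A WHERE statement of this kind can appear in a procedure or in a DATA step. The sketch below uses the Sashelp.Heart sample data, which is an assumption for illustration; that data set contains the Sex and Systolic variables used in the example. The output data set name is hypothetical.

```sas
/* Minimal sketch: WHERE processing in a procedure and in a DATA step.
   Sashelp.Heart is an assumed example data set. */
proc means data=Sashelp.Heart N mean std;
   where Sex='Male' & Systolic > 140;   /* analyze only matching observations */
   var Cholesterol;
run;

data HighBPMales;                        /* hypothetical output data set */
   set Sashelp.Heart;
   where Sex='Male' & Systolic > 140;   /* read only matching observations */
run;
```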

A previous article shows how to compute the probability density function (PDF) for the multivariate normal distribution. In a similar way, you can compute the density function for the multivariate t distribution. This article discusses the density function for the multivariate t distribution, shows how to compute it, and visualizes

Recently, I needed to solve an optimization problem in which the objective function included a term that involved the quantile function (inverse CDF) of the t distribution, which is shown to the right for DF=5 degrees of freedom. I casually remarked to my colleague that the optimizer would have to

For a linear regression model, a useful but underutilized diagnostic tool is the partial regression leverage plot. Also called the partial regression plot, this plot visualizes the parameter estimates table for the regression. For each effect in the model, you can visualize the following statistics: The estimate for each regression

The ODS GRAPHICS statement in SAS supports more than 30 options that enable you to configure the attributes of graphs that you create in SAS. Did you know that you can display the current set of graphical options? Furthermore, did you know that you can temporarily set certain options and

A palindrome is a sequence of letters that is the same when read forward and backward. In brief, if you reverse the sequence of letters, the word is unchanged. For example, 'mom' and 'racecar' are palindromes. You can extend the definition to phrases by removing all spaces and punctuation marks
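The definition above can be sketched in a DATA step: remove everything except letters, ignore case, and compare the result with its reversal. The data set name, variable names, and test strings are chosen for illustration and are not taken from the original article.

```sas
/* Minimal sketch: test whether a string is a palindrome.
   Names and test strings are assumed for illustration. */
data Palindromes;
   infile datalines truncover;
   input text $40.;
   length clean $40;
   /* keep only alphabetic characters ('k' = keep, 'a' = alphabetic) */
   clean = compress(lowcase(text), , 'ka');
   /* STRIP removes trailing blanks before the string is reversed */
   isPalindrome = (strip(clean) = reverse(strip(clean)));
   datalines;
mom
racecar
A man, a plan, a canal: Panama!
statistics
;
run;

proc print data=Palindromes noobs;
   var text isPalindrome;
run;
```

Under this definition, the first three strings are palindromes and the last one is not.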

M estimation is a robust regression technique that assigns a weight to each observation based on the magnitude of the residual for that observation. Large residuals are downweighted (assigned weights less than 1) whereas observations with small residuals are given weights close to 1. By iterating the reweighting and fitting

An early method for robust regression was iteratively reweighted least-squares regression (Huber, 1964). This is an iterative procedure in which each observation is assigned a weight. Initially, all weights are 1. The method fits a least-squares model to the weighted data and uses the size of the residuals to determine

A common question on SAS discussion forums is how to randomly assign observations to groups. An application of this problem is assigning patients to cohorts in a clinical trial. For example, you might have 137 patients that you want to randomly assign to three groups: a control group, a group
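One simple way to carry out such an assignment, sketched here with only the DATA step and PROC SORT, is to attach a random number to each subject, sort by that number, and deal the sorted subjects into groups in turn. This is an illustrative approach and not necessarily the method that the article presents. The data set names, the seed, and the numeric group labels 1 to 3 are assumptions.

```sas
/* Minimal sketch: randomly assign 137 subjects to three groups of
   nearly equal size. Names and the seed are assumed for illustration. */
data Patients;
   call streaminit(54321);
   do ID = 1 to 137;
      u = rand("Uniform");      /* random sort key */
      output;
   end;
run;

proc sort data=Patients;
   by u;                        /* put the subjects in random order */
run;

data Assigned;
   set Patients;
   Group = mod(_N_ - 1, 3) + 1; /* deal subjects into groups 1, 2, 3 */
   drop u;
run;

proc freq data=Assigned;
   tables Group;                /* group sizes differ by at most 1 */
run;
```

Because 137 is not divisible by 3, two groups receive 46 subjects and one receives 45.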

Many modern statistical techniques incorporate randomness: simulation, bootstrapping, random forests, and so forth. To use the technique, you need to specify a seed value, which determines pseudorandom numbers that are used in the algorithm. Consequently, the seed value also determines the results of the algorithm. In theory, if you know

I have previously blogged about ways to perform balanced bootstrap resampling in SAS. I recently learned about an easier way: Since SAS/STAT 14.2 (SAS 9.4M4), the SURVEYSELECT procedure has supported balanced bootstrap sampling. This article reviews balanced bootstrap sampling and shows how to use the METHOD=BALBOOT option in PROC SURVEYSELECT

In categorical data analysis, it is common to analyze tables of counts. For example, a researcher might gather data for 18 boys and 12 girls who apply for a summer enrichment program. The researcher might be interested in whether the proportion of boys that are admitted is different from the

Did you know that there is a mathematical formula that simplifies finding the derivative of a determinant? You can compute the derivative of a determinant of an n × n matrix by using the sum of n other determinants. The n determinants are for matrices that are equal to the original matrix
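For reference, the standard form of this result can be written as follows. The excerpt is cut off, so it does not show whether the article states the formula by rows or by columns; the version below uses rows, and the notation $A(t)$ and $D_i(t)$ is introduced here for illustration. If every element of the matrix $A(t)$ is a differentiable function of $t$, then

$$\frac{d}{dt} \det A(t) = \sum_{i=1}^{n} \det D_i(t)$$

where $D_i(t)$ is the matrix $A(t)$ with the elements of its $i$th row replaced by their derivatives. For a 2 × 2 matrix, the formula gives

$$\frac{d}{dt} \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \det \begin{pmatrix} a' & b' \\ c & d \end{pmatrix} + \det \begin{pmatrix} a & b \\ c' & d' \end{pmatrix} = (a'd - b'c) + (ad' - bc')$$

which agrees with differentiating $ad - bc$ directly by using the product rule.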

In The Essential Guide to Bootstrapping in SAS, I note that there are many SAS procedures that support bootstrap estimates without requiring the analyst to write a program. I have previously written about using bootstrap options in the TTEST procedure. This article discusses the NLIN procedure, which can fit nonlinear

Recently, I wrote about Bartlett's test for sphericity. The purpose of this hypothesis test is to determine whether the variables in the data are uncorrelated. It works by testing whether the sample correlation matrix is close to the identity matrix. Often statistics textbooks or articles include a statement such as

When you have many correlated variables, principal component analysis (PCA) is a classical technique to reduce the dimensionality of the problem. PCA finds a lower-dimensional linear subspace that explains most of the variability in the data. There are many statistical tools that help you decide how many principal

To a numerical analyst, numerical integration has two meanings. Numerical integration often means solving a definite integral such as $$\int_{a}^b f(s) \, ds$$. Numerical integration is also called quadrature because it computes areas. The other meaning applies to solving ordinary differential equations (ODEs). My article about new methods for solving

Recently, I showed how to use a heat map to visualize measurements over time for a set of patients in a longitudinal study. The visualization is sometimes called a lasagna plot because it presents an alternative to the usual spaghetti plot. A reader asked whether a similar visualization can be

What is McNemar's test? How do you run the McNemar test in SAS? Why might other statistical software report a value for McNemar's test that is different from the SAS value? SAS supports an exact version of the McNemar test, but when should you use it? This article answers these

Some matrices are so special that they have names. The identity matrix is the most famous, but many are named after a researcher who studied them such as the Hadamard, Hilbert, Sylvester, Toeplitz, and Vandermonde matrices. This article is about the Pascal matrix, which is formed by using elements from

Many discussions and articles about SAS Viya emphasize its ability to handle Big Data, perform parallel processing, and integrate open-source languages. These are important issues for some SAS customers. But for customers who program in SAS and do not have Big Data, SAS Viya is attractive because it is the

The graph to the right is the quantile function for the standard normal distribution, which is sometimes called the probit function. Given any probability, p, the quantile function gives the value, x, such that the area under the normal density curve to the left of x is exactly p. This
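The relationship between a probability and its quantile can be checked numerically. The following sketch evaluates the standard normal quantile function at a few probabilities and then applies the CDF to confirm that each probability is recovered. The probabilities and the data set and variable names are assumptions chosen for illustration.

```sas
/* Minimal sketch: the standard normal quantile (probit) function.
   The probabilities are assumed values for illustration. */
data Probit;
   do p = 0.025, 0.5, 0.975;
      x     = quantile("Normal", p);  /* value x such that Pr(X <= x) = p */
      x2    = probit(p);              /* PROBIT function: same quantile */
      check = cdf("Normal", x);       /* recovers p, up to rounding */
      output;
   end;
run;

proc print data=Probit noobs;
run;
```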

Oh, no! Your boss just told you to change the way that SAS displays certain features in graphs, such as missing values. But you have a library of hundreds of SAS programs! Do you need to modify all of your previous programs? Fortunately, the answer is no. SAS provides ODS

In an article about how to visualize missing data in a heat map, I noted that the SAS SG procedures (such as PROC SGPLOT) use the GraphMissing style element to color a bar or tile that represents a missing value. In the HTMLBlue ODS style, the color for missing values