Tag: Statistical Programming

Rick Wicklin 0
Compute sample quantiles by using the QNTL call

SAS provides several ways to compute sample quantiles of data. The UNIVARIATE procedure can compute quantiles (also called percentiles), but you can also compute them in the SAS/IML language. Prior to SAS/IML 9.22 (released in 2010), statistical programmers could call a SAS/IML module that computes sample quantiles. With the release

Rick Wicklin 0
Quantiles of discrete distributions

I work with continuous distributions more often than with discrete distributions. Consequently, I am used to thinking of the quantile function as being an inverse cumulative distribution function (CDF). (These functions are described in my article, "Four essential functions for statistical programmers.") For discrete distributions, the quantile function is not an inverse of the CDF. To quote
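To make the distinction concrete, here is a minimal sketch in Python (not SAS), offered only as an illustration. It assumes the common convention that the quantile of a discrete distribution is the smallest value k for which CDF(k) ≥ p. The helper names binom_cdf and binom_quantile are hypothetical and do not come from the article.

```python
from math import comb

def binom_cdf(k, n, prob):
    """P(X <= k) for X ~ Binomial(n, prob)."""
    return sum(comb(n, i) * prob**i * (1 - prob)**(n - i) for i in range(k + 1))

def binom_quantile(p, n, prob):
    """Smallest k with CDF(k) >= p (assumed convention for discrete quantiles)."""
    for k in range(n + 1):
        if binom_cdf(k, n, prob) >= p:
            return k
    return n  # guards against floating-point round-off when p is 1

n, prob = 10, 0.5
k = binom_quantile(0.3, n, prob)      # quantile for p = 0.3
cdf_at_k = binom_cdf(k, n, prob)      # CDF evaluated at that quantile
print(k, cdf_at_k)
```

Because the CDF of a discrete distribution is a step function, evaluating the CDF at the quantile generally does not return the original probability p.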

Rick Wicklin 0
Testing data for multivariate normality

I've blogged several times about multivariate normality, including how to generate random values from a multivariate normal distribution. But given a set of multivariate data, how can you determine if it is likely to have come from a multivariate normal distribution? The answer, of course, is to run a goodness-of-fit

Advanced Analytics
Rick Wicklin 0
Use the Cholesky transformation to correlate and uncorrelate variables

A variance-covariance matrix expresses linear relationships between variables. Given the covariances between variables, did you know that you can write down an invertible linear transformation that "uncorrelates" the variables? Conversely, you can transform a set of uncorrelated variables into variables with given covariances. The transformation that works this magic is
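As a rough illustration of the idea in the title, the following Python sketch (not SAS/IML) computes a lower-triangular Cholesky factor L with L·Lᵀ = Σ. If z has uncorrelated, unit-variance components, then x = L·z has covariance Σ, and solving L·z = x undoes the transformation. The function names and the example matrix are made up for this sketch and are not taken from the article.

```python
from math import sqrt

def cholesky(S):
    """Return lower-triangular L with L * L^T = S (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def forward_solve(L, x):
    """Solve L z = x by forward substitution (the 'uncorrelating' direction)."""
    z = []
    for i, row in enumerate(L):
        z.append((x[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

Sigma = [[4.0, 2.0], [2.0, 3.0]]      # example covariance matrix (made up)
L = cholesky(Sigma)
z = [1.0, -1.0]                        # a point in uncorrelated coordinates
x = [sum(L[i][k] * z[k] for k in range(2)) for i in range(2)]  # x = L z
z_back = forward_solve(L, x)           # recovers z from x
```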

Rick Wicklin 0
Detecting outliers in SAS: Part 2: Estimating scale

In a previous blog post on robust estimation of location, I worked through some of the examples in the survey article, "Robust statistics for outlier detection," by Peter Rousseeuw and Mia Hubert. I showed that SAS/IML software and PROC UNIVARIATE both support the robust estimators of location that are mentioned

Rick Wicklin 0
Compute a running mean and variance

In my recent article on simulating Buffon's needle experiment, I computed the "running mean" of a series of values by using a single call to the CUSUM function in the SAS/IML language. For example, the following SAS/IML statements define a RunningMean function, generate 1,000 random normal values, and compute the
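The cumulative-sum idea can be sketched in a few lines of Python (not SAS/IML): the running mean at position n is the cumulative sum of the first n values divided by n. Here itertools.accumulate stands in for the CUSUM function, and the function name running_mean and the seed are choices made for this sketch only.

```python
from itertools import accumulate
import random

def running_mean(x):
    """Running mean: cumulative sum divided by the number of terms so far."""
    return [s / n for n, s in enumerate(accumulate(x), start=1)]

random.seed(1)                                        # arbitrary seed for reproducibility
values = [random.gauss(0, 1) for _ in range(1000)]    # 1,000 random normal values
means = running_mean(values)
```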

Rick Wicklin 0
Overlay density estimates on a plot

A recent question on a SAS Discussion Forum was "how can you overlay multiple kernel density estimates on a single plot?" There are three ways to do this, depending on your goals and objectives. Overlay different estimates of the same variable Sometimes you have a single variable and want to

Rick Wicklin 0
How to lie with a simulation

In my article on Buffon's needle experiment, I showed a graph that converges fairly nicely and regularly to the value π, which is the value that the simulation is trying to estimate. This graph is, indeed, a typical graph, as you can verify by running the simulation yourself. However, notice

Rick Wicklin 0
Simulation of Buffon's needle in SAS

Buffon's needle experiment for estimating π is a classical example of using an experiment (or a simulation) to estimate a probability. This example is presented in many books on statistical simulation and is famous enough that Brian Ripley in his book Stochastic Simulation states that the problem is "well known

Rick Wicklin 0
New 2012 resolutions for my blog

Hello, 2012! It's a New Year and I'm flush with ideas for new blog articles. (You can also read about The DO Loop's most popular posts of 2011.) The fundamental purpose of my blog is to present tips and techniques for writing efficient statistical programs in SAS. I pledge to

Rick Wicklin 0
Recoding a character variable as numeric

The other day someone posted the following question to the SAS-L discussion list: Is there a SAS PROC out there that takes a multi-category discrete variable with character categories and converts it to a single numeric coded variable (not a set of dummy variables) with the character categories assigned as

Rick Wicklin 0
Funnel plots for proportions

I have previously written about how to create funnel plots in SAS software. A funnel plot is a way to compare the aggregated performance of many groups without ranking them. The groups can be states, counties, schools, hospitals, doctors, airlines, and so forth. A funnel plot graphs a performance metric

Rick Wicklin 0
On the median of the chi-square distribution

I was at the Wikipedia site the other day, looking up properties of the Chi-square distribution. I noticed that the formula for the median of the chi-square distribution with d degrees of freedom is given as ≈ d(1-2/(9d))³. However, there is no mention of how well this formula approximates the
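One quick sanity check on the approximation, sketched in Python (not SAS): with d = 2 degrees of freedom the chi-square distribution is an exponential distribution with mean 2, so its median is exactly 2·ln 2. The function name chisq_median_approx is hypothetical, and this single comparison is not the article's analysis.

```python
from math import log

def chisq_median_approx(d):
    """Approximate median of the chi-square distribution: d * (1 - 2/(9d))^3."""
    return d * (1 - 2 / (9 * d)) ** 3

exact_d2 = 2 * log(2)                  # exact median when d = 2
approx_d2 = chisq_median_approx(2)     # value given by the approximation
rel_error = (approx_d2 - exact_d2) / exact_d2
print(exact_d2, approx_d2, rel_error)
```

For d = 2 the approximation overshoots the exact median by roughly 1.3%.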

Rick Wicklin 0
The UNIQUE-LOC trick: A real treat!

When you analyze data, you will occasionally have to deal with categorical variables. The typical situation is that you want to repeat an analysis or computation for each level (category) of a categorical variable. For example, you might want to analyze males separately from females. Unlike most other SAS procedures,

Rick Wicklin 0
Video: Calling R from the SAS/IML Language

In SAS/IML 9.22 and beyond, you can call the R statistical programming language from within a SAS/IML program. The syntax is similar to the syntax for calling SAS from SAS/IML: You use a SUBMIT statement, but add the R option: SUBMIT / R. All statements in the program between the

Rick Wicklin 0
Four essential functions for statistical programmers

Normal, Poisson, exponential—these and other "named" distributions are used daily by statisticians for modeling and analysis. There are four operations that are used often when you work with statistical distributions. In SAS software, the operations are available by using the following four functions, which are essential for every statistical programmer

Rick Wicklin 0
Optimizing? Two hints for specifying derivatives

I previously wrote about using SAS/IML for nonlinear optimization, and demonstrated optimization by maximizing a likelihood function. Many well-known optimization algorithms require derivative information during the optimization, including the conjugate gradient method (implemented in the NLPCG subroutine) and the Newton-Raphson method (implemented in the NLPNRA subroutine). You should specify analytic
