Author

Rick Wicklin
Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is the author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Generating a random orthogonal matrix

Because I am writing a new book about simulating data in SAS, I have been doing a lot of reading and research about how to simulate various quantities. Random integers? Check! Random univariate samples? Check! Random multivariate samples? Check! Recently I've been researching how to generate random matrices. I've blogged

ANY versus ALL: Testing the elements of a vector

The fundamental units in the SAS/IML language are matrices and vectors. Consequently, you might wonder about conditional expressions such as if v>0 then.... What does this expression mean when v contains more than a single element? Evaluating vector expressions: When you test a vector for some condition, expressions like v>0
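To make the question concrete, here is a minimal SAS/IML sketch (the vector v and the messages are made up for illustration). In SAS/IML, an IF-THEN statement treats a vector condition as true only when every element satisfies it, so if v>0 then... behaves like if all(v>0) then.... The ANY and ALL functions let you state the intent explicitly:

```sas
proc iml;
v = {1, -2, 3};                 /* hypothetical vector with one negative element */

/* ALL: true only if every element satisfies the condition */
if all(v > 0) then print "Every element is positive";
else print "At least one element is not positive";

/* ANY: true if at least one element satisfies the condition */
if any(v > 0) then print "At least one element is positive";
quit;
```

For this v, the ALL test fails and the ANY test succeeds.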

Row vectors versus column vectors

The SAS/IML language supports both row vectors and column vectors. This is useful for performing linear algebra, but it can cause headaches when you are writing a SAS/IML module. I want my modules to be able to handle both row vectors and column vectors. I don't want the user to

Linear interpolation in SAS/IML

A recent discussion on the SAS-L discussion forum concerned how to implement linear interpolation in SAS. Some people suggested using PROC EXPAND in SAS/ETS software, whereas others proposed a DATA step solution. For me, the SAS/IML language provides a natural programming environment to implement an interpolation scheme. It also provides

Compute sample quantiles by using the QNTL call

SAS provides several ways to compute sample quantiles of data. The UNIVARIATE procedure can compute quantiles (also called percentiles), but you can also compute them in the SAS/IML language. Prior to SAS/IML 9.22 (released in 2010) statistical programmers could call a SAS/IML module that computes sample quantiles. With the release
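As a rough illustration of the QNTL call named in the title, the following sketch computes three sample quantiles of a small made-up data vector. The data values and the variable names (x, p, q) are assumptions for the example; the call uses the default quantile definition.

```sas
proc iml;
x = {5, 1, 9, 3, 7, 2, 8};      /* hypothetical sample data */
p = {0.25, 0.5, 0.75};          /* probabilities for the requested quantiles */

call qntl(q, x, p);             /* q receives the sample quantiles of x */
print p q;
quit;
```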

Quantiles of discrete distributions

I work with continuous distributions more often than with discrete distributions. Consequently, I am used to thinking of the quantile function as being an inverse cumulative distribution function (CDF). (These functions are described in my article, "Four essential functions for statistical programmers.") For discrete distributions, the quantile function is not a true inverse of the CDF. To quote

Testing data for multivariate normality

I've blogged several times about multivariate normality, including how to generate random values from a multivariate normal distribution. But given a set of multivariate data, how can you determine if it is likely to have come from a multivariate normal distribution? The answer, of course, is to run a goodness-of-fit

Advanced Analytics
What is Mahalanobis distance?

I previously described how to use Mahalanobis distance to find outliers in multivariate data. This article takes a closer look at Mahalanobis distance. A subsequent article will describe how you can compute Mahalanobis distance. Distance in standard units: In statistics, we sometimes measure "nearness" or "farness" in terms of the

Advanced Analytics
Use the Cholesky transformation to correlate and uncorrelate variables

A variance-covariance matrix expresses linear relationships between variables. Given the covariances between variables, did you know that you can write down an invertible linear transformation that "uncorrelates" the variables? Conversely, you can transform a set of uncorrelated variables into variables with given covariances. The transformation that works this magic is
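Here is a minimal SAS/IML sketch of the idea, not the code from the article. It assumes a made-up 2 x 2 covariance matrix Sigma and uses the ROOT function, which returns an upper triangular matrix U such that U`*U = Sigma. Multiplying uncorrelated data by U induces the covariance; multiplying by the inverse of U removes it.

```sas
proc iml;
Sigma = {9 1,
         1 1};                  /* hypothetical covariance matrix */
U = root(Sigma);                /* Cholesky root: U`*U = Sigma */

call randseed(1);
Z = j(1000, 2);                 /* allocate 1000 x 2 matrix */
call randgen(Z, "Normal");      /* uncorrelated standard normal columns */

Y = Z * U;                      /* correlate: cov(Y) is approximately Sigma */
W = Y * inv(U);                 /* uncorrelate: W recovers Z */

covY = cov(Y);
covW = cov(W);
print covY, covW;
quit;
```

With rows as observations, Cov(Y) = U`*Cov(Z)*U, which is close to Sigma because the columns of Z are approximately uncorrelated with unit variance.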

How to access SAS sample programs

Have you ever wanted to run a sample program from the SAS documentation or wanted to use a data set that appears in the SAS documentation? You can! All programs and data sets in the documentation are distributed with SAS; you just have to know where to look! Sample data

Random number seeds: Only the first seed matters!

The other day I encountered the following SAS DATA step for generating three normally distributed variables. Study it, and see if you can discover what is unnecessary (and misleading!) about this program: data points; drop i; do i=1 to 10; x=rannor(34343); y=rannor(12345); z=rannor(54321); output; end; run; The program creates the
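As the title says, only the first seed matters: the three RANNOR function calls draw from a single random number stream, which is initialized by the first seed that is encountered. One way to make that explicit is sketched below. It swaps in the newer STREAMINIT routine and RAND function rather than RANNOR, so it is an alternative formulation and does not reproduce the same random values as the original program.

```sas
data points;
   call streaminit(34343);      /* set the seed once, before any random draws */
   drop i;
   do i = 1 to 10;
      x = rand("Normal");       /* all three variables share one stream */
      y = rand("Normal");
      z = rand("Normal");
      output;
   end;
run;
```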

Detecting outliers in SAS: Part 2: Estimating scale

In a previous blog post on robust estimation of location, I worked through some of the examples in the survey article, "Robust statistics for outlier detection," by Peter Rousseeuw and Mia Hubert. I showed that SAS/IML software and PROC UNIVARIATE both support the robust estimators of location that are mentioned

Explaining coincidence

I was on vacation when a family member sidled up to me. "Rick, you're a statistician..." he began. I knew I was in trouble. He proceeded to tell me the story of Joseph "Newsboy" Moriarty, a New Jersey mobster who rose to prominence and became known as the bookie who

Constants in SAS

Statistical programmers often need mathematical constants such as π (3.14159...) and e (2.71828...). Programmers of numerical algorithms often need to know machine-specific constants such as the machine precision constant (2.22E-16 on my Windows PC) or the largest representable double-precision value (1.798E308 on my Windows PC). Some computer languages build these
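In SAS, one way to obtain these values is the CONSTANT function. The short DATA step below is a sketch that writes the four constants mentioned above to the log; the variable names are chosen only for illustration, and the machine-specific values depend on your platform.

```sas
data _null_;
   pi     = constant('pi');      /* 3.14159... */
   e      = constant('e');       /* 2.71828... */
   maceps = constant('maceps');  /* machine precision */
   big    = constant('big');     /* largest representable double-precision value */
   put pi= e= maceps= big=;      /* write the values to the SAS log */
run;
```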

Compute a running mean and variance

In my recent article on simulating Buffon's needle experiment, I computed the "running mean" of a series of values by using a single call to the CUSUM function in the SAS/IML language. For example, the following SAS/IML statements define a RunningMean function, generate 1,000 random normal values, and compute the
