Blogs

Author

Rick Wicklin
Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Analytics | Data Visualization | Learn SAS

Rick Wicklin, June 24, 2019

Add loess smoothers to residual plots

When fitting a least squares regression model to data, it is often useful to create diagnostic plots of the residuals versus the explanatory variables. If the model fits the data well, the plots of the residuals should not display any patterns. Systematic patterns can indicate that you need to include

Read More

Analytics | Data Visualization | Learn SAS

Rick Wicklin, June 19, 2019

Influential observations in a linear regression model: The DFFITS and Cook's D statistics

A previous article describes the DFBETAS statistics for detecting influential observations, where "influential" means that if you delete the observation and refit the model, the estimates for the regression coefficients change substantially. Of course, there are other statistics that you could use to measure influence. Two popular ones are the

Read More

Analytics | Data Visualization | Learn SAS

Rick Wicklin, June 17, 2019

Influential observations in a linear regression model: The DFBETAS statistics

My article about deletion diagnostics investigated how influential an observation is to a least squares regression model. In other words, if you delete the i_th observation and refit the model, what happens to the statistics for the model? SAS regression procedures provide many tables and graphs that enable you to

Read More

Advanced Analytics | Programming Tips

Rick Wicklin, June 12, 2019

Leave-one-out statistics and a formula to update a matrix inverse

For linear regression models, there is a class of statistics that I call deletion diagnostics or leave-one-out statistics. These observation-wise statistics address the question, "If I delete the i_th observation and refit the model, what happens to the statistics for the model?" For example: The PRESS statistic is similar to

Read More

Learn SAS | Programming Tips

Rick Wicklin, June 10, 2019

5 reasons to use PROC FORMAT to recode variables in SAS

Recoding variables can be tedious, but it is often a necessary part of data analysis. Almost every SAS programmer has written a DATA step that uses IF-THEN/ELSE logic or the SELECT-WHEN statements to recode variables. Although creating a new variable is effective, it is also inefficient because you have to

Read More

Data Visualization | Learn SAS | Programming Tips

Rick Wicklin, June 5, 2019

Plot a family of curves in SAS

A family of curves is generated by an equation that has one or more parameters. To visualize the family, you might want to display a graph that overlays four or five curves that have different parameter values, as shown to the right. The graph shows members of a family of

Read More

Data Visualization | Learn SAS | Programming Tips

Rick Wicklin, June 3, 2019

Graph wide data and long data in SAS

Statistical programmers and analysts often use two kinds of rectangular data sets, popularly known as wide data and long data. Some analytical procedures require that the data be in wide form; others require long form. (The "long format" is sometimes called "narrow" or "tall" data.) Fortunately, the statistical graphics procedures

Read More

Analytics | Data Visualization | Learn SAS

Rick Wicklin, May 30, 2019

Visualize interaction effects in regression models

Knowing how to visualize a regression model is a valuable skill. A good visualization can help you to interpret a model and understand how its predictions depend on explanatory factors in the model. Visualization is especially important in understanding interactions between factors. Recently I read about work by Jacob A.

Read More

Analytics | Programming Tips

Rick Wicklin, May 28, 2019

The Theil-Sen robust estimator for simple linear regression

Modern statistical software provides many options for computing robust statistics. For example, SAS can compute robust univariate statistics by using PROC UNIVARIATE, robust linear regression by using PROC ROBUSTREG, and robust multivariate statistics such as robust principal component analysis. Much of the research on robust regression was conducted in the

Read More

Analytics | Data Visualization

Rick Wicklin, May 22, 2019

Gershgorin discs and the location of eigenvalues

The eigenvalues of a matrix are not easy to compute. It is remarkable, therefore, that with relatively simple mental arithmetic, you can obtain bounds for the eigenvalues of a matrix of any size. The bounds are provided by using a marvelous mathematical result known as Gershgorin's Disc Theorem. For certain

Read More
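
The teaser above is cut off before it states the theorem, so here is a minimal sketch of the standard result: every eigenvalue of a square matrix lies in at least one disc that is centered at a diagonal entry and whose radius is the sum of the absolute values of the off-diagonal entries in that row. The sketch is in Python rather than SAS, and the function name `gershgorin_discs` and the example matrix are hypothetical, not taken from the article.

```python
def gershgorin_discs(A):
    """Return a (center, radius) pair for each row of a square matrix A.

    Hypothetical helper for illustration; A is a list of equal-length rows.
    """
    discs = []
    for i, row in enumerate(A):
        # Radius: sum of absolute values of the off-diagonal entries in row i
        radius = sum(abs(v) for j, v in enumerate(row) if j != i)
        # Center: the diagonal entry of row i
        discs.append((row[i], radius))
    return discs


# Example matrix chosen only for illustration
A = [[10, 1, 0],
     [1, 5, 2],
     [0, 2, -3]]

for center, radius in gershgorin_discs(A):
    print(f"disc centered at {center} with radius {radius}")
```

For a real symmetric matrix such as this one, the eigenvalues are real, so each disc reduces to an interval on the real line.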

Analytics | Programming Tips

Rick Wicklin, May 20, 2019

Critical values of the Kolmogorov-Smirnov test

Recently I wrote about how to compute the Kolmogorov D statistic, which is used to determine whether a sample has a particular distribution. One of the beautiful facts about modern computational statistics is that if you can compute a statistic, you can use simulation to estimate the sampling distribution of

Read More

Analytics | Learn SAS

Rick Wicklin, May 15, 2019

What is Kolmogorov's D statistic?

Have you ever run a statistical test to determine whether data are normally distributed? If so, you have probably used Kolmogorov's D statistic. Kolmogorov's D statistic (also called the Kolmogorov-Smirnov statistic) enables you to test whether the empirical distribution of data is different from a reference distribution. The reference distribution

Read More

Learn SAS | Programming Tips

Rick Wicklin, May 13, 2019

Write to a SAS data set from inside a SAS/IML loop

In SAS/IML programs, a common task is to write values in a matrix to a SAS data set. For some programs, the values you want to write are in a matrix and you use the CREATE FROM/APPEND FROM syntax to create the data set, as follows: proc iml; X =

Read More

Analytics | Programming Tips

Rick Wicklin, May 8, 2019

Discrimination, accuracy, and stability in binary classifiers

At SAS Global Forum 2019, Daymond Ling presented an interesting discussion of binary classifiers in the financial industry. The discussion is motivated by a practical question: If you deploy a predictive model, how can you assess whether the model is no longer working well and needs to be replaced? Daymond

Read More

Analytics | Programming Tips

Rick Wicklin, May 6, 2019

How to simulate data from a generalized linear model

Here's a simulation tip: When you simulate a fixed-effect generalized linear regression model, don't add a random normal error to the linear predictor. Only the response variable should be random. This tip applies to models that apply a link function to a linear predictor, including logistic regression, Poisson regression, and

Read More

Analytics | Learn SAS

Rick Wicklin, May 1, 2019

Encodings of CLASS variables in SAS regression procedures: A cheat sheet

SAS regression procedures support several parameterizations of classification variables. When a categorical variable is used as an explanatory variable in a regression model, the procedure generates dummy variables that are used to construct a design matrix for the model. The process of forming columns in a design matrix is called

Read More

Learn SAS | Programming Tips

Rick Wicklin, April 29, 2019

The normal mixture distribution in SAS

Did you know that SAS provides built-in support for working with probability distributions that are finite mixtures of normal distributions? This article shows examples of using the "NormalMix" distribution in SAS and describes a trick that enables you to easily work with distributions that have many components. As with all

Read More

Analytics | Programming Tips

Rick Wicklin, April 24, 2019

A CUSUM test for autoregressive models

The CUSUM test has many incarnations. Different areas of statistics use different assumptions and test for different hypotheses. This article presents a brief overview of CUSUM tests and gives an example of using the CUSUM test in PROC AUTOREG for autoregressive models in SAS. A CUSUM test uses the cumulative

Read More

Programming Tips

Rick Wicklin, April 22, 2019

The CUSUM test for randomness of a binary sequence

Many statistical tests use a CUSUM statistic as part of the test. It can be confusing when a researcher refers to "the CUSUM test" without providing details about exactly which CUSUM test is being used. This article describes a CUSUM test for the randomness of a binary sequence. You start

Read More

Programming Tips

Rick Wicklin, April 17, 2019

Create your own version of Anscombe's quartet: Dissimilar data that have similar statistics

I think every course in exploratory data analysis should begin by studying Anscombe's quartet. Anscombe's quartet is a set of four data sets (N=11) that have nearly identical descriptive statistics but different graphical properties. They are a great reminder of why you should graph your data. You can read about

Read More

Programming Tips

Rick Wicklin, April 15, 2019

Efficient evaluation of a quadratic form

A quadratic form is a second-degree polynomial that does not have any linear or constant terms. For multivariate polynomials, you can quickly evaluate a quadratic form by using the matrix expression x` A x This computation is straightforward in a matrix language such as SAS/IML. However, some computations in statistics

Read More
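
As a rough illustration of the matrix expression x` A x mentioned in the teaser above, the following Python sketch (not SAS/IML) forms the product A x first and then takes its inner product with x. The function name `quad_form` and the example values are hypothetical; the article's own efficient method is not visible in this excerpt.

```python
def quad_form(x, A):
    """Evaluate the quadratic form x' A x for a vector x and square matrix A.

    Hypothetical helper for illustration; x is a list, A is a list of rows.
    """
    # First compute the matrix-vector product A x
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    # Then take the inner product of x with A x
    return sum(xi * axi for xi, axi in zip(x, Ax))


# Example values chosen only for illustration
A = [[2, 1],
     [1, 3]]
x = [1, 2]

print(quad_form(x, A))  # 18
```

For this A, the quadratic form is the polynomial 2*x1^2 + 2*x1*x2 + 3*x2^2, which gives 2 + 4 + 12 = 18 at x = (1, 2).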

Analytics | Programming Tips

Rick Wicklin, April 10, 2019

4 ways to compute an SSCP matrix

In numerical linear algebra, there are often multiple ways to solve a problem, and each way is useful in various contexts. In fact, one of the challenges in matrix computations is choosing from among different algorithms, which often vary in their use of memory, data access, and speed. This article

Read More

Learn SAS | Programming Tips

Rick Wicklin, April 8, 2019

Use the FLOOR-MOD trick to allocate items to groups

Suppose you need to assign 100 patients equally among 3 treatment groups in a clinical study. Obviously, an equal allocation is impossible because 3 does not evenly divide 100, but you can get close by assigning 34 patients to one group and 33 to the others. Mathematically,

Read More
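
The allocation described in the teaser above (34, 33, and 33 patients) can be sketched with integer division and a remainder. This is a Python illustration rather than SAS, and it is an assumption about what the trick looks like, since the excerpt is cut off before the formula. The function name `allocate` is hypothetical.

```python
def allocate(n_items, n_groups):
    """Return group sizes that sum to n_items and differ by at most one.

    Hypothetical helper for illustration only.
    """
    base = n_items // n_groups   # floor(n/k): the minimum size of every group
    extra = n_items % n_groups   # mod(n, k): the number of leftover items
    # The first `extra` groups each receive one of the leftover items
    return [base + 1 if g < extra else base for g in range(n_groups)]


print(allocate(100, 3))  # [34, 33, 33]
```

Placing the larger groups first is an arbitrary choice in this sketch; any assignment of the leftover items to distinct groups gives an equally balanced allocation.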

Learn SAS | Programming Tips

Rick Wicklin, April 3, 2019

Convergence in mixed models: When the estimated G matrix is not positive definite

I've previously written about how to deal with nonconvergence when fitting generalized linear regression models. Most generalized linear and mixed models use an iterative optimization process, such as maximum likelihood estimation, to fit parameters. The optimization might not converge, either because the initial guess is poor or because the model

Read More

Learn SAS | Programming Tips

Rick Wicklin, April 1, 2019

Matrix operations and BY groups

Many SAS procedures support the BY statement, which enables you to perform an analysis for subgroups of the data set. Although the SAS/IML language does not have a built-in "BY statement," there are various techniques that enable you to perform a BY-group analysis. The two I use most often are

Read More

Analytics | Data Visualization

Rick Wicklin, March 27, 2019

How to simulate multivariate outliers

In simulation studies, sometimes you need to simulate outliers. For example, in a simulation study of regression techniques, you might want to generate outliers in the explanatory variables to see how the technique handles high-leverage points. This article shows how to generate outliers in multivariate normal data that are a

Read More

Programming Tips

Schematic diagram of outliers in bivariate normal data. The point 'A' has large univariate z scores but a small Mahalanobis distance. The point 'B' has a large Mahalanobis distance. Only 'B' is a multivariate outlier.

Rick Wicklin, March 25, 2019

The geometry of multivariate versus univariate outliers

An important concept in multivariate statistical analysis is the Mahalanobis distance. The Mahalanobis distance provides a way to measure how far away an observation is from the center of a sample while accounting for correlations in the data. The Mahalanobis distance is a good way to detect outliers in multivariate

Read More

Data Visualization | Learn SAS

Rick Wicklin, March 20, 2019

Truncate response surfaces

An analyst was using SAS to analyze some data from an experiment. He noticed that the response variable is always positive (such as volume, size, or weight), but his statistical model predicts some negative responses. He posted the data and asked if it is possible to modify the graph so

Read More

Programming Tips

Rick Wicklin, March 18, 2019

Interpolation vs extrapolation: the convex hull of multivariate data

Statisticians often emphasize the dangers of extrapolating from a univariate regression model. A common exercise in introductory statistics is to ask students to compute a model of population growth and predict the population far in the future. The students learn that extrapolating from a model can result in a nonsensical

Read More

Programming Tips

Rick Wicklin, March 13, 2019

The value of pi depends on how you measure distance

It's time to celebrate Pi Day! Every year on March 14th (written 3/14 in the US), math-loving folks celebrate "all things pi-related" because 3.14 is the three-digit approximation to the mathematical constant, π. Although children learn that pi is approximately 3.14159..., the actual definition of π is the ratio of

Read More
