Blogs

Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is the author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Analytics | Data Visualization | Learn SAS

Rick Wicklin | December 17, 2018

Create a probability plot in SAS

Many data analysts use a quantile-quantile plot (Q-Q plot) to graphically assess whether data can be modeled by a probability distribution such as the normal, lognormal, or gamma distribution. You can use the QQPLOT statement in PROC UNIVARIATE to create a Q-Q plot for about a dozen common distributions. However,

Analytics | Learn SAS | Programming Tips

Process flow diagram shows how to resample data to create a bootstrap distribution.

Rick Wicklin | December 12, 2018

The essential guide to bootstrapping in SAS

This article describes best practices and techniques that every data analyst should know before bootstrapping in SAS. The bootstrap method is a powerful statistical technique, but it can be a challenge to implement it efficiently. An inefficient bootstrap program can take hours to run, whereas a well-written program can give

Data Visualization | Learn SAS

Rick Wicklin | December 10, 2018

Visualize Christmas songs

The best way to spread Christmas cheer is singing loud for all to hear! -Buddy in Elf

In the Christmas movie Elf (2003), Jovie (played by Zooey Deschanel) must "spread Christmas cheer" to help Santa. She chooses to sing "Santa Claus is coming to town," and soon all of New

Learn SAS | Programming Tips

Rick Wicklin | December 5, 2018

When is a histogram not a histogram? When it's a table!

Recently a SAS programmer wanted to obtain a table of counts that was based on a histogram. I showed him how you can use the OUTHIST= option on the HISTOGRAM statement in PROC UNIVARIATE to obtain that information. For example, the following call to PROC UNIVARIATE creates a histogram for

Data Visualization | Learn SAS

Rick Wicklin | December 3, 2018

5 tips for customizing legends in PROC SGPLOT in SAS

When a graph includes several markers or line styles, it is often useful to create a legend that explains the relationship between the data and the symbols, color, and line styles in the graph. The SGPLOT procedure does a good job of automatically creating and placing a legend for most

Programming Tips

Rick Wicklin | November 28, 2018

Singular parameterizations, generalized inverses, and regression estimates

I remember the first time I used PROC GLM in SAS to include a classification effect in a regression model. I thought I had done something wrong because the parameter estimates table was followed by a scary-looking note: Note: The X'X matrix has been found to be singular, and a

Analytics | Data Visualization

Rick Wicklin | November 26, 2018

A funnel plot for immunization rates

Last week my colleague, Robert Allison, visualized data regarding immunization rates for kindergarten classes in North Carolina. One of his graphs was a scatter plot that displayed the proportion of unimmunized students versus the size of the class for 1,885 kindergarten classes in NC. This scatter plot is the basis

Analytics | Learn SAS

Graph of norm of solutions to the singular system A*b=c. The norm is plotted for vectors b + alpha*x_Null where b is the Moore-Penrose solution and x_Null is a basis for the nullspace of A.

Rick Wicklin | November 21, 2018

Generalized inverses for matrices

A data analyst asked how to compute parameter estimates in a linear regression model when the underlying data matrix is rank deficient. This situation can occur if one of the variables in the regression is a linear combination of other variables. It also occurs when you use the GLM parameterization

Learn SAS | Programming Tips

Rick Wicklin | November 19, 2018

Select ODS tables by using wildcards and regular expressions in SAS

You might know that you can use the ODS SELECT statement to display only some of the tables and graphs that are created by a SAS procedure. But did you know that you can use a WHERE clause on the ODS SELECT statement to display tables that match a pattern?

Analytics | Data Visualization | Programming Tips

Rick Wicklin | November 14, 2018

Create and compare ROC curves for any predictive model

An ROC curve graphically summarizes the tradeoff between true positives and true negatives for a rule or model that predicts a binary response variable. An ROC curve is a parametric curve that is constructed by varying the cutpoint value at which estimated probabilities are considered to predict the binary event.

Data Visualization | Learn SAS

Rick Wicklin | November 12, 2018

Define custom color ramps by using the RANGEATTRMAP statement in GTL

The SGPLOT procedure enables you to use the value of a response variable to color markers or areas in a graph. For example, you can use the COLORRESPONSE= option to define a variable whose values will be used to color markers in a scatter plot or cells in a heat

Programming Tips

Rick Wicklin | November 7, 2018

Visualize the feasible region for a constrained optimization

When solving optimization problems, it is harder to specify a constrained optimization than an unconstrained one. A constrained optimization requires that you specify multiple constraints. One little typo or a missing minus sign can result in an infeasible problem or a solution that is unrelated to the true problem. This

Analytics | Learn SAS

Rick Wicklin | November 5, 2018

Fit the Pareto distribution in SAS

Will the real Pareto distribution please stand up? SAS supports three different distributions that are named "Pareto." The Wikipedia page for the Pareto distribution lists five different "Pareto" distributions, including the three that SAS supports. This article shows how to fit the two-parameter Pareto distribution in SAS and discusses the

Data Visualization | Learn SAS | Programming Tips

Rick Wicklin | October 31, 2018

A trick to plot groups in PROC SGPLOT

A useful feature in PROC SGPLOT is the ability to easily visualize subgroups of data. Most statements in the SGPLOT procedure support a GROUP= option that enables you to overlay plots of subgroups. When you use the GROUP= option, observations are assigned attributes (colors, line patterns, symbols, ...) that indicate

Programming Tips

Rick Wicklin | October 29, 2018

Bootstrap regression estimates: Residual resampling

If you want to bootstrap the parameters in a statistical regression model, you have two primary choices. The first, case resampling, is discussed in a previous article. This article describes the second choice, which is resampling residuals (also called model-based resampling). This article shows how to implement residual resampling in

Analytics | Learn SAS | Programming Tips

Rick Wicklin | October 24, 2018

Bootstrap regression estimates: Case resampling

If you want to bootstrap the parameters in a statistical regression model, you have two primary choices. The first is case resampling, which is also called resampling observations or resampling pairs. In case resampling, you create the bootstrap sample by randomly selecting observations (with replacement) from the original data. The
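
The case-resampling step described in this teaser can be sketched in a few lines. The sketch uses Python rather than SAS so that it is self-contained. The names `fit_line` and `bootstrap_cases`, and the one-regressor least-squares model, are illustrative assumptions and are not taken from the article.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def bootstrap_cases(xs, ys, n_boot=200, seed=1):
    """Case resampling: draw (x, y) pairs with replacement, then refit."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_x = [xs[i] for i in idx]
        if len(set(sample_x)) < 2:
            continue  # degenerate resample (slope undefined); draw again
        sample_y = [ys[i] for i in idx]
        estimates.append(fit_line(sample_x, sample_y))
    return estimates
```

Each element of the returned list is one bootstrap estimate of (intercept, slope). The spread of those estimates approximates the sampling variability of the regression parameters.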

Programming Tips

Rick Wicklin | October 22, 2018

Transpose blocks to reshape data

A SAS programmer asked how to rearrange elements of a matrix. The rearrangement he wanted was rather complicated: certain blocks of data needed to move relative to other blocks, but the values within each block were to remain unchanged. It turned out that the mathematical operation he needed is called

Analytics | Learn SAS

Rick Wicklin | October 17, 2018

Parameter estimates for different parameterizations

In a recent article about nonlinear least squares, I wrote, "you can often fit one model and use the ESTIMATE statement to estimate the parameters in a different parameterization." This article expands on that statement. It shows how to fit a model for one set of parameters and use the

Analytics | Learn SAS

Rick Wicklin | October 15, 2018

Get the unique values of a variable in data order

There are several ways to use SAS to get the unique values for a data variable. In Base SAS, you can use the TABLES statement in PROC FREQ to generate a table of unique values (and the counts). You can also use the DISTINCT keyword in PROC SQL to get
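
As a language-neutral illustration of "unique values in data order," here is a minimal Python sketch. It is not the SAS approach from the article, and the function name `unique_in_data_order` is hypothetical. It relies on the fact that Python dictionaries preserve insertion order.

```python
def unique_in_data_order(values):
    """Return the distinct values in order of first appearance."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(values))
```

A sorted list of unique values loses the order in which categories first appear in the data. This version keeps it.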

Analytics | Learn SAS

Rick Wicklin | October 10, 2018

Fit a growth curve in SAS

This article shows how to use SAS to fit a growth curve to data. Growth curves model the evolution of a quantity over time. Examples include population growth, the height of a child, and the growth of a tumor cell. This article focuses on using PROC NLIN to estimate the

Programming Tips

Rick Wicklin | October 8, 2018

The intersection of multiple sets

This article compares several ways to find the elements that are common to multiple sets. I test which method is the fastest in the SAS/IML language. However, all algorithms are intrinsically fast, which raises an important question: when is it worth the time and effort to optimize an algorithm? The
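
To make "elements common to multiple sets" concrete, the following Python sketch shows two equivalent ways to compute the intersection. The article works in SAS/IML. The function names here are hypothetical, and this is an illustration of the idea rather than a port of the article's code.

```python
from functools import reduce

def intersect_all(collections):
    """Elements common to every collection; empty input gives an empty set."""
    sets = [set(c) for c in collections]
    if not sets:
        return set()
    return set.intersection(*sets)

def intersect_pairwise(collections):
    """Same result, computed by folding a two-set intersection over the list."""
    sets = [set(c) for c in collections]
    if not sets:
        return set()
    return reduce(lambda acc, s: acc & s, sets)
```

Both versions return the same set. Choosing between them is a matter of readability unless the inputs are very large.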

Analytics | Programming Tips

Rick Wicklin | October 3, 2018

Fast simulation of multivariate normal data with an AR(1) correlation structure

It is sometimes necessary for researchers to simulate data with thousands of variables. It is easy to simulate thousands of uncorrelated variables, but more difficult to simulate thousands of correlated variables. For that, you can generate a correlation matrix that has special properties, such as a Toeplitz matrix or a
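
An AR(1) correlation matrix has entries R[i][j] = rho^|i-j|, which makes it a Toeplitz matrix. The Python sketch below is an illustration under stated assumptions, not the article's SAS code. It builds that matrix, writes its Cholesky factor in closed form, and uses the equivalent first-order recursion to draw one observation without forming any matrix. It assumes |rho| < 1, and the function names are hypothetical.

```python
import math
import random

def ar1_corr(p, rho):
    """p x p AR(1) correlation matrix: R[i][j] = rho**|i-j|."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]

def ar1_cholesky(p, rho):
    """Lower-triangular L with L L^T = R, written in closed form."""
    s = math.sqrt(1.0 - rho * rho)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            # first column is rho**i; other columns are scaled by sqrt(1 - rho^2)
            L[i][j] = rho ** (i - j) * (1.0 if j == 0 else s)
    return L

def simulate_ar1_row(p, rho, rng):
    """One draw from N(0, R) in O(p) time: x[j] = rho*x[j-1] + s*z[j]."""
    s = math.sqrt(1.0 - rho * rho)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(1, p):
        x.append(rho * x[-1] + s * rng.gauss(0.0, 1.0))
    return x
```

The recursion is the same linear map as multiplying independent normal variates by the Cholesky factor, so it scales to thousands of variables without storing a p x p matrix.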

Programming Tips

Rick Wicklin | October 1, 2018

Chi-square tests for proportions in one-way tables

Programmers on a SAS discussion forum recently asked about the chi-square test for proportions as implemented in PROC FREQ in SAS. One person asked the basic question, "how do I test the null hypothesis that the observed proportions are equal to a set of known proportions?" Another person said that
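
The test statistic behind the basic question can be written out directly. The Python sketch below is an illustration, not PROC FREQ output or the article's code, and the function name is hypothetical. It computes the Pearson chi-square statistic for observed counts against a set of known proportions. The p-value requires a chi-square distribution function and is omitted here.

```python
def chisq_statistic(observed, null_props):
    """Pearson chi-square statistic and degrees of freedom for a one-way table."""
    if len(observed) != len(null_props):
        raise ValueError("observed and null_props must have the same length")
    if abs(sum(null_props) - 1.0) > 1e-9:
        raise ValueError("null proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in null_props]  # expected counts under the null
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1
```

For example, observed counts (30, 20, 50) tested against proportions (0.25, 0.25, 0.5) give expected counts (25, 25, 50) and a statistic of 25/25 + 25/25 + 0 = 2 on 2 degrees of freedom.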

Programming Tips

Rick Wicklin | September 26, 2018

Radial basis functions and Gaussian kernels in SAS

A radial basis function is a scalar function that depends on the distance to some point, called the center point, c. One popular radial basis function is the Gaussian kernel φ(x; c) = exp(-||x - c||² / (2σ²)), which uses the squared distance from a vector x to the
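
The Gaussian kernel is the exponential of the negative squared Euclidean distance between x and c, divided by twice the squared bandwidth. As a minimal illustration, here it is in plain Python. This is not the article's SAS code, and the function name and the default `sigma=1.0` are assumptions.

```python
import math

def gaussian_kernel(x, c, sigma=1.0):
    """Gaussian radial basis function centered at c with bandwidth sigma."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

The kernel equals 1 at the center and decays toward 0 as x moves away from c. A larger `sigma` gives a slower decay.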

Programming Tips

Rick Wicklin | September 24, 2018

How many perfect riffle shuffles are required to restore a deck to its initial order?

Last week I compared the overhand shuffle to the riffle shuffle. I used random operations to simulate both kinds of shuffles and then compared how well they mix cards. The article caused my colleague and fellow blogger, Rob Pratt, to ask if I was familiar with a bit of
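
The question in the title can be checked by brute force. The Python sketch below assumes that a "perfect riffle shuffle" means an out-shuffle: the deck is cut exactly in half and interleaved so that the original top card stays on top. That interpretation, the function names, and the even deck size are assumptions for illustration and are not taken from the article.

```python
def out_shuffle(deck):
    """Perfect riffle that keeps the top card on top (deck size must be even)."""
    half = len(deck) // 2
    top, bottom = deck[:half], deck[half:]
    shuffled = []
    for a, b in zip(top, bottom):
        shuffled.extend((a, b))  # alternate one card from each half
    return shuffled

def shuffles_to_restore(n_cards=52, shuffle=out_shuffle):
    """Count how many perfect shuffles return the deck to its initial order."""
    original = list(range(n_cards))
    deck = shuffle(original)
    count = 1
    while deck != original:
        deck = shuffle(deck)
        count += 1
    return count
```

Under this assumption, `shuffles_to_restore(52)` returns 8. An out-shuffle sends position i to 2i mod 51, and 2^8 = 256 = 5(51) + 1.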

Analytics | Data Visualization

Rick Wicklin | September 19, 2018

Shuffling smackdown: Overhand shuffle versus riffle shuffle

Every day I’m shufflin'. Shufflin', shufflin'. -- "Party Rock Anthem," LMFAO

The most popular way to mix a deck of cards is the riffle shuffle, which separates the deck into two pieces and interleaves the cards from each piece. Besides being popular with card players, the riffle shuffle is

Programming Tips

Rick Wicklin | September 17, 2018

Linearly spaced vectors in SAS

The SAS/IML language and the MATLAB language are similar. Both provide a natural syntax for performing high-level computations on vectors and matrices, including basic linear algebra subroutines. Sometimes a SAS programmer will convert an algorithm from MATLAB into SAS/IML. Because the languages are not identical, I am sometimes asked, "what

Analytics

Rick Wicklin | September 12, 2018

Two interfaces for typing text by using a TV remote control

Have you ever tried to type a movie title by using a TV remote control? Both Netflix and Amazon Video provide an interface (a virtual keyboard) that enables you to use the four arrow keys of a standard remote control to type letters. The letters are arranged in a regular

Programming Tips

Visualization of L1 distance matrix for items arranged on a 6 x 6 grid

Rick Wicklin | September 10, 2018

Distances on rectangular grids

Given a rectangular grid with unit spacing, what is the expected distance between two random vertices, where distance is measured in the L1 metric? (Here "random" means "uniformly at random.") I recently needed this answer for some small grids, such as the one to the right, which is a 7 x 6
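
For small grids, the expected L1 distance can be computed exactly by enumerating every pair of vertices. The Python sketch below assumes that the two vertices are drawn independently, so they may coincide. That assumption and the function names are for illustration and are not taken from the article. A closed form for the same quantity is included as a cross-check.

```python
from fractions import Fraction
from itertools import product

def expected_l1_distance(n_rows, n_cols):
    """Exact mean L1 distance between two independent uniform vertices."""
    vertices = list(product(range(n_rows), range(n_cols)))
    total = sum(abs(r1 - r2) + abs(c1 - c2)
                for (r1, c1), (r2, c2) in product(vertices, repeat=2))
    return Fraction(total, len(vertices) ** 2)

def closed_form(n_rows, n_cols):
    """Same quantity, using E|i - j| = (n^2 - 1)/(3n) along each axis."""
    return (Fraction(n_rows ** 2 - 1, 3 * n_rows)
            + Fraction(n_cols ** 2 - 1, 3 * n_cols))
```

The L1 distance is the sum of a row term and a column term, so the expectation splits by axis. For a 7 x 6 grid, both functions return 533/126, which is about 4.23.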

Programming Tips

Rick Wicklin | September 6, 2018

The continued fraction representation of a rational number

Continued fractions show up in surprising places. They are used in the numerical approximations of certain functions, including the evaluation of the normal cumulative distribution function (normal CDF) for large values of x (El-bolkiny, 1995, p. 75-77) and in approximating the Lambert W function, which has applications in the modeling
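
The continued fraction representation of a rational number comes directly from the Euclidean algorithm: repeatedly take the integer quotient and continue with the remainder. The Python sketch below is illustrative, not the article's SAS code. The function names are hypothetical, and it assumes a positive denominator.

```python
from fractions import Fraction

def continued_fraction(numerator, denominator):
    """Terms [a0; a1, a2, ...] of numerator/denominator (denominator > 0)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    terms = []
    while denominator != 0:
        quotient, remainder = divmod(numerator, denominator)
        terms.append(quotient)
        numerator, denominator = denominator, remainder
    return terms

def from_continued_fraction(terms):
    """Rebuild the rational number from its continued fraction terms."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value
```

For example, 415/93 has terms [4; 2, 6, 7] because 415 = 4(93) + 43, 93 = 2(43) + 7, 43 = 6(7) + 1, and 7 = 7(1) + 0.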
