Blogs

Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Data Visualization

Rick Wicklin | May 23, 2018

A butterfly plot for comparing distributions

This article shows how to construct a butterfly plot in SAS. A butterfly plot (also called a butterfly chart) is a comparative bar chart or histogram that displays the distribution of a variable for two subpopulations.

Image: A butterfly plot for the cholesterol readings of 5,057 patients in a medical study

Programming Tips

Rick Wicklin | May 21, 2018

Position items in a grid

In a recent blog post, Chris Hemedinger used a scatter plot to show the result of 100 coin tosses. Chris arranged the 100 results in a 10 x 10 grid, where the first 10 results were shown on the first row, the second 10 were shown on the second row, and so…
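
The row-by-row layout described here comes down to integer division: result i (counting from 0) goes to row i div 10 and column i mod 10. A minimal Python sketch, using a hypothetical helper `grid_position`; the post itself does this in SAS:

```python
def grid_position(i, ncols=10):
    """Return the 1-based (row, column) cell for the 0-based result index i,
    filling the grid row by row."""
    row, col = divmod(i, ncols)
    return row + 1, col + 1

# 100 hypothetical coin-toss results laid out in a 10 x 10 grid
positions = [grid_position(i) for i in range(100)]
print(positions[0], positions[9], positions[10], positions[99])
# (1, 1) (1, 10) (2, 1) (10, 10)
```

Changing `ncols` gives other grid widths; the last row is simply partially filled when the number of results is not a multiple of `ncols`.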

Analytics | Data Visualization

Image: Decile calibration curve for a misspecified logistic regression model

Rick Wicklin | May 16, 2018

Decile calibration plots in SAS

In my article about how to construct calibration plots for logistic regression models in SAS, I mentioned that there are several popular variations of the calibration plot. The previous article showed how to construct a loess-based calibration curve. Austin and Steyerberg (2013) recommend the loess-based curve on the basis of…

Analytics | Data Visualization

Image: Calibration plot for a misspecified logistic model

Rick Wicklin | May 14, 2018

Calibration plots in SAS

A logistic regression model is a way to predict the probability of a binary response based on values of explanatory variables. It is important to be able to assess the accuracy of a predictive model. This article shows how to construct a calibration plot in SAS. A calibration plot is…

Analytics | Programming Tips

Image: How to generate random numbers in SAS

Rick Wicklin | May 9, 2018

Independent streams of random numbers in SAS

In a previous blog post, I discussed ways to produce statistically independent samples from a random number generator (RNG). The best way is to generate all samples from one stream. However, if your program uses two or more SAS DATA steps to simulate the data, you cannot use the same…

Programming Tips

Rick Wicklin | May 7, 2018

Independence and overlap in streams of random numbers

Simulation studies require both randomness and reproducibility, two qualities that are sometimes at odds with each other. A Monte Carlo simulation might need to generate millions of random samples, where each sample contains dozens of continuous variables and many thousands of observations. In simulation studies, the researcher wants each sample…

Analytics | Data Visualization

Rick Wicklin | May 2, 2018

Order variables in a heat map or scatter plot matrix

Order matters. When you create a graph that has a categorical axis (such as a bar chart), it is important to consider the order in which the categories appear. Most software defaults to alphabetical order, which typically gives no insight into how the categories relate to each other. Alphabetical order…

Analytics | Data Visualization

Rick Wicklin | April 30, 2018

Assign colors in heat maps: A study of married couples and college majors

Some say that opposites attract. Others say that birds of a feather flock together. Which is it? Phillip N. Cohen, a professor of sociology at the University of Maryland, recently posted an interesting visualization that indicates that married couples who are college graduates tend to be birds of a feather.

Analytics | Programming Tips

Rick Wicklin | April 25, 2018

An easier way to run thousands of regressions

SAS programmers on SAS discussion forums sometimes ask how to run thousands of regressions of the form Y = B0 + B1*X_i, where i = 1, 2, .... A similar question asks how to solve thousands of regressions of the form Y_i = B0 + B1*X for thousands of response variables. I have previously…
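
For the first form, Y = B0 + B1*X_i, each regression has a closed-form solution: B1 = Sxy/Sxx and B0 = Ybar - B1*Xbar. A minimal Python sketch that loops over hypothetical explanatory variables; it shows the arithmetic only and is not the SAS technique the post describes:

```python
from statistics import fmean

def simple_regression(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Hypothetical data: one response Y and several explanatory variables X_i
y = [1.0, 2.0, 2.0, 4.0]
xs = {"X_1": [1.0, 2.0, 3.0, 4.0], "X_2": [2.0, 1.0, 0.0, -1.0]}
estimates = {name: simple_regression(x, y) for name, x in xs.items()}
for name, (b0, b1) in estimates.items():
    print(f"{name}: B0 = {b0:.2f}, B1 = {b1:.2f}")
# X_1: B0 = 0.00, B1 = 0.90
# X_2: B0 = 2.70, B1 = -0.90
```

The same function handles the second form (many responses, one X) by looping over the response variables instead.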

Data Visualization

Rick Wicklin | April 23, 2018

The 80-20 rule for blogs

You've probably heard about the "80-20 Rule," which describes many natural and manmade phenomena. This rule is sometimes called the "Pareto Principle" because it was discovered by Vilfredo Pareto (1848–1923), who used it to describe the unequal distribution of wealth. Specifically, in his study, 80% of the wealth was held…

Programming Tips

Rick Wicklin | April 18, 2018

The sweep operator: A fundamental operation in regression

The sweep operator performs elementary row operations on a system of linear equations. The sweep operator enables you to build regression models by "sweeping in" or "sweeping out" particular rows of the X`X matrix. As you do so, the estimates for the regression coefficients, the error sum of squares, and…
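
The sweep operation itself is only a few lines. Here is a minimal Python sketch of its common textbook form: divide the pivot row by the pivot, eliminate the pivot column from every other row, and replace the pivot by its reciprocal. Sweeping in the rows for the intercept and the regressor of the augmented cross-product matrix [X`X X`y; y`X y`y] leaves the coefficient estimates in the last column and the error sum of squares in the lower-right corner. The data and the helper name `sweep` are hypothetical, and this is not SAS/IML code:

```python
def sweep(A, k):
    """Sweep the square matrix A (a list of row lists) on pivot k, in place."""
    d = A[k][k]
    if d == 0:
        raise ZeroDivisionError("cannot sweep on a zero pivot")
    A[k] = [a / d for a in A[k]]                  # 1. divide the pivot row by the pivot
    for i in range(len(A)):
        if i == k:
            continue
        b = A[i][k]
        A[i] = [a - b * ak for a, ak in zip(A[i], A[k])]   # 2. eliminate column k
        A[i][k] = -b / d
    A[k][k] = 1 / d                               # 3. replace the pivot by its reciprocal
    return A

# Hypothetical data: intercept column plus one regressor, and a response
X = [[1, 1], [1, 2], [1, 3]]
y = [1, 2, 2]
Z = [row + [yi] for row, yi in zip(X, y)]          # augmented rows [X | y]
# Cross-product matrix [X`X  X`y ; y`X  y`y]
A = [[sum(r[i] * r[j] for r in Z) for j in range(3)] for i in range(3)]
for k in range(2):                                 # sweep in the intercept and the regressor
    sweep(A, k)
print("B0, B1:", A[0][2], A[1][2])   # approximately 2/3 and 1/2
print("SSE:", A[2][2])               # approximately 1/6
```

After both sweeps, the upper-left 2 x 2 block holds the inverse of X`X; sweeping the same pivot again would "sweep out" that variable.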

Programming Tips

Rick Wicklin | April 16, 2018

Random permutations without duplicates

A colleague and I recently discussed how to generate random permutations without encountering duplicates. Given a set of n items, there are n! permutations. My colleague wants to generate k unique permutations at random from among the total of n!. Said differently, he wants to sample without replacement from the…
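
One way to get k distinct permutations is to sample k distinct integers from {0, 1, ..., n!-1} and convert each integer to the permutation that has that lexicographic rank. This Python sketch uses that approach; it is an illustration under that assumption and may differ from the method in the post. The helper names are hypothetical:

```python
import math
import random

def unrank_permutation(r, n):
    """Return the permutation of 0..n-1 whose lexicographic rank is r (0 <= r < n!)."""
    items = list(range(n))
    perm = []
    for i in range(n, 0, -1):
        # Each choice of leading item accounts for (i-1)! permutations of the rest
        idx, r = divmod(r, math.factorial(i - 1))
        perm.append(items.pop(idx))
    return perm

def unique_random_permutations(n, k, seed=None):
    """Draw k distinct random permutations of n items (sampling ranks without replacement)."""
    total = math.factorial(n)
    if k > total:
        raise ValueError("k cannot exceed n!")
    rng = random.Random(seed)
    seen, ranks = set(), []
    while len(ranks) < k:
        r = rng.randrange(total)       # works even when n! is a very large integer
        if r not in seen:              # reject a rank that was already drawn
            seen.add(r)
            ranks.append(r)
    return [unrank_permutation(r, n) for r in ranks]

print(unrank_permutation(0, 3), unrank_permutation(5, 3))   # [0, 1, 2] [2, 1, 0]
perms = unique_random_permutations(5, 10, seed=1)
```

The rejection loop is efficient when k is much smaller than n!; when k is close to n!, shuffling the full list of ranks would be simpler.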

Learn SAS | Programming Tips

Rick Wicklin | April 11, 2018

Find the unique rows of a numeric matrix

Sometimes it is important to ensure that a matrix has unique rows. When the data are all numeric, there is an easy way to detect (and delete!) duplicate rows in a matrix. The main idea is to subtract one row from another. Start with the first row and subtract it…
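
The subtraction idea can be sketched in a few lines of Python: a row duplicates an earlier row when the difference of the two rows is the zero vector. This minimal illustration keeps the first occurrence of each row; it is not the SAS/IML program from the post, and the optional tolerance `tol` is an added assumption for floating-point data:

```python
def unique_rows(M, tol=0.0):
    """Keep the first occurrence of each row of the numeric matrix M.

    A row is a duplicate of an earlier kept row when subtracting one from
    the other gives a vector whose entries are all zero (within tol)."""
    kept = []
    for row in M:
        is_dup = any(
            all(abs(a - b) <= tol for a, b in zip(row, prev)) for prev in kept
        )
        if not is_dup:
            kept.append(row)
    return kept

M = [[1, 2, 3], [4, 5, 6], [1, 2, 3], [7, 8, 9], [4, 5, 6]]
print(unique_rows(M))   # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Comparing every row with every kept row costs O(n^2) row subtractions, which is fine for moderate matrices.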

Work & Life at SAS

Rick Wicklin | April 9, 2018

Taking in. Giving back.

When we breathe, we breathe in and breathe out. If we choose only one or the other, the results are disastrous. The same principle applies to professional growth and development. Whether we are programmers, statisticians, teachers, students, or writers, we benefit from taking in and giving back. We "take in"…

Programming Tips

Rick Wicklin | April 4, 2018

Distance correlation

Correlation is a statistic that measures how closely two variables are related to each other. The most popular definition of correlation is the Pearson product-moment correlation, which is a measurement of the linear relationship between two variables. Many textbooks stress the linear nature of the Pearson correlation and emphasize that…
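
Distance correlation can detect nonlinear association that the Pearson correlation misses. A minimal Python sketch of the usual sample (V-statistic) definition: double-center the matrix of pairwise distances for each variable, then combine the two matrices. For y = x^2 on the symmetric values x = -2, ..., 2 the Pearson correlation is exactly 0, but the distance correlation is clearly positive. The helper names are hypothetical, and the post itself works in SAS:

```python
import math

def _double_centered(v):
    """Matrix of pairwise distances |v_j - v_k|, double-centered."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]      # row means (= column means, since d is symmetric)
    grand = sum(row) / n
    return [[d[j][k] - row[j] - row[k] + grand for k in range(n)] for j in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) of two numeric samples."""
    n = len(x)
    A, B = _double_centered(x), _double_centered(y)

    def mean_prod(P, Q):
        return sum(P[j][k] * Q[j][k] for j in range(n) for k in range(n)) / n**2

    dcov2 = max(mean_prod(A, B), 0.0)              # squared distance covariance
    dvar2 = mean_prod(A, A) * mean_prod(B, B)      # product of squared distance variances
    return 0.0 if dvar2 <= 0 else math.sqrt(dcov2 / math.sqrt(dvar2))

x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]                 # Pearson correlation of x and y is 0
print(distance_correlation(x, y))      # roughly 0.52
```

Distance correlation is 1 for an exact linear relationship and 0 (in the population) only when the variables are independent.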

Learn SAS | Programming Tips

Rick Wicklin | April 2, 2018

The chi-square test: An example of working with rows and columns in SAS

As a general rule, when SAS programmers want to manipulate data row by row, they reach for the SAS DATA step. When the computation requires column statistics, the SQL procedure is also useful. When both row and column operations are required, the SAS/IML language is a powerful addition to a…
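
The chi-square statistic is a good example of mixing row and column operations: every expected count needs a row total and a column total. A minimal Python sketch for a hypothetical 2 x 2 table; it computes the statistic and degrees of freedom only, not a p-value:

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table.

    Expected count for cell (i, j) = (row total i) * (column total j) / N."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            exp = row_tot[i] * col_tot[j] / n
            stat += (obs - exp) ** 2 / exp
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df

stat, df = chi_square_statistic([[10, 20], [30, 40]])
print(stat, df)   # 50/63 (about 0.7937) with 1 degree of freedom
```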

Analytics | Data Visualization | Programming Tips

Image: Euclidean and L1 distances between observations and a target value for standardized data

Rick Wicklin | March 28, 2018

Find the distances between observations and a target value

Suppose you want to find observations in multivariate data that are closest to a numerical target value. For example, for the students in the Sashelp.Class data set, you might want to find the students whose (Age, Height, Weight) values are closest to the triplet (13, 62, 100). The way to…
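
Because Age, Height, and Weight are measured in different units, one reasonable approach is to standardize each variable before measuring distance. This Python sketch standardizes with each variable's sample mean and standard deviation, applies the same transformation to the target (13, 62, 100), and computes both Euclidean and L1 distances. The data are hypothetical values, not the actual Sashelp.Class data, and the function name is illustrative:

```python
import math
from statistics import fmean, stdev

def standardized_distances(data, target):
    """Euclidean and L1 distances from each row of data to target, after
    standardizing each variable by its sample mean and standard deviation."""
    cols = list(zip(*data))
    means = [fmean(c) for c in cols]
    sds = [stdev(c) for c in cols]

    def z(row):
        return [(v - m) / s for v, m, s in zip(row, means, sds)]

    zt = z(target)                     # standardize the target with the same parameters
    euclid, l1 = [], []
    for row in data:
        diff = [a - b for a, b in zip(z(row), zt)]
        euclid.append(math.sqrt(sum(d * d for d in diff)))
        l1.append(sum(abs(d) for d in diff))
    return euclid, l1

# Hypothetical (Age, Height, Weight) values
data = [(12, 57, 83), (13, 62, 98), (15, 66, 112), (14, 64, 90), (11, 52, 50)]
euclid, l1 = standardized_distances(data, (13, 62, 100))
closest = min(range(len(data)), key=euclid.__getitem__)
print(closest, data[closest])   # 1 (13, 62, 98)
```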

Analytics | Data Visualization

Rick Wicklin | March 26, 2018

A zipper plot for visualizing coverage probability in simulation studies

Simulation studies are used for many purposes, one of which is to examine how distributional assumptions affect the coverage probability of a confidence interval. This article describes the "zipper plot," which enables you to compare the coverage probability of a confidence interval when the data do or do not follow…

Analytics | Learn SAS

Rick Wicklin | March 21, 2018

The conjugate gradient method

I often claim that the "natural syntax" of the SAS/IML language makes it easy to implement an algorithm or statistical formula as it appears in a textbook or journal. The other day I had an opportunity to test the truth of that statement. A SAS programmer wanted to implement the…
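
For reference, here is a minimal Python sketch of the standard conjugate gradient iteration for a symmetric positive definite system A x = b. It follows the textbook algorithm rather than the SAS/IML program in the post, and the 2 x 2 example system is hypothetical:

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive definite matrix A (list of lists)."""
    n = len(b)
    max_iter = max_iter or 10 * n

    def matvec(v):
        return [sum(a * vi for a, vi in zip(row, v)) for row in A]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = [bi - ai for bi, ai in zip(b, matvec(x))]   # residual b - A x
    p = r[:]                                        # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)                 # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction is A-conjugate to the previous ones
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(x)   # close to [1/11, 7/11]
```

In exact arithmetic the method converges in at most n iterations; in floating point the residual tolerance decides when to stop.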

Learn SAS | Programming Tips

Rick Wicklin | March 19, 2018

Compute with combinations: Maximize a function over combinations of variables

About once a month I see a question on the SAS Support Communities that involves what I like to call "computations with combinations." A typical question asks how to find k values (from a set of p values) that maximize or minimize some function, such as "I have 5 variables,…
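
A brute-force approach enumerates all C(p, k) subsets and keeps the best one. This Python sketch uses `itertools.combinations`; the five variables, their values, and the objective (the product of the chosen values) are hypothetical:

```python
import math
from itertools import combinations

def best_subset(values, k, objective):
    """Return (score, names) for the k-subset of `values` that maximizes `objective`."""
    best_score, best_names = -math.inf, None
    for names in combinations(values, k):          # all C(p, k) subsets of the names
        score = objective([values[name] for name in names])
        if score > best_score:
            best_score, best_names = score, names
    return best_score, best_names

values = {"x1": 2.0, "x2": -3.0, "x3": 4.0, "x4": -5.0, "x5": 1.0}
score, names = best_subset(values, 3, math.prod)
print(names, score)   # ('x2', 'x3', 'x4') 60.0
```

Enumeration is practical only while C(p, k) stays manageable; for 5 variables choose 3 there are just 10 subsets.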

Data Visualization

Rick Wicklin | March 14, 2018

Visualize repetition in song lyrics

One of my favorite magazines, Significance, printed an intriguing image of a symmetric matrix that shows repetition in a song's lyrics. The image was created by Colin Morris, who has created many similar images. When I saw these images, I knew that I wanted to duplicate the analysis in SAS!
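
The symmetric matrix described here can be built directly: cell (i, j) is 1 when the i-th and j-th words of the lyrics match. A minimal Python sketch with a short hypothetical lyric; lowercasing words and stripping surrounding punctuation are assumptions about how words are compared:

```python
import string

def repetition_matrix(lyrics):
    """Return (words, M) where M[i][j] = 1 if word i equals word j, else 0."""
    # Assumption: compare words case-insensitively, ignoring surrounding punctuation
    words = [w.strip(string.punctuation).lower() for w in lyrics.split()]
    n = len(words)
    M = [[int(words[i] == words[j]) for j in range(n)] for i in range(n)]
    return words, M

words, M = repetition_matrix("Row, row, row your boat, gently down the stream")
for w, row in zip(words, M):
    print(f"{w:>8} " + "".join("#" if v else "." for v in row))
```

Repeated phrases show up as diagonal line segments off the main diagonal, which is what makes these images so striking for repetitive songs.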

Analytics

Rick Wicklin | March 12, 2018

Pi, special functions, and distributions

Welcome to my annual Pi Day post. Every year on March 14th (written 3/14 in the US), geeky mathematicians and their friends celebrate "all things pi-related" because 3.14 is the three-digit approximation to pi. Pi is a mathematical constant that never changes. Pi is the same value today as it…

Analytics | Programming Tips

Rick Wicklin | March 7, 2018

Fit a distribution from quantiles

Data analysts often fit a probability distribution to data. When you have access to the data, a common technique is to use maximum likelihood estimation (MLE) to compute the parameters of a distribution that are "most likely" to have produced the observed data. However, how can you fit a distribution…

Analytics | Programming Tips

Image: The saddle point of a matrix

Rick Wicklin | March 5, 2018

The probability of a saddle point in a matrix

Many people know that a surface can contain a saddle point, but did you know that you can define the saddle point of a matrix? Saddle points in matrices are somewhat rare, which means that if you choose a random matrix you are unlikely to choose one that has a…
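
A minimal Python sketch, assuming the convention that a saddle point is an entry that is the smallest value in its row and the largest value in its column. For a 2 x 2 matrix with independent continuous entries, a given cell is a saddle point exactly when its value lies between its column partner (below it) and its row partner (above it), which has probability 1/6; the four cases are mutually exclusive, so the probability is 4/6 = 2/3. The simulation checks that value; the function names are hypothetical:

```python
import random

def saddle_points(M):
    """Return (i, j) positions whose entry is the minimum of its row and the
    maximum of its column."""
    col_max = [max(c) for c in zip(*M)]
    pts = []
    for i, row in enumerate(M):
        rmin = min(row)
        for j, v in enumerate(row):
            if v == rmin and v == col_max[j]:
                pts.append((i, j))
    return pts

def saddle_probability(m, n, reps=20000, seed=1):
    """Monte Carlo estimate of P(a random m x n matrix has a saddle point)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        M = [[rng.random() for _ in range(n)] for _ in range(m)]
        hits += bool(saddle_points(M))
    return hits / reps

print(saddle_points([[1, 2], [0, 3]]))   # [(0, 0)]
print(saddle_probability(2, 2))          # near 2/3
```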

Analytics | Programming Tips

Image: Solve nonlinear system of equations in SAS

Rick Wicklin | February 28, 2018

Solve a system of nonlinear equations with SAS

This article shows how to use SAS to solve a system of nonlinear equations. When there are n unknowns and n equations, this problem is equivalent to finding a multivariate root of a vector-valued function F(x) = 0 because you can always write the system as f1(x1, x2, ..., xn)…
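
Newton's method is one standard way to find a root of F(x) = 0. This Python sketch handles the two-equation case with an explicit Jacobian; it illustrates the general technique and is not the SAS routine used in the post. The example system x^2 + y^2 = 4, x = y is hypothetical and has the root (sqrt(2), sqrt(2)) in the first quadrant:

```python
import math

def newton_2d(F, J, x0, tol=1e-12, max_iter=50):
    """Newton's method for two equations F(x, y) = (0, 0) with Jacobian J(x, y)."""
    x, y = x0
    for _ in range(max_iter):
        f1, f2 = F(x, y)
        (a, b), (c, d) = J(x, y)
        det = a * d - b * c
        if det == 0:
            raise ZeroDivisionError("singular Jacobian")
        # Solve J * (dx, dy) = -(f1, f2) by Cramer's rule
        dx = (-f1 * d + b * f2) / det
        dy = (-a * f2 + c * f1) / det
        x, y = x + dx, y + dy
        if abs(dx) + abs(dy) < tol:
            break
    return x, y

def F(x, y):
    return x * x + y * y - 4, x - y

def J(x, y):
    return (2 * x, 2 * y), (1, -1)

root = newton_2d(F, J, (1.0, 2.0))
print(root, math.sqrt(2))   # both coordinates approximately 1.41421356
```

Starting from (-1, -2) instead converges to the other root, (-sqrt(2), -sqrt(2)); Newton's method finds the root nearest (in a loose sense) to the initial guess.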

Learn SAS | Programming Tips

Rick Wicklin | February 26, 2018

How to use FIRST.variable and LAST.variable in a BY-group analysis in SAS

My article about the difference between CLASS variables and BY variables in SAS focused on SAS analytical procedures. However, the BY statement is also useful in the SAS DATA step where it is used to merge data sets and to analyze data at the group level. When you use the…
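
Python has no BY statement, but `itertools.groupby` walks consecutive runs of equal keys in much the same way that BY-group processing walks sorted data. This sketch reproduces the FIRST.variable and LAST.variable indicators for hypothetical data and uses them for a per-group total; as in SAS, the rows must already be sorted (or at least grouped) by the key:

```python
from itertools import groupby
from operator import itemgetter

def first_last_flags(rows, key):
    """Yield (row, first, last) for rows grouped by key, analogous to
    FIRST.variable and LAST.variable in a DATA step with a BY statement."""
    for _, group in groupby(rows, key=key):
        group = list(group)
        for i, row in enumerate(group):
            yield row, i == 0, i == len(group) - 1

# Hypothetical data, sorted by the group variable
rows = [("A", 1), ("A", 2), ("B", 5), ("C", 3), ("C", 4)]
flags = list(first_last_flags(rows, key=itemgetter(0)))

totals = {}
running = 0
for (grp, value), first, last in flags:
    if first:
        running = 0              # reset at the start of each BY group
    running += value
    if last:
        totals[grp] = running    # record the total at the end of each BY group
print(totals)   # {'A': 3, 'B': 5, 'C': 7}
```

A group with a single row, such as "B", has both flags true, just as FIRST.variable and LAST.variable are both 1 for a one-observation BY group.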

Advanced Analytics

Image: Sample from a mixture distribution, showing the sample median

Rick Wicklin | February 21, 2018

A Monte Carlo algorithm to estimate a median

This article describes and implements a fast algorithm that estimates a median for very large samples. The traditional median estimate sorts a sample of size N and returns the middle value (when N is odd). The algorithm in this article uses Monte Carlo techniques to estimate the median much faster.

Analytics | Programming Tips

Image: Quantiles are the solutions to the equation CDF(x) - p = 0, where p is a probability

Rick Wicklin | February 19, 2018

Compute the quantiles of any distribution

Your statistical software probably provides a function that computes quantiles of common probability distributions such as the normal, exponential, and beta distributions. Because there are infinitely many probability distributions, you might encounter a distribution for which a built-in quantile function is not implemented. No problem! This article shows how to…
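
For a continuous distribution, the CDF is nondecreasing, so the quantile for probability p is the root of CDF(x) - p = 0, and a bracketing root finder such as bisection will find it. A minimal Python sketch; the function name and the exponential-distribution example are illustrative, and the post may use a different root finder:

```python
import math

def quantile_by_bisection(cdf, p, lo, hi, tol=1e-12):
    """Solve cdf(x) - p = 0 for x in [lo, hi] by bisection.

    Assumes cdf is continuous and nondecreasing with cdf(lo) <= p <= cdf(hi)."""
    if not (cdf(lo) <= p <= cdf(hi)):
        raise ValueError("p is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid       # the quantile lies to the right of mid
        else:
            hi = mid       # the quantile lies at or to the left of mid
    return 0.5 * (lo + hi)

def exp_cdf(x):
    """CDF of the exponential distribution with rate 1."""
    return 1 - math.exp(-x)

median = quantile_by_bisection(exp_cdf, 0.5, 0.0, 50.0)
print(median, math.log(2))   # the exponential median is ln 2
```

Bisection is slow but safe: it only needs a bracketing interval and halves the interval at every step.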

Analytics | Learn SAS

Rick Wicklin | February 14, 2018

The difference between CLASS statements and BY statements in SAS

When I first learned to program in SAS, I remember being confused about the difference between CLASS statements and BY statements. A novice SAS programmer recently asked when to use one instead of the other, so this article explains the difference between the CLASS statement and BY variables in SAS…

Data Visualization

Image: Use SAS to create a merged legend that shows symbols and line patterns in a single legend

Rick Wicklin | February 12, 2018

Merged legends: Overlay a symbol and line in a legend item

Did you know that SAS can combine or "merge" a symbol and a line pattern into a single legend item, as shown below? This kind of legend is useful when you are overlaying a group of curves onto a scatter plot. It enables the reader to quickly associate values of…
