Blogs

Author

Rick Wicklin
Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is the author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Data Visualization | Learn SAS | Programming Tips

Rick Wicklin | May 6, 2024

Visualize patterns of missing values

Years ago, I wrote an article that showed how to visualize patterns of missing data. During a recent data visualization talk, I discussed the program, which used a small number of SAS IML statements. An audience member asked whether it is possible to construct the same visualization by using only

Read More

Analytics | Learn SAS

Rick Wicklin | May 1, 2024

Estimate a proportion and a confidence interval in SAS

A SAS programmer wanted to estimate a proportion and a confidence interval (CI), but didn't know which SAS procedure to call. He knows a formula for the CI from an elementary statistics textbook. If x is the observed count of events in a random sample of size n, then the

Read More

Analytics | Programming Tips

Rick Wicklin | April 29, 2024

Bimodal and unimodal beta distributions

In a recent article, I graphed the PDF of a few Beta distributions that had a variety of skewness and kurtosis values. I thought that I had chosen the parameter values to represent a wide variety of Beta shapes. However, I was surprised to see that the distributions were all

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin | April 22, 2024

Use the moment-ratio diagram to visualize the sampling distribution of skewness and kurtosis

The moment-ratio diagram is a tool that is useful when choosing a distribution that models a sample of univariate data. As I show in my book (Simulating Data with SAS, Wicklin, 2013), you first plot the skewness and kurtosis of the sample on the moment-ratio diagram to see what common

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin | April 15, 2024

Distributions with specified skewness and kurtosis

A SAS programmer wanted to simulate samples from a family of Beta(a,b) distributions for a simulation study. (Recall that a Beta random variable is bounded with values in the range [0,1].) She wanted to choose the parameters such that the skewness and kurtosis of the distributions varied over a range of

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin | April 8, 2024

Improve the Federal Reserve's dot plot

A dot plot is a standard statistical graphic that displays a statistic (often a mean) and the uncertainty of the statistic for one or more groups. Statisticians and data scientists use it in the analysis of group data. In late 2023, I started noticing headlines about "dot plots" in the

Read More

Data Visualization | Programming Tips

Rick Wicklin | April 1, 2024

Add a second axis to a SAS graph

Recently, I saw a scatter plot that displayed the ticks, values, and labels for a vertical axis on the right side of a graph. In the SGPLOT procedure in SAS, you can use the Y2AXIS option to move an axis to the right side of a graph. Similarly, you can

Read More

Analytics | Learn SAS

Rick Wicklin | March 27, 2024

The likelihood ratio test for linear regression in SAS

A recent article describes how to estimate coefficients in a simple linear regression model by using maximum likelihood estimation (MLE). One of the nice properties of an MLE formulation is that you can compare a large model with a nested submodel in a natural way. For example, if you can

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin | March 20, 2024

Maximum likelihood estimates for linear regression

A statistical analyst used the GENMOD procedure in SAS to fit a linear regression model. He noticed that the table of parameter estimates has an extra row (labeled "Scale") that is not a regression coefficient. The "scale parameter" is not part of the parameter estimates table produced by PROC REG

Read More

Learn SAS | Programming Tips

Rick Wicklin | March 11, 2024

Pizza pi

Happy Pi Day! Every year on March 14th (written 3/14 in the US), people in the mathematical sciences celebrate all things pi-related because 3.14 is the three-digit approximation to π ≈ 3.14159265358979.... Pi is a mathematical constant defined as the ratio of a circle's circumference (C) to its diameter (D).

Read More

Learn SAS | Programming Tips

Rick Wicklin | March 6, 2024

A generalized Number-Word Game

I recently wrote about the Number-Word Game, which is an iterative algorithm that generates a sequence of natural numbers by using the lengths of the words for the numbers. In English, the words are "one", "two", "three", and so on. You can play the Number-Word Game in any alphabetic language

Read More

Learn SAS | Programming Tips

Rick Wicklin | March 4, 2024

The Number-Word Game

Have you heard about the Number-Word Game? This is a simple game that has the following rules: (1) Start with any positive integer. (2) Write down the English word for the integer. (3) Count the number of letters in the word. This gives a new positive integer. (4) Go to (2). Repeat until a

Read More

Data Visualization | Learn SAS

Rick Wicklin | February 28, 2024

Using colors to visualize groups in a bar chart in SAS

I sometimes see analysts overuse colors in statistical graphics. My rule of thumb is that you do not need to use color to represent a variable that is already represented in a graph. For example, it is redundant to use a continuous color ramp to represent the lengths of bars

Read More

Analytics | Learn SAS

Rick Wicklin | February 26, 2024

On using flexible distributions to fit data

"With four parameters I can fit an elephant. With five I can make his trunk wiggle." (John von Neumann) Ever since the dawn of statistics, researchers have searched for the Holy Grail of statistical modeling: a flexible distribution that can model any continuous univariate data. As the quote

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin | February 21, 2024

On using the range to estimate the variability of small samples

In statistical quality control, practitioners often estimate the variability of products that are being produced in a manufacturing plant. It is important to estimate the variability as soon as possible, which means trying to obtain an estimate from a small sample. Samples of size five or less are not uncommon

Read More

Learn SAS | Programming Tips

Rick Wicklin | February 19, 2024

The linear distribution on an interval

In a recent Monte Carlo project, I needed to simulate numbers on an interval by using a continuous linear probability density function (PDF). An example is shown to the right. In this example, the linear density function is decreasing on the interval, but the function could also be constant or

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin | February 12, 2024

An exact formula for the sampling distribution of the correlation coefficient

I read a journal article in which a researcher used a formula for the probability density function (PDF) of the sample correlation coefficient. The formula was rather complicated and was presented with no citation, so I was curious to learn more. I found the distribution for the correlation coefficient in the

Read More

Learn SAS

Rick Wicklin | February 7, 2024

The elliptical heart

Some hearts are famous. For example, there is the "Heart of Gold" (Neil Young), the "Heart of Glass" (Blondie), and the Heart of Darkness (Joseph Conrad). But have you heard of the "Heart of Ellipses"? No? Well, in 2023, Ted Conway published an amusingly titled article, "Total Ellipse of the

Read More

Analytics | Learn SAS

Rick Wicklin | February 5, 2024

Peeling a convex hull

This article looks at a geometric method for estimating the center of a multivariate point cloud. The method is known as convex-hull peeling. In two dimensions, you can perform convex-hull peeling in SAS 9 by using the CVEXHULL function in SAS IML software. For higher dimensions, you can use the CONVEXHULL

Read More

Learn SAS | Programming Tips

Rick Wicklin | January 31, 2024

The name of the variable that contains the largest value in each row

A SAS programmer wanted to find the name of the variable for each row that contains the largest value. This task is useful for wide data sets in which each observation has several variables that are measured on the same scale. For example, each observation in the data might represent

Read More

Analytics

Rick Wicklin | January 29, 2024

The geometry of Jacobi's method

A colleague remarked that my recent article about using Jacobi's iterative method for solving a linear system of equations "seems like magic." Specifically, it seems like magic that you can solve a certain class of linear systems by using only matrix multiplication. For any initial guess, the iteration converges to

Read More

Learn SAS | Programming Tips

Rick Wicklin | January 24, 2024

Implement Jacobi's method in SAS

In a first course in numerical analysis, students often encounter a simple iterative method for solving a linear system of equations, known as Jacobi's method (or Jacobi's iterative method). Although Jacobi's method is not used much in practice, it is introduced because it is easy to explain, easy to implement,

Read More

Analytics | Learn SAS

Rick Wicklin | January 22, 2024

Angles vs slopes: The statistics of steepness

There are two popular ways to express the steepness of a line or ray. The most commonly used mathematical definition is from high-school math, where the slope is defined as "rise over run." A second way is to report the angle of inclination to the horizontal, as introduced in basic trigonometry.

Read More

Learn SAS | Programming Tips

Rick Wicklin | January 15, 2024

Simulate correlated continuous and discrete variables

Statistical software provides methods to simulate independent random variates from continuous and discrete distributions. For example, in the SAS DATA step, you can use the RAND function to simulate variates from continuous distributions (such as the normal or lognormal distributions) or from discrete distributions (such as the Bernoulli or Poisson).

Read More

Programming Tips

Rick Wicklin | January 10, 2024

Blog posts from 2023 that deserve a second look

In a previous article, I presented some of the most popular blog posts from 2023. The popular articles tend to discuss elementary topics that have broad appeal. However, I also wrote many technical articles about advanced topics. The following articles didn't make the Top 10 list, but they deserve a

Read More

Analytics | Learn SAS

Rick Wicklin | January 8, 2024

Reporting statistics for unobserved levels of categorical variables

An unobserved category is one that does not appear in a sample of data. For example, in a small sample of US voters, you are likely to observe members of the major political parties, but less likely to observe members of minor or fringe parties. This can cause a headache

Read More

Analytics | Data Visualization | Learn SAS | Programming Tips

Rick Wicklin | January 3, 2024

Top 10 posts from The DO Loop in 2023

In 2023, I wrote 90 articles for The DO Loop blog. My most popular articles were about SAS programming, data visualization, and statistics. In addition, several "general interest" articles were popular, including my article for Pi Day and an article about AI chatbots. If you missed any of these articles,

Read More

Analytics | Learn SAS

Rick Wicklin | December 19, 2023

The difference between frequencies and weights in a correlation analysis

Statistical software often includes support for a weight variable. Many SAS procedures make a distinction between integer frequencies and more general "importance weights." Frequencies are supported by using the FREQ statement in SAS procedures; general weights are supported by using the WEIGHT statement. An exception is PROC FREQ, which contains

Read More

Analytics | Learn SAS

Rick Wicklin | December 13, 2023

Estimate polychoric correlation by maximum likelihood estimation

SAS provides many built-in routines for data analysis. A previous article discusses polychoric correlation, which is a measure of association between two ordinal variables. In SAS, you can use PROC FREQ or PROC CORR to estimate the polychoric correlation, its standard error, and confidence intervals. Although SAS provides a built-in

Read More

Analytics | Learn SAS

Rick Wicklin | December 11, 2023

What is polychoric correlation?

Correlation is a statistic that measures the association between two variables. When two variables are positively correlated, low values of one variable tend to be associated with low values of the other variable. Medium values and high values are similarly associated. For negative correlation, the association is flipped: low values

Read More
