Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Analytics | Learn SAS | Programming Tips

Rick Wicklin | October 7, 2024

The three-sigma rule

A remarkable result in probability theory is the "three-sigma rule," which is a generic name for theorems that bound the probability that a univariate random variable will appear near the center of its distribution. This article discusses the familiar three-sigma rule for the normal distribution, a less-familiar rule for unimodal

Read More
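For readers who want to check the familiar normal-distribution version of the rule numerically, here is a minimal sketch in Python (not SAS). It assumes a standard normal variable and uses only the standard library; the helper name `prob_within_k_sigma` is hypothetical and does not come from the article.

```python
import math

def prob_within_k_sigma(k):
    """P(|Z| < k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2.0))

# Probability that a normal variate falls within 1, 2, and 3 standard deviations
for k in (1, 2, 3):
    print(k, round(prob_within_k_sigma(k), 4))
```

For other distributions the bound is weaker, which is the subject of the article itself.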

Learn SAS | Programming Tips

Rick Wicklin | September 30, 2024

Programming the formulas for an ANOVA in SAS

In practice, there is no need to remember textbook formulas for the ANOVA test because all modern statistical software will perform the test for you. In SAS, the ANOVA procedure is designed to handle balanced designs (the same number of observations in each group) whereas the GLM procedure can handle

Read More

Learn SAS | Programming Tips

Rick Wicklin | September 23, 2024

Special missing values in SAS statistical tables

A previous article about how to display missing values in SAS prompted a comment about special missing values in ODS tables in SAS. Did you know that statistical tables in SAS include special missing values to represent certain situations in statistical analyses? This article explains how to interpret four special

Read More

Learn SAS | Programming Tips

Rick Wicklin | September 16, 2024

Two ways to specify how SAS displays missing values

In statistical tables in SAS, a dot (.) represents a numerical missing value. Although a dot is the default symbol in SAS, other languages use other symbols. The R language prints the symbol NA, which stands for "not available." The MATLAB language uses NaN ("Not a Number"). In Python, many

Read More

Learn SAS | Programming Tips

Rick Wicklin | September 9, 2024

The location of ticks in statistical graphics

Modern software for statistical graphics automatically handles many details and graph defaults, such as the range of the axes and the placement of tick marks. In the days of yore, these details required tedious manual calculations. Think about what is required to place ticks on a scatter plot. On the

Read More

Learn SAS | Programming Tips

Rick Wicklin | September 4, 2024

Is a value in a vector? Use the ELEMENT function

In SAS, DATA step programmers use the IN operator to determine whether a value is contained in a set of target values. Did you know that there is a similar functionality in the SAS IML language? The ELEMENT function in the SAS IML language is similar to the IN operator

Read More

Learn SAS | Programming Tips

Rick Wicklin | August 28, 2024

Efficient recursion: Store values that will be reused

A previous article shows how to implement recursive formulas in SAS. The article points out that you can often avoid recursion by using an iterative algorithm, which is more efficient. An example is the Fibonacci sequence, which is usually defined recursively as F(n) = F(n-1) + F(n-2) for n

Read More
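The contrast between plain recursion, stored values, and iteration can be sketched briefly. The following is an illustration in Python rather than the SAS code from the article, and it assumes the common base case F(1) = F(2) = 1, which the excerpt above does not state.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n):
    """Recursive Fibonacci that stores each computed value for reuse."""
    if n <= 2:          # assumed base case: F(1) = F(2) = 1
        return 1
    return fib_memo(n - 1) + fib_memo(n - 2)

def fib_iter(n):
    """Iterative Fibonacci: no recursion, constant extra memory."""
    a, b = 1, 1         # F(1), F(2)
    for _ in range(n - 2):
        a, b = b, a + b
    return b if n >= 2 else a

print(fib_memo(30), fib_iter(30))
```

Without the cache, the recursive version recomputes the same terms many times; with it, each term is computed once.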

Learn SAS | Programming Tips

Rick Wicklin | August 26, 2024

An exact formula for the probability distribution for the sum of n dice

Many well-known distributions become more and more "normal looking" for large values of a parameter. Famously, the binomial distribution, Binom(p, N), can be approximated by a normal distribution when N (the sample size) is large. Similarly, the Poisson(λ) distribution is well approximated by the normal distribution when λ is large.

Read More
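The excerpt does not show the article's closed-form formula, so the sketch below uses a different technique: repeated convolution. It is a Python illustration, not the author's method, and it assumes fair dice; the function name `dice_sum_pmf` is hypothetical.

```python
from fractions import Fraction

def dice_sum_pmf(n, sides=6):
    """Exact distribution of the sum of n fair dice, built by convolution."""
    pmf = {0: Fraction(1)}              # the sum of zero dice is 0 with probability 1
    face_prob = Fraction(1, sides)
    for _ in range(n):
        new_pmf = {}
        for total, p in pmf.items():
            for face in range(1, sides + 1):
                key = total + face
                new_pmf[key] = new_pmf.get(key, Fraction(0)) + p * face_prob
        pmf = new_pmf
    return pmf

pmf2 = dice_sum_pmf(2)
print(pmf2[7])   # probability that two dice sum to 7
```

Using `Fraction` keeps the probabilities exact, so the result can be compared directly against any closed-form expression.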

Learn SAS | Programming Tips

Rick Wicklin | August 21, 2024

How to write a SAS macro to emulate recursion (and why you shouldn't)

There are two programming tools that I rarely use: the SAS macro language and recursion. The SAS macro language is a tool that enables you to generate SAS statements. I rarely use the SAS macro language because the SAS IML language supports all the functionality required to write complex programs,

Read More

Learn SAS | Programming Tips

Rick Wicklin | August 19, 2024

How to define a SAS IML function that has no arguments

The SAS IML language has a quirk with regard to functions that take no arguments. As discussed in the documentation, "modules with arguments are given a local symbol table." This is the usual behavior that programmers expect. However, the documentation goes on to state that "a module that has no

Read More

Learn SAS | Programming Tips

Rick Wicklin | August 12, 2024

Implement five sampling methods in the SAS DATA step

In SAS, the easiest way to draw a random sample from data is to use PROC SURVEYSELECT or the SAMPLE function in SAS IML software. I have previously written about how to implement four common sampling schemes by using PROC SURVEYSELECT and the SAMPLE function. The DATA step in SAS is

Read More

Analytics | Learn SAS

Rick Wicklin | August 7, 2024

Simulate data from a Poisson regression model

This article shows how to simulate data from a Poisson regression model, including how to account for an offset variable. If you are not familiar with how to run a Poisson regression in SAS, see the article "Poisson regression in SAS." A Poisson regression model is a specific type of

Read More

Analytics | Learn SAS

Rick Wicklin | August 5, 2024

Poisson regression in SAS

This article demonstrates how to use PROC GENMOD to perform a Poisson regression in SAS. There are different examples in the SAS documentation and in conference papers, but I chose this example because it uses two categorical explanatory variables. Therefore, the Poisson regression can be visualized by using a contingency

Read More

Analytics | Machine Learning

Rick Wicklin | July 31, 2024

Fit, simulate, fit: How models can collapse after generations of recursive fitting

An article published in Nature has the intriguing title, "AI models collapse when trained on recursively generated data." (Shumailov, et al., 2024). The article is quite readable, but I also recommend a less technical overview of the result: "AI models fed AI-generated data quickly spew nonsense" (Gibney, 2024). The Gibney

Read More

Analytics | Programming Tips

Rick Wicklin | July 29, 2024

A geometric solution to isotonic regression

A previous article shows that you can run a simple (one-variable) isotonic regression by using a quadratic programming (QP) formulation. While I was reading a book about computational geometry, I learned that there is a connection between isotonic regression and the convex hull of a certain set of points. Whaaaaat?

Read More

Learn SAS | Programming Tips

Rick Wicklin | July 24, 2024

QPSOLVE: A new SAS IML function for quadratic optimization

Since the pandemic began in 2020, the SAS IML developers have added about 50 new functions and enhancements to the SAS IML language in SAS Viya. Among these functions are new modern methods for optimization that have a simplified syntax as compared to the older 'NLP' functions that are available

Read More

Learn SAS | Programming Tips

Rick Wicklin | July 22, 2024

How to use keyword-value pairs when calling SAS IML subroutines

Just like the SAS DATA step, the SAS IML language supports both functions and subroutines. A function returns a value, so the calling syntax is familiar:

    y = func(x1, x2);    /* the function returns one value, y */

In this syntax, the input arguments are x1 and x2. The

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin | July 15, 2024

Isotonic regression: An application of quadratic optimization

Isotonic regression (also called monotonic regression) is a type of regression model that assumes that the response variable is a monotonic function of the explanatory variable(s). The model can be nondecreasing or nonincreasing. Certain physical and biological processes can be analyzed by using an isotonic regression model. For example, a

Read More

Learn SAS | Programming Tips

Rick Wicklin | July 10, 2024

Display the largest values for each group

A previous article discusses the fact that there are often multiple ways in SAS to obtain the same result. This fact results in many vigorous discussions on online programming forums as people propose different (but equivalent) methods for solving someone's problem, then argue why their preferred method is better than

Read More

Learn SAS | Machine Learning

Rick Wicklin | July 8, 2024

On the reproducibility of responses by AI assistants

As announced and demonstrated at SAS Innovate 2024, SAS plans to include a generative AI assistant called SAS Viya Copilot in the forthcoming SAS Viya Workbench. You can submit a text prompt (by putting it in a comment string) and the Copilot will generate SAS code for you. My colleagues

Read More

Learn SAS | Programming Tips

Rick Wicklin | July 1, 2024

Create an interpolating polynomial in SAS

While reviewing a book on numerical analysis, I was reminded of a classic interpolation problem. Suppose you have n pairs of points in the plane: (x1,y1), (x2,y2), ..., (xn,yn), where the first coordinates are distinct. Then you can construct a unique polynomial of degree (at most) n-1 that passes through

Read More
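One standard way to evaluate the unique interpolating polynomial is the Lagrange form. The sketch below is in Python, not SAS, and may differ from the construction used in the article; the function name `lagrange_interpolate` is hypothetical. It assumes the x coordinates are distinct, as the problem statement requires.

```python
def lagrange_interpolate(xs, ys, t):
    """Evaluate at t the polynomial of degree <= n-1 through the n points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                # Basis factor: equals 1 at t = xi and 0 at t = xj
                term *= (t - xj) / (xi - xj)
        total += term
    return total

# Three points on y = x**2 + 1 determine that quadratic exactly
xs, ys = [0.0, 1.0, 2.0], [1.0, 2.0, 5.0]
print(lagrange_interpolate(xs, ys, 3.0))
```

The Lagrange form avoids solving a linear system, although it can be numerically delicate when there are many points.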

Learn SAS | Programming Tips

Rick Wicklin | June 24, 2024

Teaching an AI assistant to read and write SAS IML vectors

One of the most exciting features of SAS Viya Workbench is that the code editor includes a generative AI component called SAS Viya Copilot. This feature was announced and demonstrated at SAS Innovate 2024. With the Copilot, you can specify a text prompt that generates SAS code. For example, you

Read More

Analytics | Data Visualization

Rick Wicklin | June 19, 2024

Scale a density curve to match a histogram

This article discusses how to scale a probability density curve so that it fits appropriately on a histogram, as shown in the graph to the right. By definition, a probability density curve is scaled so that the area under the curve equals 1. However, a histogram might show counts or

Read More

Analytics | Data Visualization | Learn SAS

Rick Wicklin | June 17, 2024

A bootstrap confidence interval for an R-square statistic

A previous article discusses a formula for a confidence interval for R-square in a linear regression model (Olkin and Finn (1995) "Correlations redux", Psychological Bulletin). The formula is useful for large data sets, but should be used with caution for small samples. At the end of the previous article, I

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin | June 10, 2024

The distribution of the R-square statistic

A SAS analyst ran a linear regression model and obtained an R-square statistic for the fit. However, he wanted a confidence interval, so he posted a question to a discussion forum asking how to obtain a confidence interval for the R-square parameter. Someone suggested a formula from a textbook (Cohen,

Read More

Analytics | Data Visualization | Learn SAS

Rick Wicklin | June 3, 2024

Visualize a multivariate regression model when using spline effects

A SAS analyst read my previous article about visualizing the predicted values for a regression model that uses spline effects. Because the original explanatory variable does not appear in the model, the analyst had several questions: How do you score the model on new data? The previous example has only

Read More

Learn SAS | Programming Tips

Rick Wicklin | May 29, 2024

Find the label of a variable in SAS

Sometimes labels for variables get "dropped" during data preparation and cleaning. One example is when data are transposed from "wide form" to "long form." For example, suppose a data set has three variables, X, Y, and Z, each with labels. If you transpose the data to long form, the new

Read More

Data Visualization | Programming Tips

Rick Wicklin | May 22, 2024

Create filled density plots in SAS

A SAS programmer wanted to visualize density estimates for some univariate data. The data had several groups, so he wanted to create a panel of density estimates, which you can easily do by using PROC SGPANEL in SAS. However, the programmer's boss wanted to see filled density estimates, such as

Read More

Analytics | Learn SAS

Rick Wicklin | May 20, 2024

On the correctness of a discrete simulation

After writing a program that simulates data, it is important to check that the statistical properties of the simulated (synthetic) data match the properties of the model. As a first step, you can generate a large random sample from the model distribution and compare the sample statistics to the expected

Read More

Learn SAS | Programming Tips

Rick Wicklin | May 15, 2024

Rank, order, and sorting

A SAS programmer was trying to implement an algorithm in PROC IML in SAS based on some R code he had seen on the internet. The R code used the rank() and order() functions. This led the programmer to ask, "What is the difference between the rank and the order?

Read More
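The distinction can be illustrated with a small sketch. It is written in Python, not R or SAS IML, and the two helper functions are hypothetical stand-ins for the ideas rather than reimplementations of R's rank() and order(). Two simplifying assumptions: the order uses 0-based indices, and tied values are ranked by first occurrence rather than by averaging.

```python
def order(x):
    """Indices that would sort x: order(x)[k] is the position of the k-th smallest value."""
    return sorted(range(len(x)), key=lambda i: x[i])

def rank(x):
    """1-based rank of each element: rank(x)[i] is where x[i] lands after sorting."""
    r = [0] * len(x)
    for position, i in enumerate(order(x)):
        r[i] = position + 1
    return r

x = [30, 10, 20]
print(order(x))  # indices of the smallest, middle, and largest values
print(rank(x))   # rank of each value in its original position
```

In short, the order answers "which element comes k-th?", whereas the rank answers "where does this element fall?"; the two are inverse permutations of each other.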
