Blogs

Tag: Data Analysis

Analytics | Data Visualization | Programming Tips

Rick Wicklin, March 24, 2025

The quantile fit plot: Comparing empirical and predicted quantiles for a univariate model

A common task in statistics is to model data by using a parametric probability distribution, such as the normal, lognormal, beta, or gamma distributions. There are many ways to assess how well the model fits the data, including graphical methods such as a Q-Q plot and formal statistical tests such

Analytics | Learn SAS

Rick Wicklin, February 24, 2025

Use the EFFECTPLOT statement to visualize binomial regression models in SAS

In a binomial regression model, the response variable is the proportion of successes for a given number of trials. In SAS regression procedures, you specify a binomial model by using the EVENTS/TRIALS syntax on the MODEL statement. Many analysts use the LOGISTIC or GENMOD procedures to fit binomial models. Visualizing

Analytics | Learn SAS | Programming Tips

Rick Wicklin, February 17, 2025

Deviance residuals and the DEVIANCE function in SAS

Many people have an intuitive feel for residuals in least squares models and know that the sum of squared residuals is a goodness-of-fit measure. Generalized linear regression models use a different but related idea, called deviance residuals. What are deviance residuals, and how can you compute them? Deviance residuals (and
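
For a concrete case, here is a small Python sketch (not the SAS DEVIANCE function that the article describes) that computes deviance residuals for a Poisson model. The unit deviance is d_i = 2[y_i log(y_i/μ_i) − (y_i − μ_i)], and the deviance residual is sign(y_i − μ_i)√d_i. The Poisson distribution is an assumption chosen for illustration. For a normal model, the deviance residual reduces to the ordinary residual y_i − μ_i. In both cases, the sum of squared deviance residuals equals the model deviance.

```python
import math


def poisson_deviance_residual(y, mu):
    """Deviance residual for one observation of a Poisson model.

    y  : observed count (y >= 0)
    mu : predicted mean (mu > 0)
    """
    term = y * math.log(y / mu) if y > 0 else 0.0   # y*log(y/mu) is taken to be 0 when y = 0
    d = max(2.0 * (term - (y - mu)), 0.0)           # unit deviance; max() guards against round-off
    return math.copysign(math.sqrt(d), y - mu)      # attach the sign of the raw residual


def normal_deviance_residual(y, mu):
    """For a normal model, the deviance residual is the ordinary residual."""
    return y - mu


# Hypothetical observed counts and model predictions
obs = [0, 2, 5, 3]
pred = [1.0, 2.0, 3.5, 4.0]
resid = [poisson_deviance_residual(y, m) for y, m in zip(obs, pred)]
deviance = sum(r * r for r in resid)   # sum of squared deviance residuals = model deviance
```

A zero count still produces a finite residual because the y·log(y/μ) term is defined as 0 when y = 0.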

Learn SAS | Programming Tips

Rick Wicklin, February 10, 2025

Find inflection points for a function that is known only at discrete points

A previous article describes how to use SAS to find the inflection points of a 1-D function that you can evaluate at any point. The function must be given by a formula (or by an algorithm) because the root-finding algorithm needs to evaluate the function at arbitrary locations. However, sometimes
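
One simple approach, sketched here in Python, estimates the second derivative at each interior point by divided differences and then looks for sign changes. It assumes the x values are sorted and the data are not too noisy. This is a generic finite-difference technique and not necessarily the method used in the article; with noisy data, you would want to smooth first.

```python
def second_differences(x, y):
    """Estimate f'' at the interior points x[1], ..., x[n-2].

    Uses divided differences, so the x values may be unequally spaced
    (they must be sorted in increasing order).
    """
    xs, d2 = [], []
    for i in range(1, len(x) - 1):
        left = (y[i] - y[i - 1]) / (x[i] - x[i - 1])     # slope on the left interval
        right = (y[i + 1] - y[i]) / (x[i + 1] - x[i])    # slope on the right interval
        d2.append(2.0 * (right - left) / (x[i + 1] - x[i - 1]))
        xs.append(x[i])
    return xs, d2


def inflection_points(x, y):
    """Approximate locations where the estimated f'' changes sign."""
    xs, d2 = second_differences(x, y)
    roots = []
    for j in range(len(d2)):
        if d2[j] == 0.0 and 0 < j < len(d2) - 1 and d2[j - 1] * d2[j + 1] < 0:
            roots.append(xs[j])                          # estimate is exactly zero at a data point
        elif j + 1 < len(d2) and d2[j] * d2[j + 1] < 0:
            t = d2[j] / (d2[j] - d2[j + 1])              # linear interpolation for the zero
            roots.append(xs[j] + t * (xs[j + 1] - xs[j]))
    return roots


x = [-2, -1, 0, 1, 2]
y = [(v - 0.25) ** 3 for v in x]   # f(x) = (x - 0.25)^3 has an inflection point at x = 0.25
print(inflection_points(x, y))     # [0.25]
```

Because the second difference of a cubic is exactly linear in x, the interpolated zero recovers the true inflection point in this toy example. For general data, the estimate is only as good as the sampling grid.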

Analytics | Data Visualization | Learn SAS | Programming Tips

Rick Wicklin, January 6, 2025

Top 10 posts from The DO Loop in 2024

In 2024, I wrote about 80 articles for The DO Loop blog. My most popular articles were about SAS programming, data visualization, and statistics. If you missed any of these articles, here is the "Reader's Choice Awards" for some of the most popular articles from 2024! SAS Programming The following

Analytics | Learn SAS

Rick Wicklin, November 18, 2024

The correlation between two sets of variables

In a correlation analysis, it is common to consider the correlations between all pairs of numerical variables. That is, if there are k numerical variables, most people examine the complete k x k matrix of correlations. This matrix is symmetric and has 1s on the diagonal, so more than half of the
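
As a small illustration in Python (the variable names and values are made up), suppose the variables split into a set X with p variables and a set Y with q variables. Instead of the full (p+q) × (p+q) matrix, you can compute only the p × q block of correlations between the two sets.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def cross_correlation(set1, set2):
    """Return the len(set1) x len(set2) block of correlations as a dict.

    set1, set2: dicts that map a variable name to its column of values.
    Only pairs (one variable from set1, one from set2) are computed.
    """
    return {(a, b): pearson(set1[a], set2[b]) for a in set1 for b in set2}


X = {"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 5]}
Y = {"y1": [2, 4, 6, 8, 10], "y2": [5, 4, 3, 2, 1], "y3": [1, 3, 2, 5, 4]}
R = cross_correlation(X, Y)   # 2 x 3 = 6 correlations instead of the full 5 x 5 matrix
```

The full 5 × 5 matrix would contain 25 entries, including the 1s on the diagonal and the within-set correlations, none of which are needed here.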

Advanced Analytics | Learn SAS

Rick Wicklin, November 11, 2024

Introducing PROC SIMSYSTEM in SAS Viya

When the SAS Global Forum 2020 conference was cancelled by the global COVID-19 pandemic, I felt sorry for the customers and colleagues who had spent months preparing their presentations. One presentation I especially wanted to attend was by Bucky Ransdell and Randy Tobias: "Introducing PROC SIMSYSTEM for Systematic Nonnormal Simulation".

Analytics | Programming Tips

Rick Wicklin, October 21, 2024

The correlogram: Visualize correlations by fitting angles

A common way to visualize the sample correlations between many numeric variables is to display a heat map that shows the Pearson correlation for each pair of variables, as shown in the image to the right. The correlation is a number in the range [-1, 1], where -1 indicates perfect

Learn SAS | Programming Tips

Rick Wicklin, September 30, 2024

Programming the formulas for an ANOVA in SAS

In practice, there is no need to remember textbook formulas for the ANOVA test because all modern statistical software will perform the test for you. In SAS, the ANOVA procedure is designed to handle balanced designs (the same number of observations in each group) whereas the GLM procedure can handle
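
The textbook formulas for a one-way ANOVA are SS_between = Σ_j n_j (ȳ_j − ȳ)², SS_within = Σ_j Σ_i (y_ij − ȳ_j)², and F = [SS_between / (k − 1)] / [SS_within / (N − k)], where k is the number of groups and N is the total sample size. The article programs these formulas in SAS; the following Python sketch is a direct translation that also works for unbalanced groups. It returns only the F statistic and its degrees of freedom. A p-value requires the CDF of the F distribution, which the Python standard library does not provide.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists of numbers; group sizes may differ (unbalanced design).
    """
    all_obs = [y for g in groups for y in g]
    N, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / N
    # Between-group SS: squared deviations of group means, weighted by group size
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group SS: squared deviations of each value from its own group mean
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    df_b, df_w = k - 1, N - k
    F = (ss_between / df_b) / (ss_within / df_w)
    return F, df_b, df_w


# Hypothetical unbalanced data: three groups of sizes 3, 2, and 4
print(one_way_anova([[4.1, 5.0, 4.6], [5.9, 6.3], [4.8, 5.2, 5.5, 5.1]]))
```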

Advanced Analytics | Artificial Intelligence

José Humberto López, September 17, 2024

The future of customer experience: How generative AI is transforming this area

In a market with increasingly demanding customers, the customer experience has become a decisive factor. In this context, generative AI is emerging as a strategic ally, transforming customer experience management and strengthening customer relationships. From automated conversations to the

Learn SAS | Programming Tips

Rick Wicklin, September 9, 2024

The location of ticks in statistical graphics

Modern software for statistical graphics automatically handles many details and graph defaults, such as the range of the axes and the placement of tick marks. In the days of yore, these details required tedious manual calculations. Think about what is required to place ticks on a scatter plot. On the

Learn SAS | Programming Tips

Rick Wicklin, September 4, 2024

Is a value in a vector? Use the ELEMENT function

In SAS, DATA step programmers use the IN operator to determine whether a value is contained in a set of target values. Did you know that there is similar functionality in the SAS IML language? The ELEMENT function in the SAS IML language is similar to the IN operator

Analytics | Learn SAS | Programming Tips

Rick Wicklin, July 15, 2024

Isotonic regression: An application of quadratic optimization

Isotonic regression (also called monotonic regression) is a type of regression model that assumes that the response variable is a monotonic function of the explanatory variable(s). The model can be nondecreasing or nonincreasing. Certain physical and biological processes can be analyzed by using an isotonic regression model. For example, a
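
The article formulates isotonic regression as a quadratic optimization problem. For a single explanatory variable, the same least-squares problem can also be solved by the pool-adjacent-violators algorithm (PAVA), which is a different technique from the one in the article. Here is a minimal Python sketch that assumes the responses are already sorted by the explanatory variable.

```python
def isotonic_fit(y, w=None):
    """Nondecreasing weighted least-squares fit to y, which must already be
    sorted by the explanatory variable. Uses pool-adjacent-violators (PAVA)."""
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points]
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backward while the nondecreasing constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    # Expand each block mean back to one fitted value per observation
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


print(isotonic_fit([1, 3, 2, 4]))   # [1.0, 2.5, 2.5, 4.0]
```

For a nonincreasing fit, negate y, call the function, and negate the result.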

Learn SAS | Programming Tips

Rick Wicklin, June 24, 2024

Teaching an AI assistant to read and write SAS IML vectors

One of the most exciting features of SAS Viya Workbench is that the code editor includes a generative AI component called SAS Viya Copilot. This feature was announced and demonstrated at SAS Innovate 2024. With the Copilot, you can specify a text prompt that generates SAS code. For example, you

Analytics | Data Visualization

Rick Wicklin, June 19, 2024

Scale a density curve to match a histogram

This article discusses how to scale a probability density curve so that it fits appropriately on a histogram, as shown in the graph to the right. By definition, a probability density curve is scaled so that the area under the curve equals 1. However, a histogram might show counts or
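
The scaling follows from the fact that the expected count in a bin of width h near x is approximately N·h·f(x), where N is the sample size and f is the density. The following Python sketch handles the three common histogram scales; the function names are illustrative, and the normal density is just an example.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def scaled_density(x, pdf, n, bin_width, scale="count"):
    """Scale a probability density so that it overlays a histogram.

    scale = "count"   -> multiply by n * bin_width
    scale = "percent" -> multiply by 100 * bin_width
    scale = "density" -> no scaling (area under the curve is 1)
    """
    factor = {"count": n * bin_width, "percent": 100.0 * bin_width, "density": 1.0}[scale]
    return factor * pdf(x)


f = lambda t: normal_pdf(t, 0.0, 1.0)
# Height of the curve at x = 0 for a count histogram of 200 observations with bin width 0.5
print(scaled_density(0.0, f, 200, 0.5))
```

As a check, summing the scaled curve at the bin midpoints approximately recovers the sample size, which is what the bar heights of a count histogram sum to.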

Analytics | Data Visualization | Learn SAS

Rick Wicklin, June 17, 2024

A bootstrap confidence interval for an R-square statistic

A previous article discusses a formula for a confidence interval for R-square in a linear regression model (Olkin and Finn, 1995, "Correlations redux," Psychological Bulletin). The formula is useful for large data sets but should be used with caution for small samples. At the end of the previous article, I
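
A percentile bootstrap for R-square resamples the (x, y) pairs with replacement, recomputes R-square for each resample, and uses the empirical quantiles as the confidence interval. The following Python sketch does this for a simple linear regression with one explanatory variable. The number of resamples, the seed, and the data are arbitrary choices, and this is not the SAS program from the article.

```python
import random


def r_square(x, y):
    """R-square for the simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def bootstrap_ci_rsquare(x, y, b=2000, alpha=0.05, seed=2024):
    """Percentile bootstrap CI for R-square, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    while len(stats) < b:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue   # degenerate resample: R-square is undefined, so draw again
        stats.append(r_square(xs, ys))
    stats.sort()
    lo = stats[round(alpha / 2 * b)]              # approximate alpha/2 quantile
    hi = stats[round((1 - alpha / 2) * b) - 1]    # approximate 1 - alpha/2 quantile
    return lo, hi


# Hypothetical data: a linear trend with alternating deviations
x = list(range(1, 21))
y = [v + 1.5 * (-1) ** v for v in x]
print(r_square(x, y), bootstrap_ci_rsquare(x, y))
```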

Analytics | Data Visualization | Programming Tips

Rick Wicklin, June 10, 2024

The distribution of the R-square statistic

A SAS analyst ran a linear regression model and obtained an R-square statistic for the fit. However, he wanted a confidence interval, so he posted a question to a discussion forum asking how to obtain a confidence interval for the R-square parameter. Someone suggested a formula from a textbook (Cohen,

Analytics | Data Visualization | Learn SAS

Rick Wicklin, June 3, 2024

Visualize a multivariate regression model when using spline effects

A SAS analyst read my previous article about visualizing the predicted values for a regression model that uses spline effects. Because the original explanatory variable does not appear in the model, the analyst had several questions: How do you score the model on new data? The previous example has only

Learn SAS | Programming Tips

Rick Wicklin, May 15, 2024

Rank, order, and sorting

A SAS programmer was trying to implement an algorithm in PROC IML in SAS based on some R code he had seen on the internet. The R code used the rank() and order() functions. This led the programmer to ask, "What is the difference between the rank and the order?"
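
In R, order(x) returns the indices that would sort x, whereas rank(x) returns the position of each element of x in the sorted vector. The two are inverse permutations of each other, so rank(x) equals order(order(x)). The following Python sketch mimics the 1-based R behavior, assuming no tied values. With ties, this version breaks them by position, like R's rank(x, ties.method = "first"), and not like R's default, which averages tied ranks.

```python
def order(x):
    """1-based indices that put x into ascending order (like R's order())."""
    return [i + 1 for i in sorted(range(len(x)), key=lambda i: x[i])]


def rank(x):
    """1-based rank of each element of x (like R's rank() when there are no ties)."""
    r = [0] * len(x)
    for position, idx in enumerate(order(x), start=1):
        r[idx - 1] = position   # the element at index idx lands in this sorted position
    return r


x = [30, 10, 20, 40]
print(order(x))   # [2, 3, 1, 4]: x[2] is the smallest, then x[3], ...
print(rank(x))    # [3, 1, 2, 4]: 30 is the 3rd smallest, 10 is the smallest, ...
```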

Learn SAS | Programming Tips

Rick Wicklin, May 8, 2024

Dice and the correctness of a simulation

At a recent conference in Las Vegas, a presenter simulated the sum of two dice and used it to simulate the game of craps. I write a lot of simulations, so I'd like to discuss two related topics: How to simulate the sum of two dice in SAS. This is
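
Here is a Python sketch of both ideas. It simulates each die as an independent integer from 1 to 6 and adds the two values. It then compares the simulated proportions with the exact probabilities obtained by enumerating the 36 equally likely outcomes. Close agreement between the two is one check on the correctness of the simulation. The sample size and seed are arbitrary.

```python
import random
from collections import Counter
from fractions import Fraction


def exact_two_dice():
    """Exact distribution of the sum of two fair dice (enumerate all 36 outcomes)."""
    counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
    return {s: Fraction(c, 36) for s, c in sorted(counts.items())}


def simulate_two_dice(n, seed=1):
    """Simulate n rolls of two independent fair dice; return the proportion for each sum."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n))
    return {s: counts[s] / n for s in range(2, 13)}


exact = exact_two_dice()
sim = simulate_two_dice(100_000)
for s in range(2, 13):
    print(f"{s:2d}  exact={float(exact[s]):.4f}  simulated={sim[s]:.4f}")
```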

Data Visualization | Learn SAS | Programming Tips

Rick Wicklin, May 6, 2024

Visualize patterns of missing values

Years ago, I wrote an article that showed how to visualize patterns of missing data. During a recent data visualization talk, I discussed the program, which used a small number of SAS IML statements. An audience member asked whether it is possible to construct the same visualization by using only

Analytics | Learn SAS

Rick Wicklin, May 1, 2024

Estimate a proportion and a confidence interval in SAS

A SAS programmer wanted to estimate a proportion and a confidence interval (CI), but didn't know which SAS procedure to call. He knows a formula for the CI from an elementary statistics textbook. If x is the observed count of events in a random sample of size n, then the
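
The usual elementary-textbook formula is the Wald interval, p̂ ± z_{1−α/2} √(p̂(1 − p̂)/n), where p̂ = x/n. Assuming that this is the formula in question, here is a Python sketch.

```python
import math
from statistics import NormalDist


def wald_ci(x, n, alpha=0.05):
    """Wald (normal-approximation) CI for a binomial proportion.

    x: number of events, n: sample size. Returns (p_hat, lower, upper).
    """
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)          # about 1.96 when alpha = 0.05
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - half_width, p_hat + half_width


print(wald_ci(40, 100))   # approximately (0.40, 0.304, 0.496)
```

The Wald interval is known to perform poorly when n is small or p̂ is near 0 or 1.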

Analytics | Learn SAS | Programming Tips

Rick Wicklin, April 22, 2024

Use the moment-ratio diagram to visualize the sampling distribution of skewness and kurtosis

The moment-ratio diagram is a tool that is useful when choosing a distribution that models a sample of univariate data. As I show in my book (Simulating Data with SAS, Wicklin, 2013), you first plot the skewness and kurtosis of the sample on the moment-ratio diagram to see what common

Analytics | Learn SAS | Programming Tips

Rick Wicklin, April 8, 2024

Improve the Federal Reserve's dot plot

A dot plot is a standard statistical graphic that displays a statistic (often a mean) and the uncertainty of the statistic for one or more groups. Statisticians and data scientists use it in the analysis of group data. In late 2023, I started noticing headlines about "dot plots" in the

Analytics | Learn SAS | Programming Tips

Rick Wicklin, March 20, 2024

Maximum likelihood estimates for linear regression

A statistical analyst used the GENMOD procedure in SAS to fit a linear regression model. He noticed that the table of parameter estimates has an extra row (labeled "Scale") that is not a regression coefficient. The "scale parameter" is not part of the parameter estimates table produced by PROC REG
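
Under normally distributed errors, the maximum likelihood estimates of the regression coefficients are the same as the least squares estimates. The two estimates of the error standard deviation σ differ, however, in their divisor. The MLE is √(SSE/n), whereas PROC REG reports the Root MSE, √(SSE/(n − p)), where p is the number of regression coefficients. Interpreting the "Scale" row as the maximum likelihood estimate of σ is an assumption here, based on the article's title. The following Python sketch computes both estimates for a simple linear regression on made-up data.

```python
import math


def simple_linreg(x, y):
    """Least squares fit of y = b0 + b1*x. Returns (b0, b1, sse)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return b0, b1, sse


# Hypothetical data
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]
b0, b1, sse = simple_linreg(x, y)
n, p = len(x), 2                      # p = number of regression coefficients
sigma_mle = math.sqrt(sse / n)        # maximum likelihood estimate of sigma
root_mse = math.sqrt(sse / (n - p))   # Root MSE, based on the unbiased variance estimate
print(sigma_mle, root_mse)
```

The ratio of the two estimates is √((n − p)/n), so the difference matters mainly for small samples.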

Analytics | Learn SAS

Rick Wicklin, February 26, 2024

On using flexible distributions to fit data

With four parameters I can fit an elephant. With five I can make his trunk wiggle. — John von Neumann Ever since the dawn of statistics, researchers have searched for the Holy Grail of statistical modeling. Namely, a flexible distribution that can model any continuous univariate data. As the quote

Analytics | Learn SAS | Programming Tips

Rick Wicklin, February 21, 2024

On using the range to estimate the variability of small samples

In statistical quality control, practitioners often estimate the variability of products that are being produced in a manufacturing plant. It is important to estimate the variability as soon as possible, which means trying to obtain an estimate from a small sample. Samples of size five or less are not uncommon
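
The classical range-based estimate is σ̂ = R / d₂(n), where R is the sample range and d₂(n) is the expected range of n independent standard normal values. The following Python sketch uses the standard tabulated d₂ constants for 2 ≤ n ≤ 10. It also includes a Monte Carlo estimate of d₂(n) to show where the constants come from; the seed and number of replications are arbitrary.

```python
import random
import statistics

# Tabulated d2 constants: expected range of n independent N(0, 1) values
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534,
      7: 2.704, 8: 2.847, 9: 2.970, 10: 3.078}


def sigma_from_range(sample):
    """Estimate sigma as R / d2(n), where R = max - min (requires 2 <= n <= 10)."""
    n = len(sample)
    return (max(sample) - min(sample)) / D2[n]


def simulate_d2(n, reps=50_000, seed=123):
    """Monte Carlo estimate of d2(n) = E[range of n standard normal values]."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(z) - min(z))
    return statistics.fmean(ranges)


sample = [10.1, 9.8, 10.4, 10.0, 9.7]   # hypothetical sample of size 5
print(sigma_from_range(sample), statistics.stdev(sample))
```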

Analytics | Learn SAS

Rick Wicklin, February 5, 2024

Peeling a convex hull

This article looks at a geometric method for estimating the center of a multivariate point cloud. The method is known as convex-hull peeling. In two dimensions, you can perform convex-hull peeling in SAS 9 by using the CVEXHULL function in SAS IML software. For higher dimensions, you can use the CONVEXHULL
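
For readers without SAS IML, here is a Python sketch of two-dimensional convex-hull peeling. It computes each hull with Andrew's monotone chain algorithm, which substitutes for the CVEXHULL function and does not reproduce its implementation. As an assumption, it estimates the center as the mean of the points that remain after the last complete layer is removed. In this version, points that lie in the interior of a hull edge are not treated as hull vertices.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counterclockwise order.
    Points in the interior of a hull edge (collinear) are excluded."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def peel(points):
    """Repeatedly remove hull vertices. Returns (layers, remaining points)."""
    remaining = list(set(points))
    layers = []
    while len(remaining) > 2:
        hull = convex_hull(remaining)
        layers.append(hull)
        hull_set = set(hull)
        remaining = [p for p in remaining if p not in hull_set]
    return layers, remaining


def peeled_center(points):
    """Mean of the innermost points (or of the last layer if no points remain)."""
    layers, remaining = peel(points)
    core = remaining if remaining else layers[-1]
    return (sum(p[0] for p in core) / len(core), sum(p[1] for p in core) / len(core))


pts = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1), (3, 1), (3, 3), (1, 3), (2, 2)]
print(peeled_center(pts))   # (2.0, 2.0)
```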

Programming Tips

Rick Wicklin, January 10, 2024

Blog posts from 2023 that deserve a second look

In a previous article, I presented some of the most popular blog posts from 2023. The popular articles tend to discuss elementary topics that have broad appeal. However, I also wrote many technical articles about advanced topics. The following articles didn't make the Top 10 list, but they deserve a

Analytics | Learn SAS

Rick Wicklin, January 8, 2024

Reporting statistics for unobserved levels of categorical variables

An unobserved category is one that does not appear in a sample of data. For example, in a small sample of US voters, you are likely to observe members of the major political parties, but less likely to observe members of minor or fringe parties. This can cause a headache
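
One way to avoid this, sketched here in Python, is to specify the complete list of levels in advance and report every level, including levels that have a count of zero. The level names are hypothetical, and this sketch ignores any observed value that is not in the list of levels.

```python
from collections import Counter


def frequency_table(sample, levels):
    """Return (level, count, percent) for every level in `levels`, including
    levels that do not appear in `sample` (reported with a count of 0)."""
    counts = Counter(sample)
    n = len(sample)
    return [(lvl, counts.get(lvl, 0), 100.0 * counts.get(lvl, 0) / n) for lvl in levels]


levels = ["Party A", "Party B", "Party C"]        # hypothetical complete list of levels
sample = ["Party A", "Party B", "Party A", "Party A"]
for row in frequency_table(sample, levels):
    print(row)   # "Party C" is reported with a count of 0
```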
