Blogs

Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Data Visualization | Learn SAS

Rick Wicklin | March 2, 2020

Create a deviation plot to visualize values relative to a baseline

A colleague recently posted an article about how to use SAS Visual Analytics to create a circular graph that displays a year's worth of temperature data. Specifically, the graph shows the air temperature for each day in a year relative to some baseline temperature, such as 65F (18C). Days warmer

Analytics | Data Visualization

Rick Wicklin | February 26, 2020

The binormal model for ROC curves

The ROC curve is a graphical method that summarizes how well a binary classifier can discriminate between two populations, often called the "negative" population (individuals who do not have a disease or characteristic) and the "positive" population (individuals who do have it). As shown in a previous article, there is

Programming Tips

Rick Wicklin | February 24, 2020

Visualization of a binary classification analysis

The purpose of this article is to show how to use SAS to create a graph that illustrates a basic idea in a binary classification analysis, such as discriminant analysis and logistic regression. The graph, shown at right, shows two populations. Subjects in the "negative" population do not have some

Learn SAS | Programming Tips

Rick Wicklin | February 19, 2020

A list of SAS DATA step functions that do not run in CAS

Are you a statistical programmer whose company has adopted SAS Viya? If so, you probably know that the DATA step can run in parallel in SAS Cloud Analytic Services (CAS). As Sekosky (2017) says, "running in a single thread in SAS is different from running in many threads in CAS."

Analytics | Data Visualization

Rick Wicklin | February 17, 2020

Visualize collinearity diagnostics

A previous article shows how to interpret the collinearity diagnostics that are produced by PROC REG in SAS. The process involves scanning down numbers in a table in order to find extreme values. This can be a tedious and error-prone process. Friendly and Kwan (2009) compare this task to a

Analytics | Learn SAS | Programming Tips

Rick Wicklin | February 12, 2020

The Johnson system: Which distribution should you choose to model data?

The Johnson system (Johnson, 1949) contains a family of four distributions: the normal distribution, the lognormal distribution, the SB distribution, and the SU distribution. Previous articles explain why the Johnson system is useful and show how to use PROC UNIVARIATE in SAS to estimate parameters for the Johnson SB distribution

Learn SAS | Programming Tips

Rick Wicklin | February 10, 2020

Find the fractional part of a number

You can represent every number as a nearby integer plus a decimal. For example, 1.3 = 1 + 0.3. The integer is called the integer part of x, whereas the decimal is called the fractional part of x (or sometimes the decimal part of x). This representation is not unique.
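
The non-uniqueness mentioned above can be illustrated with a minimal sketch. It is written in Python rather than SAS, and the helper names `frac_floor` and `frac_trunc` are hypothetical, chosen only for this illustration. It assumes two common conventions for the "nearby integer": the greatest integer not exceeding x, and the integer obtained by truncating toward zero.

```python
import math

def frac_floor(x: float) -> float:
    """Fractional part measured from floor(x); the result lies in [0, 1)."""
    return x - math.floor(x)

def frac_trunc(x: float) -> float:
    """Fractional part measured from trunc(x); the result has the sign of x."""
    return x - math.trunc(x)

# For a negative number the two conventions give different decompositions:
# -1.25 = -2 + 0.75   (floor convention)
# -1.25 = -1 + -0.25  (truncation convention)
print(frac_floor(-1.25), frac_trunc(-1.25))
```

For nonnegative inputs the two conventions agree, so the choice matters only when x is negative.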

Analytics | Data Visualization | Learn SAS

Rick Wicklin | February 5, 2020

Visualize residual projections for linear regression

A SAS programmer wanted to create a graph that illustrates how Deming regression differs from ordinary least squares regression. The main idea is shown in the panel of graphs below. The first graph shows the geometry of least squares regression when we regress Y onto X. ("Regress Y onto X"

Programming Tips

Rick Wicklin | February 3, 2020

What sample size do you need for a binomial test of proportions?

Recently someone on social media asked, "how can I compute the required sample size for a binomial test?" I assume from the question that the researcher was designing an experiment to test the proportions between two groups, such as a control group and a treatment/intervention group. They wanted to know

Analytics | Learn SAS

Rick Wicklin | January 29, 2020

Collinearity diagnostics: Should the data be centered?

In a previous article, I showed how to perform collinearity diagnostics in SAS by using the COLLIN option in the MODEL statement in PROC REG. For models that contain an intercept term, I noted that there has been considerable debate about whether the data vectors should be mean-centered prior to

Analytics | Data Visualization | Programming Tips

Rick Wicklin | January 27, 2020

The Johnson SU distribution

The Johnson system (Johnson, 1949) contains a family of four distributions: the normal distribution, the lognormal distribution, the SB distribution (which models bounded distributions), and the SU distribution (which models unbounded distributions). Note that 'B' stands for 'bounded' and 'U' stands for 'unbounded.' A previous article explains the purpose of

Analytics | Learn SAS

Rick Wicklin | January 23, 2020

Collinearity in regression: The COLLIN option in PROC REG

I was recently asked about how to interpret the output from the COLLIN (or COLLINOINT) option on the MODEL statement in PROC REG in SAS. The example in the documentation for PROC REG is correct but is somewhat terse regarding how to use the output to diagnose collinearity and how

Analytics | Data Visualization | Programming Tips

Rick Wicklin | January 20, 2020

The Johnson SB distribution

From the early days of probability and statistics, researchers have tried to organize and categorize parametric probability distributions. For example, Pearson (1895, 1901, and 1916) developed a system of seven distributions, which was later called the Pearson system. The main idea behind a "system" of distributions is that for each

Advanced Analytics | Data Visualization

Rick Wicklin | January 15, 2020

The moment-ratio diagram

In my book Simulating Data with SAS, I show how to use a graphical tool, called the moment-ratio diagram, to characterize and compare continuous probability distributions based on their skewness and kurtosis (Wicklin, 2013, Chapter 16). The idea behind the moment-ratio diagram is that skewness and kurtosis are essential for

Analytics | Data Visualization | Machine Learning

Rick Wicklin | January 13, 2020

10 posts from 2019 that deserve a second look

Did you add "learn something new" to your list of New Year's resolutions? Last week, I wrote about the most popular articles from The DO Loop in 2019. The most popular articles are about elementary topics in SAS programming or univariate statistics because those topics have broad appeal. Advanced topics

Analytics | Learn SAS

Rick Wicklin | January 8, 2020

3 ways to add confidence limits to regression curves in SAS

Many SAS procedures can automatically create a graph that overlays multiple prediction curves and their prediction limits. This graph (sometimes called a "fit plot" or a "sliced fit plot") is useful when you want to visualize a model in which a continuous response variable depends on one continuous explanatory variable

Data Visualization | Learn SAS | Programming Tips

Rick Wicklin | January 6, 2020

Top posts from The DO Loop in 2019

Last year, I wrote more than 100 posts for The DO Loop blog. The most popular articles were about SAS programming tips for data analysis, statistical analysis, and data visualization. Here are the most popular articles from 2019 in each category.

SAS programming tips

Create training, testing, and validation data

Analytics | Data Visualization

Rick Wicklin | December 18, 2019

Create a conditional quantile bin plot in SAS

A 2-D "bin plot" counts the number of observations in each cell in a regular 2-D grid. The 2-D bin plot is essentially a 2-D version of a histogram: it provides an estimate for the density of a 2-D distribution. As I discuss in the article, "The essential guide to

Analytics

Rick Wicklin | December 16, 2019

Math-ing around the Christmas tree: Can the SVD de-noise an image?

Rockin' around the Christmas tree
At the Christmas party hop.
– Brenda Lee

Last Christmas, I saw a fun blog post that used optimization methods to de-noise an image of a Christmas tree. Although there are specialized algorithms that remove random noise from an image, I am not going to

Analytics | Programming Tips

Rick Wicklin | December 11, 2019

Swap elements in binary matrices

Binary matrices are used for many purposes. I have previously written about how to use binary matrices to visualize missing values in a data matrix. They are also used to indicate the co-occurrence of two events. In ecology, binary matrices are used to indicate which species of an animal are

Data Visualization | Learn SAS

Rick Wicklin | December 9, 2019

Visualize data before and after a treatment

Recently I showed how to visualize and analyze longitudinal data in which subjects are measured at multiple time points. A very common situation is that the data are collected at two time points. For example, in medicine it is very common to measure some quantity (blood pressure, cholesterol, white-blood cell

Analytics | Data Visualization

Rick Wicklin | December 5, 2019

Longitudinal data: The mixed model

This is a second article about analyzing longitudinal data, which features measurements that are repeatedly taken on subjects at several points in time. The previous article discusses a response-profile analysis, which uses an ANOVA method to determine differences between the means of an experimental group and a placebo group. The

Analytics

Rick Wicklin | December 3, 2019

Longitudinal data: The response-profile model

Longitudinal data are used in many health-related studies in which individuals are measured at multiple points in time to monitor changes in a response variable, such as weight, cholesterol, or blood pressure. There are many excellent articles and books that describe the advantages of a mixed model for analyzing longitudinal

Analytics | Data Visualization

Rick Wicklin | November 27, 2019

Evaluate a function on a linear subspace

This article discusses how to restrict a multivariate function to a linear subspace. This is a useful technique in many situations, including visualizing an objective function that is constrained by linear equalities. For example, the graph to the right is from a previous article about how to evaluate quadratic polynomials.

Analytics | Data Visualization | Programming Tips

Rick Wicklin | November 25, 2019

Evaluate a quadratic polynomial in SAS

What is an efficient way to evaluate a multivariate quadratic polynomial in p variables? The answer is to use matrix computations! A multivariate quadratic polynomial can be written as the sum of a purely quadratic term (degree 2), a purely linear term (degree 1), and a constant term (degree 0).
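
The three-term decomposition above can be sketched as follows. This is a minimal illustration in plain Python, not the SAS implementation from the article. The function name `quad_poly` and the notation q(x) = x'Ax + b'x + c (with no scaling factor on the quadratic term) are assumptions made for this sketch.

```python
from typing import Sequence

def quad_poly(A: Sequence[Sequence[float]], b: Sequence[float],
              c: float, x: Sequence[float]) -> float:
    """Evaluate q(x) = x'Ax + b'x + c for a p x p matrix A and p-vectors b, x."""
    p = len(x)
    # Purely quadratic term (degree 2): x'Ax
    quad = sum(x[i] * A[i][j] * x[j] for i in range(p) for j in range(p))
    # Purely linear term (degree 1): b'x
    lin = sum(b[i] * x[i] for i in range(p))
    # The constant term (degree 0) is c
    return quad + lin + c

A = [[2, 1],
     [1, 3]]
b = [-1, 4]
c = 5
x = [1, 2]
# Quadratic part: 2*1 + 2*(1*1*2) + 3*4 = 18; linear part: -1 + 8 = 7
print(quad_poly(A, b, c, x))  # 18 + 7 + 5 = 30
```

The explicit loops are only for readability. In a matrix language, the same quantity would typically be computed with matrix-vector products.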

Analytics | Learn SAS

Rick Wicklin | November 20, 2019

Predicted values in generalized linear models: The ILINK option in SAS

In a linear regression model, the predicted values are on the same scale as the response variable. You can plot the observed and predicted responses to visualize how well the model agrees with the data. However, for generalized linear models, there is a potential source of confusion. Recall that a

Data Visualization | Programming Tips

Rick Wicklin | November 18, 2019

Create a strip plot in SAS

My colleague, Mike Drutar, recently showed how to create a "strip plot" that shows the distribution of temperatures for each calendar month at a particular location. Mike created the strip plot in SAS Visual Analytics by using a point-and-click interface. This article shows how to create a similar graph by

Analytics | Data Visualization

Rick Wicklin | November 13, 2019

Create biplots in SAS

Biplots are two-dimensional plots that help to visualize relationships in high dimensional data. A previous article discusses how to interpret biplots for continuous variables. The biplot projects observations and variables onto the span of the first two principal components. The observations are plotted as markers; the variables are plotted as

Learn SAS | Programming Tips

Rick Wicklin | November 11, 2019

Round to even

In grade school, students learn how to round numbers to the nearest integer. In later years, students learn variations, such as rounding up and rounding down by using the greatest integer function and least integer function, respectively. My sister, who is an engineer, learned a rounding method that rounds half-integers

Analytics | Data Visualization | Programming Tips

Rick Wicklin | November 6, 2019

What are biplots?

Principal component analysis (PCA) is an important tool for understanding relationships in continuous multivariate data. When the first two principal components (PCs) explain a significant portion of the variance in the data, you can visualize the data by projecting the observations onto the span of the first two PCs. In
