Blogs

Tag: Data Analysis

Data Visualization | Learn SAS | Programming Tips

Rick Wicklin, November 14, 2022

Profile plots in SAS

A profile plot is a compact way to visualize many variables for a set of subjects. It enables you to investigate which subjects are similar to or different from other subjects. Visually, a profile plot can take many forms. This article shows several profile plots: a line plot of the

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin, November 2, 2022

The area and perimeter of a convex hull

The area of a convex hull enables you to estimate the area of a compact region from a set of discrete observations. For example, a biologist might have multiple sightings of a wolf pack and want to use the convex hull to estimate the area of the wolves' territory. A

Read More
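As a rough illustration of the idea in the teaser above, here is a minimal Python sketch (not SAS code, and not the article's implementation). It assumes Andrew's monotone chain algorithm for the hull and the shoelace formula for the area; the sighting coordinates are hypothetical.

```python
from math import dist

def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counterclockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]

def hull_area(hull):
    """Polygon area by the shoelace formula."""
    n = len(hull)
    s = sum(hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
            for i in range(n))
    return abs(s) / 2

def hull_perimeter(hull):
    """Sum of the edge lengths of the polygon."""
    n = len(hull)
    return sum(dist(hull[i], hull[(i + 1) % n]) for i in range(n))

# Hypothetical sighting locations (x, y), for illustration only
sightings = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
hull = convex_hull(sightings)
print(hull, hull_area(hull), hull_perimeter(hull))
```

The two interior points are discarded, so the area and perimeter are those of the bounding rectangle of the hypothetical sightings.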

Learn SAS | Programming Tips

Rick Wicklin, September 19, 2022

Generate random ID values for subjects in SAS

A common question on SAS discussion forums is how to use SAS to generate random ID values. The use case is to generate a set of random strings to assign to patients in a clinical study. If you assign each patient a unique ID and delete the patients' names, you

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin, September 7, 2022

A test for monotonic sequences and functions

Monotonic transformations occur frequently in math and statistics. Analysts use monotonic transformations to transform variable values, with Tukey's ladder of transformations and the Box-Cox transformations being familiar examples. Monotonic distributions figure prominently in probability theory because the cumulative distribution is a monotonic increasing function. For a continuous distribution that is

Read More
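One simple way to test whether a sequence is monotonic increasing is to compare each element with its successor. The following Python sketch (not SAS code; the function name and sample values are hypothetical) illustrates that idea and is not necessarily the test the article develops.

```python
from typing import Sequence

def is_monotonic_increasing(x: Sequence[float], strict: bool = False) -> bool:
    """Return True if every element is >= (or >, when strict) its predecessor."""
    pairs = zip(x, x[1:])
    if strict:
        return all(a < b for a, b in pairs)
    return all(a <= b for a, b in pairs)

# Hypothetical values of a cumulative distribution evaluated on a grid
cdf_values = [0.0, 0.1, 0.1, 0.4, 0.9, 1.0]
print(is_monotonic_increasing(cdf_values))               # weakly increasing
print(is_monotonic_increasing(cdf_values, strict=True))  # the repeated 0.1 breaks strictness
```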

Analytics | Learn SAS

Rick Wicklin, August 22, 2022

The univariate Box-Cox transformation

A SAS customer asked how to use the Box-Cox transformation to normalize a single variable. Recall that a normalizing transformation is a function that attempts to convert a set of data to be as nearly normal as possible. For positive-valued data, introductory statistics courses often mention the log transformation or

Read More

Analytics | Learn SAS

Rick Wicklin, August 17, 2022

The Box-Cox transformation for a dependent variable in a regression

In the 1960s and '70s, before nonparametric regression methods became widely available, it was common to apply a nonlinear transformation to the dependent variable before fitting a linear regression model. This is still done today, with the most common transformation being a logarithmic transformation of the dependent variable, which fits

Read More

Analytics | Data Visualization | Learn SAS

Rick Wicklin, August 15, 2022

Tukey's ladder of variable transformations

John Tukey was an influential statistician who proposed many statistical concepts. In the 1960s and '70s, he was instrumental in the discovery and exposition of robust statistical methods, and he was an ardent proponent of exploratory data analysis (EDA). In his 1977 book, Exploratory Data Analysis, he discussed a small

Read More

Analytics | Learn SAS

Rick Wicklin, August 8, 2022

Means and medians as minimizers of a loss function

On Twitter, I saw a tweet from @DataSciFact that read, "The sum of (x_i - x)^2 over a set of data points x_i is minimized when x is the sample mean." I (@RickWicklin) immediately tweeted out a reply: "And the sum of |x_i - x| is minimized by the sample

Read More
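The claim in the teaser above can be checked numerically. The following Python sketch (not SAS code) evaluates both loss functions on a fine grid of candidate centers and compares the minimizers with the sample mean and the sample median. The data values and the grid are hypothetical choices for illustration.

```python
from statistics import mean, median

def sum_sq(data, c):
    """Sum of squared deviations of the data from the center c."""
    return sum((x - c) ** 2 for x in data)

def sum_abs(data, c):
    """Sum of absolute deviations of the data from the center c."""
    return sum(abs(x - c) for x in data)

# Hypothetical data with an odd number of values, so the median is unique
data = [1.0, 2.0, 4.0, 7.0, 11.0]

# Candidate centers 0.00, 0.01, ..., 12.00
grid = [i / 100 for i in range(1201)]

best_sq = min(grid, key=lambda c: sum_sq(data, c))
best_abs = min(grid, key=lambda c: sum_abs(data, c))

print(best_sq, mean(data))     # minimizer of the squared loss vs. the sample mean
print(best_abs, median(data))  # minimizer of the absolute loss vs. the sample median
```

A grid search is only a demonstration, not a proof. The squared-loss result follows from setting the derivative of the sum to zero.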

Learn SAS | Programming Tips

Rick Wicklin, May 16, 2022

How to unroll frequency data

In categorical data analysis, it is common to analyze tables of counts. For example, a researcher might gather data for 18 boys and 12 girls who apply for a summer enrichment program. The researcher might be interested in whether the proportion of boys that are admitted is different from the

Read More

Programming Tips

Rick Wicklin, May 4, 2022

Bootstrap estimates for nonlinear regression models in SAS

In The Essential Guide to Bootstrapping in SAS, I note that there are many SAS procedures that support bootstrap estimates without requiring the analyst to write a program. I have previously written about using bootstrap options in the TTEST procedure. This article discusses the NLIN procedure, which can fit nonlinear

Read More

Analytics | Learn SAS

Rick Wicklin, April 27, 2022

On Bartlett's sphericity test for correlation

When you have many correlated variables, principal component analysis (PCA) is a classical technique to reduce the dimensionality of the problem. PCA finds a lower-dimensional linear subspace that explains most of the variability in the data. There are many statistical tools that help you decide how many principal

Read More

Data Visualization | Learn SAS | Programming Tips

Rick Wicklin, April 20, 2022

Use a heat map to visualize an ordinal response in longitudinal data

Recently, I showed how to use a heat map to visualize measurements over time for a set of patients in a longitudinal study. The visualization is sometimes called a lasagna plot because it presents an alternative to the usual spaghetti plot. A reader asked whether a similar visualization can be

Read More

Analytics | Learn SAS

Rick Wicklin, April 18, 2022

The McNemar test in SAS

What is McNemar's test? How do you run the McNemar test in SAS? Why might other statistical software report a value for McNemar's test that is different from the SAS value? SAS supports an exact version of the McNemar test, but when should you use it? This article answers these

Read More

Learn SAS | Programming Tips

Rick Wicklin, March 28, 2022

Use a heat map to visualize missing values in longitudinal data

Longitudinal data are measurements for a set of subjects at multiple points in time. Also called "panel data" or "repeated measures data," this kind of data is common in clinical trials in which patients are tracked over time. Recently, a SAS programmer asked how to visualize missing values in a

Read More

Analytics | Programming Tips

Rick Wicklin, February 14, 2022

Passing-Bablok regression in SAS

This article implements Passing-Bablok regression in SAS. Passing-Bablok regression is a one-variable regression technique that is used to compare measurements from different instruments or medical devices. Both variables (X and Y) are measured with error. Consequently, you cannot use ordinary linear regression, which assumes that

Read More

Learn SAS | Programming Tips

Rick Wicklin, January 26, 2022

4 ways to find the k smallest and largest data values in SAS

Sometimes it is useful to know the extreme values in data. You might need to know the five or 10 smallest data values, or the five or 10 largest data values. There are many ways to do this in SAS, but this article shows examples

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin, January 24, 2022

Estimate percentiles in SAS Viya

How can you estimate percentiles in SAS Viya? This article shows how to call the percentile action from PROC CAS to estimate percentiles of variables in a CAS data table. Percentiles and quantiles are essentially the same (the pth quantile is the 100*pth percentile for p in [0, 1]), so

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin, January 3, 2022

Top 10 posts from The DO Loop in 2021

Last year, I wrote almost 100 posts for The DO Loop blog. My most popular articles were about data visualization, statistics and data analysis, and simulation and bootstrapping. If you missed any of these gems when they were first published, here are some of the most popular articles from 2021:

Read More

Analytics | Learn SAS

Rick Wicklin, December 1, 2021

Beware of repeated values in loess models

Did you know that the loess regression algorithm is not well-defined when you have repeated values among the explanatory variables, and you request a very small smoothing parameter? This is because loess regression at the point x0 is based on using the k nearest neighbors to x0. If x0 has

Read More

Data Visualization | Learn SAS

Rick Wicklin, November 10, 2021

Create a frequency polygon in SAS

I was recently asked how to create a frequency polygon in SAS. A frequency polygon is an alternative to a histogram that shows similar information about the distribution of univariate data. It is the piecewise linear curve formed by connecting the midpoints of the tops of the bins. The graph

Read More
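The construction described in the teaser above (connect the midpoints of the tops of the histogram bins) can be sketched in a few lines. This is a Python illustration, not SAS code. The function name, the equal-width binning scheme, and the data are all hypothetical.

```python
def frequency_polygon(values, lo, width, nbins):
    """Return the (midpoint, count) vertices for equal-width bins.

    Bins are [lo, lo+width), [lo+width, lo+2*width), ...; the last bin
    also includes its right endpoint. Values outside the bins are ignored.
    """
    counts = [0] * nbins
    hi = lo + nbins * width
    for v in values:
        k = int((v - lo) // width)
        if v == hi:          # put the right endpoint in the last bin
            k = nbins - 1
        if 0 <= k < nbins:
            counts[k] += 1
    midpoints = [lo + (k + 0.5) * width for k in range(nbins)]
    return list(zip(midpoints, counts))

# Hypothetical univariate data
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]
vertices = frequency_polygon(data, lo=0, width=2, nbins=4)
print(vertices)  # connect these points with line segments to draw the polygon
```

Some conventions also add a zero-count vertex one bin width beyond each end so that the polygon closes on the axis. This sketch omits those extra vertices.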

Analytics | Learn SAS

Rick Wicklin, November 1, 2021

Fit a mixture of Weibull distributions in SAS

A previous article discusses how to use SAS regression procedures to fit a two-parameter Weibull distribution in SAS. The article shows how to convert the regression output into the more familiar scale and shape parameters for the Weibull probability distribution, which are fit by using PROC UNIVARIATE. Although PROC UNIVARIATE

Read More

Analytics | Learn SAS

Rick Wicklin, October 27, 2021

Interpret estimates for a Weibull regression model in SAS

It can be frustrating when the same probability distribution has two different parameterizations, but such is the life of a statistical programmer. I previously wrote an article about the gamma distribution, which has two common parameterizations: one that uses a scale parameter (β) and another that uses a rate parameter

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin, August 23, 2021

Sliced survival graphs in SAS

This article shows how to create a "sliced survival plot" for proportional-hazards models that are created by using PROC PHREG in SAS. Graphing the result of a statistical regression model is a valuable way to communicate the predictions of the model. Many SAS procedures use ODS graphics to produce graphs

Read More

Data Visualization | Learn SAS

Rick Wicklin, August 18, 2021

A comparison of different weighting schemes for ranking sports teams

A previous article discusses the geometry of weighted averages and shows how choosing different weights can lead to different rankings of the subjects. As an example, I showed how college programs might rank applicants by using a weighted average of factors such as test scores. "The best" applicant is determined

Read More

Analytics

Rick Wicklin, August 16, 2021

Rankings and the geometry of weighted averages

People love rankings. You've probably seen articles about the best places to live, the best colleges to attend, the best pizza to order, and so on. Each of these is an example of a ranking that is based on multiple characteristics. For example, a list of the best places to

Read More

Analytics | Data Visualization

Harry Snart, July 19, 2021

Understanding your data: A series on the importance of Exploratory Data Analysis

Following on from my introductory blog series, Data Science in the Wild, we’re going to start delving into how you can scale up and industrialise your Analytics with SAS Viya. In future blogs we will look at how you can augment your R & Python code to leverage SAS Viya

Read More

Analytics | Programming Tips

Rick Wicklin, June 30, 2021

Compute 2-D cumulative sums and ogives

A recent article about how to estimate a two-dimensional distribution function in SAS inspired me to think about a related computation: a 2-D cumulative sum. Suppose you have numbers in a matrix, X. A 2-D cumulative sum is a second matrix, C, such that C[p,q] gives the sum of

Read More
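The teaser above is cut off before it finishes the definition, so the following Python sketch (not SAS code) rests on an assumption: that C[p,q] is the sum of all elements X[i,j] with i <= p and j <= q, which is the usual meaning of a 2-D cumulative sum. The function name and the example matrix are hypothetical.

```python
def cumsum2d(X):
    """Return C where C[p][q] is the sum of X[i][j] over i <= p and j <= q.

    ASSUMPTION: this is the standard definition of a 2-D cumulative sum;
    the source text does not spell it out.
    """
    nrow = len(X)
    ncol = len(X[0]) if X else 0
    C = [[0] * ncol for _ in range(nrow)]
    for p in range(nrow):
        row_total = 0  # running sum of row p up to column q
        for q in range(ncol):
            row_total += X[p][q]
            above = C[p - 1][q] if p > 0 else 0
            C[p][q] = row_total + above
    return C

X = [[1, 2, 3],
     [4, 5, 6]]
print(cumsum2d(X))
```

Under this definition, the last element of C equals the sum of all elements of X.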

Analytics | Learn SAS

Rick Wicklin, June 9, 2021

Rank-based scores and tied values

Many nonparametric statistical methods use the ranks of observations to compute distribution-free statistics. In SAS, two procedures that use ranks are PROC NPAR1WAY and PROC CORR. Whereas the SPEARMAN option in PROC CORR (which computes rank correlation) uses only the "raw" tied ranks, PROC NPAR1WAY uses transformations of the ranks,

Read More

Analytics | Programming Tips

Rick Wicklin, June 7, 2021

Permutation tests and independent sorting of data

For many univariate statistics (mean, median, standard deviation, etc.), the order of the data is unimportant. If you sort univariate data, the mean and standard deviation do not change. However, you cannot sort an individual variable (independently) if you want to preserve its relationship with other variables. This statement is

Read More

Analytics | Programming Tips

Rick Wicklin, June 1, 2021

The Hampel identifier: Robust outlier detection in a time series

It is well known that classical estimates of location and scale (for example, the mean and standard deviation) are influenced by outliers. In the 1960s, '70s, and '80s, researchers such as Tukey, Huber, Hampel, and Rousseeuw advocated analyzing data by using robust statistical estimates such as the median and the

Read More
