Blogs

Tag: Data Analysis

Learn SAS | Programming Tips

Rick Wicklin, March 20, 2023

Estimate a Markov transition matrix from historical data

In a previous article about Markov transition matrices, I mentioned that you can estimate a Markov transition matrix by using historical data that are collected over a certain length of time. A SAS programmer asked how you can estimate a transition matrix in SAS. The answer is that you can…

Read More
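The excerpt stops before it shows the SAS approach. As a rough illustration only, here is a minimal Python sketch of the usual maximum-likelihood estimate for a first-order chain: count the transitions between consecutive observed states and divide each row of counts by its total. It assumes a single observed sequence whose states are coded 0, 1, ..., k-1; the function name `estimate_transition_matrix` and the example history are hypothetical.

```python
def estimate_transition_matrix(sequence, num_states):
    """Estimate a first-order Markov transition matrix from one observed
    sequence of states coded 0, 1, ..., num_states - 1."""
    # counts[i][j] = number of observed transitions from state i to state j
    counts = [[0] * num_states for _ in range(num_states)]
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1

    # Divide each row by its total; a state that is never left keeps a zero row
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total > 0 else 0.0 for c in row])
    return matrix


# Hypothetical history of states observed at equally spaced times
history = [0, 0, 1, 2, 1, 0, 0, 2, 2, 1]
P = estimate_transition_matrix(history, num_states=3)
for row in P:
    print([round(p, 3) for p in row])
```

For this history, the estimated rows are (0.5, 0.25, 0.25), (0.5, 0, 0.5), and (0, 2/3, 1/3). With several independent sequences, you would add their count matrices before normalizing the rows.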

Analytics | Artificial Intelligence

Mark Lambrecht, March 15, 2023

6 predictions for AI and data in health care and life sciences

As in most other sectors, health care is changing at lightning speed. Access to data makes it possible to speed up clinical trials, develop more personalized medication, make quicker and better diagnoses, improve the quality of patient care and save lives. The pandemic has sped up digital transformation in every…

Read More

Health Care | Life Sciences

Analytics | Learn SAS | Programming Tips

Rick Wicklin, March 15, 2023

Fitting a distribution to an expert's opinion: An application of the metalog distribution

Most homeowners know that large home improvement projects can take longer than you expect. Whether it's remodeling a kitchen, adding a deck, or landscaping a yard, big projects are expensive and subject to a lot of uncertainty. Factors such as weather, the availability of labor, and the supply of materials, …

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin, March 13, 2023

Use the metalog distribution in SAS

A previous article describes the metalog distribution (Keelin, 2016). The metalog distribution is a flexible family of distributions that can model a wide range of shapes for data distributions. The metalog system can model bounded, semibounded, and unbounded continuous distributions. This article shows how to use the metalog distribution in…

Read More

Learn SAS | Programming Tips

Rick Wicklin, March 6, 2023

The variance of the sums of variables

Undergraduate textbooks on probability and statistics typically prove theorems that show how the variance of a sum of random variables is related to the variance of the original variables and the covariance between them. For example, the Wikipedia article on Variance contains an equation for the sum of two random…

Read More
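For reference, the standard identities that such theorems establish are shown below, first for two random variables and then for n of them. When the variables are uncorrelated, every covariance term is zero and the variances simply add.

```latex
\begin{aligned}
\operatorname{Var}(X + Y) &= \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\,\operatorname{Cov}(X, Y) \\
\operatorname{Var}\Big(\sum_{i=1}^{n} X_i\Big) &= \sum_{i=1}^{n} \sum_{j=1}^{n} \operatorname{Cov}(X_i, X_j)
  = \sum_{i=1}^{n} \operatorname{Var}(X_i) + 2 \sum_{1 \le i < j \le n} \operatorname{Cov}(X_i, X_j)
\end{aligned}
```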

Analytics

Rick Wicklin, February 22, 2023

What is the metalog distribution?

The metalog family of distributions (Keelin, Decision Analysis, 2016) is a flexible family that can model a wide range of continuous univariate data distributions when the data-generating mechanism is unknown. This article provides an overview of the metalog distributions. A subsequent article shows how to download and use a library…

Read More

Learn SAS | Programming Tips

Rick Wicklin, November 28, 2022

Simulate poker hands in SAS

A SAS programmer was trying to simulate poker hands. He was having difficulty because the sampling scheme for simulating card games requires that you sample without replacement for each hand. In statistics, this is called "simple random sampling." If done properly, it is straightforward to simulate poker hands in SAS.

Read More
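The key idea, sampling without replacement within each hand, can be sketched outside SAS as well. The following Python sketch (not the article's SAS program) treats each hand as a simple random sample of 5 cards from a 52-card deck and uses the simulated hands to estimate the probability of a flush. The card encoding and the function names are assumptions made for illustration. The exact flush probability, counting straight flushes, is 4 * C(13, 5) / C(52, 5) = 5148/2598960.

```python
import random

RANKS = "23456789TJQKA"
SUITS = "CDHS"                                 # clubs, diamonds, hearts, spades
DECK = [r + s for s in SUITS for r in RANKS]   # 52 distinct two-character cards


def deal_hands(num_hands, hand_size=5, rng=random):
    """Simulate independent hands; each hand is a simple random sample
    (without replacement) from a full 52-card deck."""
    return [rng.sample(DECK, hand_size) for _ in range(num_hands)]


def is_flush(hand):
    """True if all cards share one suit (this count includes straight flushes)."""
    return len({card[1] for card in hand}) == 1


rng = random.Random(12345)                     # seeded for reproducibility
hands = deal_hands(100_000, rng=rng)
est = sum(is_flush(h) for h in hands) / len(hands)
print(f"Estimated P(flush) = {est:.5f}; exact = {5148 / 2598960:.5f}")
```

To deal several players' hands in the same game, draw all of the cards in one call (for example, `rng.sample(DECK, 5 * num_players)`) and split the result, so that no card can appear in two hands.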

Analytics | Data Visualization | Programming Tips

Rick Wicklin, November 16, 2022

Optimal linear profile plots in SAS

A profile plot is a way to display multivariate values for many subjects. The optimal linear profile plot was introduced by John Hartigan in his book Clustering Algorithms (1975). In his book SAS System for Statistical Graphics (1991), Michael Friendly shows how to construct an optimal linear profile by using…

Read More

Data Visualization | Learn SAS | Programming Tips

Rick Wicklin, November 14, 2022

Profile plots in SAS

A profile plot is a compact way to visualize many variables for a set of subjects. It enables you to investigate which subjects are similar to or different from other subjects. Visually, a profile plot can take many forms. This article shows several profile plots: a line plot of the…

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin, November 2, 2022

The area and perimeter of a convex hull

The area of a convex hull enables you to estimate the area of a compact region from a set of discrete observations. For example, a biologist might have multiple sightings of a wolf pack and want to use the convex hull to estimate the area of the wolves' territory. A…

Read More
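The excerpt does not say how the article computes the hull. As a hedged sketch in Python, the code below swaps in two standard techniques: Andrew's monotone chain algorithm for the hull and the shoelace formula for the area of the resulting polygon. The perimeter is the sum of the edge lengths. The "sightings" are made-up coordinates.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain algorithm: hull vertices in counterclockwise order,
    starting from the leftmost (then lowest) point. Collinear points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means o -> a -> b turns counterclockwise
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # the last point of each chain starts the other one


def polygon_area(vertices):
    """Shoelace formula for a simple polygon whose vertices are in order."""
    n = len(vertices)
    s = sum(vertices[i][0] * vertices[(i + 1) % n][1]
            - vertices[(i + 1) % n][0] * vertices[i][1] for i in range(n))
    return abs(s) / 2


def polygon_perimeter(vertices):
    """Sum of the lengths of the polygon's edges."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))


# Made-up sightings: corners of a 4 x 3 rectangle plus interior points
sightings = [(0, 0), (4, 0), (4, 3), (0, 3), (1, 1), (2, 2), (3, 1)]
hull = convex_hull(sightings)
print(hull, polygon_area(hull), polygon_perimeter(hull))
```

The interior sightings do not affect the hull, so the area is 12 and the perimeter is 14, the values for the 4 x 3 rectangle.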

Learn SAS | Programming Tips

Rick Wicklin, September 19, 2022

Generate random ID values for subjects in SAS

A common question on SAS discussion forums is how to use SAS to generate random ID values. The use case is to generate a set of random strings to assign to patients in a clinical study. If you assign each patient a unique ID and delete the patients' names, you…

Read More
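A minimal Python sketch of the idea, under assumptions that are not taken from the article: IDs are fixed-length strings drawn from uppercase letters and digits, and any candidate that duplicates an earlier ID is rejected, which guarantees uniqueness. The helper `random_ids` and the patient names are hypothetical.

```python
import random
import string


def random_ids(n, length=6, alphabet=string.ascii_uppercase + string.digits, seed=None):
    """Return n unique random ID strings (hypothetical helper)."""
    if len(alphabet) ** length < n:
        raise ValueError("too few possible IDs for the requested length")
    rng = random.Random(seed)
    ids, seen = [], set()
    while len(ids) < n:
        candidate = "".join(rng.choices(alphabet, k=length))
        if candidate not in seen:  # reject duplicates so every ID is unique
            seen.add(candidate)
            ids.append(candidate)
    return ids


# Hypothetical patient names
patients = ["Alice", "Bob", "Carmen", "Deshawn"]
id_map = dict(zip(patients, random_ids(len(patients), seed=2022)))
print(id_map)
```

A seeded `random.Random` makes the assignment reproducible. If the IDs must also be hard to guess, Python's `secrets` module is a more appropriate source of randomness.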

Analytics | Learn SAS | Programming Tips

Rick Wicklin, September 7, 2022

A test for monotonic sequences and functions

Monotonic transformations occur frequently in math and statistics. Analysts use monotonic transformations to transform variable values, with Tukey's ladder of transformations and the Box-Cox transformations being familiar examples. Monotonic functions figure prominently in probability theory because the cumulative distribution function is monotonically increasing. For a continuous distribution that is…

Read More
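For a finite sequence, monotonicity can be checked by comparing each value with the next one. The Python sketch below (the helper name `is_monotone` is hypothetical) does that and applies it to the standard normal CDF evaluated on a grid. A check on finitely many points can disprove that a function is monotonic, but it cannot prove it.

```python
import math


def is_monotone(seq, increasing=True, strict=False):
    """Check whether a finite sequence is monotone (hypothetical helper)."""
    pairs = list(zip(seq, seq[1:]))
    if increasing:
        return all(a < b if strict else a <= b for a, b in pairs)
    return all(a > b if strict else a >= b for a, b in pairs)


def normal_cdf(x):
    """Standard normal CDF written in terms of the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


grid = [i / 10 for i in range(-30, 31)]          # x = -3.0, -2.9, ..., 3.0
cdf_values = [normal_cdf(x) for x in grid]
print(is_monotone(cdf_values, strict=True))      # True on this grid
print(is_monotone([1, 2, 2, 3], strict=True))    # False: 2 is repeated
```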

Analytics | Learn SAS

Rick Wicklin, August 22, 2022

The univariate Box-Cox transformation

A SAS customer asked how to use the Box-Cox transformation to normalize a single variable. Recall that a normalizing transformation is a function that attempts to convert a set of data to be as nearly normal as possible. For positive-valued data, introductory statistics courses often mention the log transformation or…

Read More
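For reference, the Box-Cox family transforms positive data by y(λ) = (y^λ - 1)/λ for λ ≠ 0 and y(λ) = log(y) for λ = 0. The excerpt does not say how the article chooses λ. One standard approach, sketched below in Python, is to maximize the normal profile log-likelihood over a grid of λ values. The grid range [-2, 2], the helper names, and the data are assumptions for illustration.

```python
import math


def boxcox(y, lam):
    """Box-Cox transformation of positive values y for the parameter lam."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def boxcox_loglik(y, lam):
    """Profile log-likelihood of lam, up to an additive constant, assuming the
    transformed values are normal with unknown mean and variance."""
    n = len(y)
    z = boxcox(y, lam)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n     # maximum-likelihood variance
    # The second term is the log-Jacobian of the transformation
    return -n / 2 * math.log(var) + (lam - 1) * sum(math.log(v) for v in y)


def best_lambda(y, grid=None):
    """Grid search for the lam that maximizes the profile log-likelihood."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]  # -2.00, -1.99, ..., 2.00
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))


# Made-up data whose logarithms are symmetric, so lam = 0 (the log) should win
y = [math.exp(-1.0), 1.0, math.exp(1.0)]
lam = best_lambda(y)
print(lam, boxcox(y, lam))
```

A grid search keeps the sketch dependency-free; a numerical optimizer would locate λ more precisely, but analysts often round λ to a convenient value such as 0 or 0.5 anyway.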

Analytics | Learn SAS

Rick Wicklin, August 17, 2022

The Box-Cox transformation for a dependent variable in a regression

In the 1960s and '70s, before nonparametric regression methods became widely available, it was common to apply a nonlinear transformation to the dependent variable before fitting a linear regression model. This is still done today, with the most common transformation being a logarithmic transformation of the dependent variable, which fits…

Read More

Analytics | Data Visualization | Learn SAS

Rick Wicklin, August 15, 2022

Tukey's ladder of variable transformations

John Tukey was an influential statistician who proposed many statistical concepts. In the 1960s and '70s, he was instrumental in the discovery and exposition of robust statistical methods, and he was an ardent proponent of exploratory data analysis (EDA). In his 1977 book, Exploratory Data Analysis, he discussed a small…

Read More

Analytics | Learn SAS

Rick Wicklin, August 8, 2022

Means and medians as minimizers of a loss function

On Twitter, I saw a tweet from @DataSciFact that read, "The sum of (x_i - x)^2 over a set of data points x_i is minimized when x is the sample mean." I (@RickWicklin) immediately tweeted out a reply: "And the sum of |x_i - x| is minimized by the sample…

Read More
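Both claims in the tweets can be checked numerically. The Python sketch below uses made-up data and a brute-force grid search (rather than calculus) to find the center c that minimizes each loss, then compares it with the sample mean and median.

```python
from statistics import mean, median

data = [1, 2, 4, 7, 16]  # made-up, right-skewed values


def sum_sq(c):
    """Sum of squared deviations from a candidate center c."""
    return sum((x - c) ** 2 for x in data)


def sum_abs(c):
    """Sum of absolute deviations from a candidate center c."""
    return sum(abs(x - c) for x in data)


grid = [i / 100 for i in range(0, 2001)]  # candidate centers 0.00, 0.01, ..., 20.00
best_sq = min(grid, key=sum_sq)
best_abs = min(grid, key=sum_abs)
print(best_sq, mean(data))     # both equal 6
print(best_abs, median(data))  # both equal 4
```

With an even number of observations, the absolute-loss minimizer is not unique: any value between the two middle observations works, and the usual median (their midpoint) is one of them.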

Learn SAS | Programming Tips

Rick Wicklin, May 16, 2022

How to unroll frequency data

In categorical data analysis, it is common to analyze tables of counts. For example, a researcher might gather data for 18 boys and 12 girls who apply for a summer enrichment program. The researcher might be interested in whether the proportion of boys who are admitted is different from the…

Read More
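Unrolling means replacing each row of a frequency table by as many rows as its count. Here is a minimal Python sketch that uses the 18 boys and 12 girls from the example (the admission outcome is omitted because its counts are not given here). The helper name `unroll` and the `Count` column name are hypothetical.

```python
def unroll(rows, count_key="Count"):
    """Expand frequency rows into one record per subject (hypothetical helper)."""
    records = []
    for row in rows:
        # Every variable except the count is copied once per subject
        record = {k: v for k, v in row.items() if k != count_key}
        records.extend(dict(record) for _ in range(row[count_key]))
    return records


# Frequency table from the example: 18 boys and 12 girls applied
table = [{"Sex": "Boy", "Count": 18}, {"Sex": "Girl", "Count": 12}]
subjects = unroll(table)
print(len(subjects))  # 30
```

Each record is a separate dictionary, so later code can add a per-subject variable (such as an outcome) to one record without changing the others.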

Programming Tips

Rick Wicklin, May 4, 2022

Bootstrap estimates for nonlinear regression models in SAS

In The Essential Guide to Bootstrapping in SAS, I note that there are many SAS procedures that support bootstrap estimates without requiring the analyst to write a program. I have previously written about using bootstrap options in the TTEST procedure. This article discusses the NLIN procedure, which can fit nonlinear…

Read More

Analytics | Learn SAS

Rick Wicklin, April 27, 2022

On Bartlett's sphericity test for correlation

When you have many correlated variables, principal component analysis (PCA) is a classical technique to reduce the dimensionality of the problem. PCA finds a lower-dimensional linear subspace that explains most of the variability in the data. There are many statistical tools that help you decide how many principal…

Read More

Data Visualization | Learn SAS | Programming Tips

Rick Wicklin, April 20, 2022

Use a heat map to visualize an ordinal response in longitudinal data

Recently, I showed how to use a heat map to visualize measurements over time for a set of patients in a longitudinal study. The visualization is sometimes called a lasagna plot because it presents an alternative to the usual spaghetti plot. A reader asked whether a similar visualization can be…

Read More

Analytics | Learn SAS

Rick Wicklin, April 18, 2022

The McNemar test in SAS

What is McNemar's test? How do you run the McNemar test in SAS? Why might other statistical software report a value for McNemar's test that is different from the SAS value? SAS supports an exact version of the McNemar test, but when should you use it? This article answers these…

Read More

Learn SAS | Programming Tips

Rick Wicklin, March 28, 2022

Use a heat map to visualize missing values in longitudinal data

Longitudinal data are measurements for a set of subjects at multiple points in time. Also called "panel data" or "repeated measures data," this kind of data is common in clinical trials in which patients are tracked over time. Recently, a SAS programmer asked how to visualize missing values in a…

Read More

Analytics | Programming Tips

Rick Wicklin, February 14, 2022

Passing-Bablok regression in SAS

This article implements Passing-Bablok regression in SAS. Passing-Bablok regression is a one-variable regression technique that is used to compare measurements from different instruments or medical devices. Both variables (X and Y) are measured with error. Consequently, you cannot use ordinary linear regression, which assumes that…

Read More

Learn SAS | Programming Tips

Rick Wicklin, January 26, 2022

4 ways to find the k smallest and largest data values in SAS

Sometimes it is useful to know the extreme values in data. You might need to know the 5 or 10 smallest data values, or the 5 or 10 largest data values. There are many ways to do this in SAS, but this article shows examples…

Read More
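The excerpt does not list the four SAS techniques. For comparison only, here is how the same task looks in Python with the standard-library `heapq` module. The data are made up, and dropping missing values before ranking is an assumption about how they should be treated.

```python
import heapq

# Made-up data; None plays the role of a missing value
values = [27, 3, None, 18, 45, 9, 31, 12, 50, 6, 22]
k = 5

nonmissing = [v for v in values if v is not None]  # None cannot be compared with numbers
smallest = heapq.nsmallest(k, nonmissing)  # k smallest, in increasing order
largest = heapq.nlargest(k, nonmissing)    # k largest, in decreasing order
print(smallest)  # [3, 6, 9, 12, 18]
print(largest)   # [50, 45, 31, 27, 22]
```

Sorting the whole list (`sorted(nonmissing)[:k]`) gives the same smallest values; `heapq` avoids a full sort, which can matter when k is small relative to the data.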

Analytics | Learn SAS | Programming Tips

Rick Wicklin, January 24, 2022

Estimate percentiles in SAS Viya

How can you estimate percentiles in SAS Viya? This article shows how to call the percentile action from PROC CAS to estimate percentiles of variables in a CAS data table. Percentiles and quantiles are essentially the same (the pth quantile is the (100p)th percentile for p in [0, 1]), so…

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin, January 3, 2022

Top 10 posts from The DO Loop in 2021

Last year, I wrote almost 100 posts for The DO Loop blog. My most popular articles were about data visualization, statistics and data analysis, and simulation and bootstrapping. If you missed any of these gems when they were first published, here are some of the most popular articles from 2021:

Read More

Analytics | Learn SAS

Rick Wicklin, December 1, 2021

Beware of repeated values in loess models

Did you know that the loess regression algorithm is not well-defined when you have repeated values among the explanatory variables, and you request a very small smoothing parameter? This is because loess regression at the point x0 is based on using the k nearest neighbors to x0. If x0 has…

Read More

Data Visualization | Learn SAS

Rick Wicklin, November 10, 2021

Create a frequency polygon in SAS

I was recently asked how to create a frequency polygon in SAS. A frequency polygon is an alternative to a histogram that shows similar information about the distribution of univariate data. It is the piecewise linear curve formed by connecting the midpoints of the tops of the bins. The graph…

Read More
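Following the definition in the excerpt, a frequency polygon needs only the bin midpoints and the bin counts. The Python sketch below computes those vertices for equal-width bins. It also follows a common convention (an assumption here, not necessarily what the article does) of adding an empty bin at each end so that the polygon begins and ends on the horizontal axis. The function name and the data are made up.

```python
def frequency_polygon(data, bin_start, bin_width, num_bins):
    """Return (midpoint, count) vertices of a frequency polygon for equal-width
    bins [bin_start + i*w, bin_start + (i+1)*w). All data must fall in the bins."""
    counts = [0] * num_bins
    for x in data:
        i = int((x - bin_start) // bin_width)
        if not 0 <= i < num_bins:
            raise ValueError(f"{x} is outside the bins")
        counts[i] += 1
    mids = [bin_start + (i + 0.5) * bin_width for i in range(num_bins)]
    # Convention: add an empty bin at each end so the polygon starts and ends at 0
    mids = [mids[0] - bin_width] + mids + [mids[-1] + bin_width]
    counts = [0] + counts + [0]
    return list(zip(mids, counts))


data = [1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.4, 5.9]  # made-up values
vertices = frequency_polygon(data, bin_start=0, bin_width=2, num_bins=3)
print(vertices)  # [(-1.0, 0), (1.0, 1), (3.0, 5), (5.0, 2), (7.0, 0)]
```

Connecting the vertices with any line-plot tool draws the polygon.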

Analytics | Learn SAS

Rick Wicklin, November 1, 2021

Fit a mixture of Weibull distributions in SAS

A previous article discusses how to use SAS regression procedures to fit a two-parameter Weibull distribution in SAS. The article shows how to convert the regression output into the more familiar scale and shape parameters for the Weibull probability distribution, which are fit by using PROC UNIVARIATE. Although PROC UNIVARIATE…

Read More

Analytics | Learn SAS

Rick Wicklin, October 27, 2021

Interpret estimates for a Weibull regression model in SAS

It can be frustrating when the same probability distribution has two different parameterizations, but such is the life of a statistical programmer. I previously wrote an article about the gamma distribution, which has two common parameterizations: one that uses a scale parameter (β) and another that uses a rate parameter…

Read More
