Blogs

Tag: Data Analysis

Analytics | Data Visualization | Learn SAS | Programming Tips

Rick Wicklin | January 3, 2024

Top 10 posts from The DO Loop in 2023

In 2023, I wrote 90 articles for The DO Loop blog. My most popular articles were about SAS programming, data visualization, and statistics. In addition, several "general interest" articles were popular, including my article for Pi Day and an article about AI chatbots. If you missed any of these articles,

Read More

Analytics | Learn SAS

Rick Wicklin | December 19, 2023

The difference between frequencies and weights in a correlation analysis

Statistical software often includes support for a weight variable. Many SAS procedures make a distinction between integer frequencies and more general "importance weights." Frequencies are supported by using the FREQ statement in SAS procedures; general weights are supported by using the WEIGHT statement. An exception is PROC FREQ, which contains

Read More

Analytics | Learn SAS

Rick Wicklin | December 13, 2023

Estimate polychoric correlation by maximum likelihood estimation

SAS provides many built-in routines for data analysis. A previous article discusses polychoric correlation, which is a measure of association between two ordinal variables. In SAS, you can use PROC FREQ or PROC CORR to estimate the polychoric correlation, its standard error, and confidence intervals. Although SAS provides a built-in

Read More

Analytics | Learn SAS

Rick Wicklin | December 11, 2023

What is polychoric correlation?

Correlation is a statistic that measures the association between two variables. When two variables are positively correlated, low values of one variable tend to be associated with low values of the other variable. Medium values and high values are similarly associated. For negative correlation, the association is flipped: low values

Read More

Learn SAS | Programming Tips

Rick Wicklin | October 4, 2023

Use design matrices to analyze subgroups in SAS IML

A previous article shows ways to perform efficient BY-group processing in the SAS IML language. BY-group processing is a SAS-ism for what other languages call group processing or subgroup processing. The main idea is that the data set contains several discrete variables such as sex, race, education level, and so

Read More

Analytics | Learn SAS

Rick Wicklin | October 2, 2023

On computing Kendall's tau statistic in SAS

One thing I have learned about rank-based statistics over the years is "Be careful of tied values!" On multiple occasions, I have been asked, "Why doesn't the SAS result for [NAME] statistic agree with my hand calculation?" The answer is sometimes because of the way that tied values are handled.

Read More

Learn SAS | Programming Tips

Rick Wicklin | September 11, 2023

On the performance of BY-group processing in SAS IML

Many SAS procedures support a BY statement that enables you to perform an analysis for each unique value of a BY-group variable. The SAS IML language does not support a BY statement, but you can program a loop that iterates over all BY groups. You can emulate BY-group processing by

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin | September 6, 2023

Model data from published summary statistics

There are many ways to model a set of raw data by using a continuous probability distribution. It can be challenging, however, to choose the distribution that best models the data. Are the data normal? Lognormal? Is there a theoretical reason to prefer one distribution over another? The SAS has

Read More

Learn SAS | Programming Tips

Rick Wicklin | August 30, 2023

Simulate the use of personal checks in the US

Does anyone write paper checks anymore? According to researchers at the Federal Reserve Bank of Atlanta (Greene, et al., 2020), the use of paper checks has declined 63% among US consumers since the year 2000. The researchers surveyed more than 3,000 consumers in 2017-2018 and discovered that only 7% of

Read More

Analytics | Programming Tips

Rick Wicklin | July 24, 2023

Modifications of the Wilcoxon signed rank test and exact p-values

In a previous article, I discussed the Wilcoxon signed rank test, which is a nonparametric test for the location of the median. The Wikipedia article about the signed rank test mentions a variation of the test due to Pratt (1959). Whereas the standard Wilcoxon test excludes values that equal μ0

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin | July 19, 2023

On the computation of the Wilcoxon signed rank statistic

Wilcoxon's signed rank test is a popular nonparametric alternative to a paired t test. In a paired t test, you analyze measurements for subjects before and after some treatment or intervention. You analyze the difference in the measurements for each subject, and test whether the mean difference is significantly different

Read More

Analytics | Learn SAS

Rick Wicklin | July 17, 2023

Standardize regression coefficients for models that include categorical variables

A previous article discusses standardized coefficients in linear regression models and shows how to compute standardized regression coefficients in SAS by using the STB option on the MODEL statement in PROC REG. It also discusses how to interpret a standardized regression coefficient. Recently, a SAS user wanted to know how

Read More

Analytics | Learn SAS

Rick Wicklin | June 7, 2023

Visualize the Spearman rank correlation

A previous article explains the Spearman rank correlation, which is a robust cousin to the more familiar Pearson correlation. I've also discussed why you might want to use rank correlation, and how to interpret the strength of a rank correlation. This article gives a short example that helps you to

Read More

Learn SAS | Programming Tips

Rick Wicklin | May 22, 2023

Rank character variables in SAS

SAS supports many ways to compute the rank of a numeric variable and to handle tied values. However, sometimes I need to rank the values in a character categorical variable. For example, the values {"Male", "Female", "Male"} have ranks {2, 1, 2} because, in alphabetical order, "Female" is the first-ranked

Read More

Advanced Analytics | Data Visualization | Programming Tips

Rick Wicklin | May 17, 2023

Compute the silhouette statistic in SAS

A previous article defines the silhouette statistic (Rousseeuw, 1987) and shows how to use it to identify observations in a cluster analysis that are potentially misclassified. The article provides many graphs, including the silhouette plot, which is a bar chart or histogram that displays the distribution of the silhouette statistic

Read More

Analytics | Data Visualization

Rick Wicklin | May 15, 2023

What is the silhouette statistic in cluster analysis?

Assigning observations into clusters can be challenging. One challenge is deciding how many clusters are in the data. Another is identifying which observations are potentially misclassified because they are on the boundary between two different clusters. Ralph Abbey's 2019 paper ("How to Evaluate Different Clustering Results") is a good way

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin | April 17, 2023

Should you use the Wald confidence interval for a binomial proportion?

The "Teacher’s Corner" of The American Statistician enables statisticians to discuss topics that are relevant to teaching and learning statistics. Sometimes, the articles have practical relevance, too. Andersson (2023), "The Wald Confidence Interval for a Binomial p as an Illuminating 'Bad' Example," is intended for professors and master's-level students in

Read More

Analytics

Rick Wicklin | April 10, 2023

Means and medians of subgroups

A journal article listed the mean, median, and size for subgroups of the data, but did not report the overall mean or median. A SAS programmer wondered what, if any, inferences could be made about the overall mean and median for the data. The answer is that you can calculate

Read More

Analytics

Rick Wicklin | April 5, 2023

Weak or strong? How to interpret a Spearman or Kendall correlation

A SAS user asked how to interpret a rank-based correlation such as a Spearman correlation or a Kendall correlation. These are alternative measures to the usual Pearson product-moment correlation, which is widely used. The programmer knew that words like "weak," "moderate," and "strong" are sometimes used to describe the Pearson

Read More

Analytics | Learn SAS

Rick Wicklin | April 3, 2023

Why use rank correlation?

A previous article discusses rank correlation and lists some advantages of using rank correlation. However, the article does not show examples where an analyst might prefer to report the rank correlation instead of the traditional Pearson product-moment correlation. This article provides three examples where the rank correlation is a better

Read More

Analytics

Rick Wicklin | March 27, 2023

Simpson's paradox and confounding variables

A previous article discusses the issue of a confounding variable and uses correlation to give an example. The example shows that the correlation between two variables might be affected by a third variable, which is called a confounding variable. The article mentions that you can use the PARTIAL statement in

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin | March 22, 2023

Partial correlation: controlling for confounding variables

A data analyst wanted to estimate the correlation between two variables, but he was concerned about the influence of a confounding variable that is correlated with them. The correlation might affect the apparent relationship between the two main variables in the study. A common confounding variable is age because young people

Read More

Learn SAS | Programming Tips

Rick Wicklin | March 20, 2023

Estimate a Markov transition matrix from historical data

In a previous article about Markov transition matrices, I mentioned that you can estimate a Markov transition matrix by using historical data that are collected over a certain length of time. A SAS programmer asked how you can estimate a transition matrix in SAS. The answer is that you can

Read More

Analytics | Artificial Intelligence

Mark Lambrecht | March 15, 2023

6 predictions for AI and data in health care and life sciences

As in most other sectors, health care is changing at lightning speed. Access to data makes it possible to speed up clinical trials, develop more personalized medication, make quicker and better diagnoses, improve the quality of patient care and save lives. The pandemic has sped up digital transformation in every

Read More

Health Care | Life Sciences

Analytics | Learn SAS | Programming Tips

Rick Wicklin | March 15, 2023

Fitting a distribution to an expert's opinion: An application of the metalog distribution

Most homeowners know that large home improvement projects can take longer than you expect. Whether it's remodeling a kitchen, adding a deck, or landscaping a yard, big projects are expensive and subject to a lot of uncertainty. Factors such as weather, the availability of labor, and the supply of materials,

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin | March 13, 2023

Use the metalog distribution in SAS

A previous article describes the metalog distribution (Keelin, 2016). The metalog distribution is a flexible family of distributions that can model a wide range of shapes for data distributions. The metalog system can model bounded, semibounded, and unbounded continuous distributions. This article shows how to use the metalog distribution in

Read More

Learn SAS | Programming Tips

Rick Wicklin | March 6, 2023

The variance of the sums of variables

Undergraduate textbooks on probability and statistics typically prove theorems that show how the variance of a sum of random variables is related to the variance of the original variables and the covariance between them. For example, the Wikipedia article on Variance contains an equation for the sum of two random

Read More

Analytics

Rick Wicklin | February 22, 2023

What is the metalog distribution?

The metalog family of distributions (Keelin, Decision Analysis, 2016) is a flexible family that can model a wide range of continuous univariate data distributions when the data-generating mechanism is unknown. This article provides an overview of the metalog distributions. A subsequent article shows how to download and use a library

Read More

Learn SAS | Programming Tips

Rick Wicklin | November 28, 2022

Simulate poker hands in SAS

A SAS programmer was trying to simulate poker hands. He was having difficulty because the sampling scheme for simulating card games requires that you sample without replacement for each hand. In statistics, this is called "simple random sampling." If done properly, it is straightforward to simulate poker hands in SAS.

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin | November 16, 2022

Optimal linear profile plots in SAS

A profile plot is a way to display multivariate values for many subjects. The optimal linear profile plot was introduced by John Hartigan in his book Clustering Algorithms (1975). In Michael Friendly's book (SAS System for Statistical Graphics, 1991), Friendly shows how to construct an optimal linear profile by using

Read More
