Blogs


Tag: Data Analysis

Analytics | Data Visualization

Rick Wicklin | April 22, 2020

Visualize the case fatality rate for COVID-19 in US counties

A previous article describes the funnel plot (Spiegelhalter, 2005), which can identify samples that have rates or proportions that are much different than expected. The funnel plot is a scatter plot that plots the sample proportion of some quantity against the size of the sample. The variance of the sample

Read More

Analytics | Data Visualization

Rick Wicklin | April 20, 2020

Use a funnel plot to visualize rates: The case fatality rate for COVID-19 in North Carolina counties

Death is always a difficult topic to discuss, and death has been in the news a lot during this tragic coronavirus pandemic. Many news stories focus on states, counties, or cities that have the most cases or the most deaths. A related statistic is the case fatality rate, which is

Read More

Data Visualization | Learn SAS

Rick Wicklin | March 30, 2020

Smokestack plots: A visualization technique for comparing cumulative curves

A cumulative curve shows the total amount of some quantity at multiple points in time. Examples include: total sales of songs, movies, or books, beginning when the item is released; total views of blog posts, beginning when the post is published; total cases of a disease for different countries, beginning

Read More

Advanced Analytics | Data Visualization | Programming Tips

Rick Wicklin | March 9, 2020

ROC curves for a binormal sample

In a previous article, I discussed the binormal model for a binary classification problem. This model assumes a set of scores that are normally distributed for each population, and the mean of the scores for the Negative population is less than the mean of scores for the Positive population. I

Read More

Data Visualization | Learn SAS

Rick Wicklin | March 2, 2020

Create a deviation plot to visualize values relative to a baseline

A colleague recently posted an article about how to use SAS Visual Analytics to create a circular graph that displays a year's worth of temperature data. Specifically, the graph shows the air temperature for each day in a year relative to some baseline temperature, such as 65F (18C). Days warmer

Read More

Analytics | Data Visualization

Rick Wicklin | February 17, 2020

Visualize collinearity diagnostics

A previous article shows how to interpret the collinearity diagnostics that are produced by PROC REG in SAS. The process involves scanning down numbers in a table in order to find extreme values. This can be a tedious and error-prone process. Friendly and Kwan (2009) compare this task to a

Read More

Analytics | Learn SAS | Programming Tips

Rick Wicklin | February 12, 2020

The Johnson system: Which distribution should you choose to model data?

The Johnson system (Johnson, 1949) contains a family of four distributions: the normal distribution, the lognormal distribution, the SB distribution, and the SU distribution. Previous articles explain why the Johnson system is useful and show how to use PROC UNIVARIATE in SAS to estimate parameters for the Johnson SB distribution

Read More

Programming Tips

Rick Wicklin | February 3, 2020

What sample size do you need for a binomial test of proportions?

Recently someone on social media asked, "how can I compute the required sample size for a binomial test?" I assume from the question that the researcher was designing an experiment to test the proportions between two groups, such as a control group and a treatment/intervention group. They wanted to know

Read More

Analytics | Learn SAS

Rick Wicklin | January 29, 2020

Collinearity diagnostics: Should the data be centered?

In a previous article, I showed how to perform collinearity diagnostics in SAS by using the COLLIN option in the MODEL statement in PROC REG. For models that contain an intercept term, I noted that there has been considerable debate about whether the data vectors should be mean-centered prior to

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin | January 27, 2020

The Johnson SU distribution

The Johnson system (Johnson, 1949) contains a family of four distributions: the normal distribution, the lognormal distribution, the SB distribution (which models bounded distributions), and the SU distribution (which models unbounded distributions). Note that 'B' stands for 'bounded' and 'U' stands for 'unbounded.' A previous article explains the purpose of

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin | January 20, 2020

The Johnson SB distribution

From the early days of probability and statistics, researchers have tried to organize and categorize parametric probability distributions. For example, Pearson (1895, 1901, and 1916) developed a system of seven distributions, which was later called the Pearson system. The main idea behind a "system" of distributions is that for each

Read More

Analytics | Data Visualization | Machine Learning

Rick Wicklin | January 13, 2020

10 posts from 2019 that deserve a second look

Did you add "learn something new" to your list of New Year's resolutions? Last week, I wrote about the most popular articles from The DO Loop in 2019. The most popular articles are about elementary topics in SAS programming or univariate statistics because those topics have broad appeal. Advanced topics

Read More

Data Visualization | Learn SAS | Programming Tips

Rick Wicklin | January 6, 2020

Top posts from The DO Loop in 2019

Last year, I wrote more than 100 posts for The DO Loop blog. The most popular articles were about SAS programming tips for data analysis, statistical analysis, and data visualization. Here are the most popular articles from 2019 in each category. SAS programming tips: Create training, testing, and validation data

Read More

Analytics | Data Visualization

Rick Wicklin | December 18, 2019

Create a conditional quantile bin plot in SAS

A 2-D "bin plot" counts the number of observations in each cell in a regular 2-D grid. The 2-D bin plot is essentially a 2-D version of a histogram: it provides an estimate for the density of a 2-D distribution. As I discuss in the article, "The essential guide to

Read More

Analytics | Programming Tips

Rick Wicklin | December 11, 2019

Swap elements in binary matrices

Binary matrices are used for many purposes. I have previously written about how to use binary matrices to visualize missing values in a data matrix. They are also used to indicate the co-occurrence of two events. In ecology, binary matrices are used to indicate which species of an animal are

Read More

Analytics | Data Visualization

Rick Wicklin | December 5, 2019

Longitudinal data: The mixed model

This is a second article about analyzing longitudinal data, which features measurements that are repeatedly taken on subjects at several points in time. The previous article discusses a response-profile analysis, which uses an ANOVA method to determine differences between the means of an experimental group and a placebo group. The

Read More

Analytics

Rick Wicklin | December 3, 2019

Longitudinal data: The response-profile model

Longitudinal data are used in many health-related studies in which individuals are measured at multiple points in time to monitor changes in a response variable, such as weight, cholesterol, or blood pressure. There are many excellent articles and books that describe the advantages of a mixed model for analyzing longitudinal

Read More

Analytics | Learn SAS

Rick Wicklin | November 20, 2019

Predicted values in generalized linear models: The ILINK option in SAS

In a linear regression model, the predicted values are on the same scale as the response variable. You can plot the observed and predicted responses to visualize how well the model agrees with the data. However, for generalized linear models, there is a potential source of confusion. Recall that a

Read More

Analytics | Data Visualization

Rick Wicklin | November 13, 2019

Create biplots in SAS

Biplots are two-dimensional plots that help to visualize relationships in high dimensional data. A previous article discusses how to interpret biplots for continuous variables. The biplot projects observations and variables onto the span of the first two principal components. The observations are plotted as markers; the variables are plotted as

Read More

Learn SAS | Programming Tips

Rick Wicklin | November 11, 2019

Round to even

In grade school, students learn how to round numbers to the nearest integer. In later years, students learn variations, such as rounding up and rounding down by using the greatest integer function and least integer function, respectively. My sister, who is an engineer, learned a rounding method that rounds half-integers

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin | November 6, 2019

What are biplots?

Principal component analysis (PCA) is an important tool for understanding relationships in continuous multivariate data. When the first two principal components (PCs) explain a significant portion of the variance in the data, you can visualize the data by projecting the observations onto the span of the first two PCs. In

Read More

Analytics | Data Visualization | Learn SAS

Rick Wicklin | November 4, 2019

How to interpret graphs in a principal component analysis

Understanding multivariate statistics requires mastery of high-dimensional geometry and concepts in linear algebra such as matrix factorizations, basis vectors, and linear subspaces. Graphs can help to summarize what a multivariate analysis is telling us about the data. This article looks at four graphs that are often part of a principal

Read More

Analytics | Data Visualization | Learn SAS

Rick Wicklin | October 21, 2019

Compute and visualize binomial proportions in SAS

Computing rates and proportions is a common task in data analysis. When you are computing several proportions, it is helpful to visualize how the rates vary among subgroups of the population. Examples of proportions that depend on subgroups include: mortality rates for various types of cancers; incarceration rates by race

Read More

Analytics | Data Visualization

Rick Wicklin | October 16, 2019

Visualize a regression with splines

The EFFECT statement is supported by more than a dozen SAS/STAT regression procedures. Among other things, it enables you to generate spline effects that you can use to fit nonlinear relationships in data. Recently there was a discussion on the SAS Support Communities about how to interpret the parameter estimates

Read More

Analytics | Learn SAS

Rick Wicklin | October 14, 2019

Compute the geometric mean for many variables in SAS

I recently wrote about how to use PROC TTEST in SAS/STAT software to compute the geometric mean and related statistics. This prompted a SAS programmer to ask a related question. Suppose you have dozens (or hundreds) of variables and you want to compute the geometric mean of each. What is

Read More

Advanced Analytics | Machine Learning

Hans Bonde | October 10, 2019

Forecast accuracy matters

In a recent video blog, I discuss forecast accuracy as a parameter for measuring the ability to forecast and plan demand. I further argue for the use of causal data as a key input to understanding historical demand and forecasting/planning future demand. Forecast accuracy is often claimed NOT to be

Read More

Energy & Utilities | Manufacturing | Retail

Analytics | Data Visualization

Rick Wicklin | October 9, 2019

What statistic should you use to display error bars for a mean?

In a previous article, I mentioned that the VLINE statement in PROC SGPLOT is an easy way to graph the mean response at a set of discrete time points. I mentioned that you can choose three options for the length of the "error bars": the standard deviation of the data,

Read More

Analytics | Programming Tips

Rick Wicklin | October 2, 2019

Compute the geometric mean, geometric standard deviation, and geometric CV in SAS

I frequently see questions on SAS discussion forums about how to compute the geometric mean and related quantities in SAS. Unfortunately, the answers to these questions are sometimes confusing or even wrong. In addition, some published papers and web sites that claim to show how to calculate the geometric mean

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin | September 16, 2019

The Hull moving average: Implement a custom time series smoother in SAS

A moving average is a statistical technique that is used to smooth a time series. My colleague, Cindy Wang, wrote an article about the Hull moving average (HMA), which is a time series smoother that is sometimes used as a technical indicator by stock market traders. Cindy showed how to

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin | September 5, 2019

Use cosine similarity to make recommendations

When you order an item online, the website often recommends other items based on your purchase. In fact, these kinds of "recommendation engines" contributed to the early success of companies like Amazon and Netflix. SAS uses a recommender engine to suggest articles on the SAS Support Communities. Although recommender engines

Read More
