Recently I read an excellent blog post by Paul von Hippel entitled "How many imputations do you need?". It is based on a paper (von Hippel, 2018), which provides more details. Suppose you are faced with data that has many missing values. One way to address the missing values is

## Tag: **Missing Data**

In a previous article, I showed how to use SAS to perform mean imputation. However, there are three problems with using mean-imputed variables in statistical analyses: Mean imputation reduces the variance of the imputed variables. Mean imputation shrinks standard errors, which invalidates most hypothesis tests and the calculation of confidence

Imputing missing data is the act of replacing missing data by nonmissing values. Mean imputation replaces missing data in a numerical variable by the mean value of the nonmissing values. This article shows how to perform mean imputation in SAS. It also presents three statistical drawbacks of mean imputation. How

Missing values present challenges for the statistical analyst and data scientist. Many modeling techniques (such as regression) exclude observations that contain missing values, which can reduce the sample size and reduce the power of a statistical analysis. Before you try to deal with missing values in an analysis (for example,

When simulating data or testing algorithms, it is useful to be able to generate patterns of missing data. This article shows how to generate random and systematic patterns of missing values. In other words, this article shows how to replace nonmissing data with missing data. Generate a random pattern of

You can visualize missing data. It sounds like an oxymoron, but it is true. How can you draw graphs of something that is missing? In a previous article, I showed how you can use PROC MI in SAS/STAT software to create a table that shows patterns of missing data in

Missing data can be informative. Sometimes missing values in one variable are related to missing values in another variable. Other times missing values in one variable are independent of missing values in other variables. As part of the exploratory phase of data analysis, you should investigate whether there are patterns

SAS procedures usually handle missing values automatically. Univariate procedures such as PROC MEANS automatically delete missing values when computing basic descriptive statistics. Many multivariate procedures such as PROC REG delete an entire observation if any variable in the analysis has a missing value. This is called listwise deletion or using

The other day I encountered a SAS Knowledge Base article that shows how to count the number of missing and nonmissing values for each variable in a data set. However, the code is a complicated macro that is difficult for a beginning SAS programmer to understand. (Well, it was hard

Missing values are a fact of life. Many statistical analyses, such as regression, exclude observations that contain missing values prior to forming matrix equations that are used in the analysis. This post shows how to find rows of a data matrix that contain missing values and how to remove those