Recently I read an excellent blog post by Paul von Hippel entitled "How many imputations do you need?". It is based on a paper (von Hippel, 2018), which provides more details.
Suppose you are faced with data that has many missing values. One way to address the missing values is to use multiple imputations. If two different researchers use different random-number seeds when they perform the imputations, they will get slightly different estimates. Clearly, you would like this difference to be small, which you can accomplish by using many imputations. The paper and blog address the important question: how many imputations are enough?
The purpose of this article is simply to provide greater visibility for von Hippel's work and to advertise a SAS macro that he wrote that implements his ideas. I am not an expert on multiple imputations, so if you have questions about the macro or the method, you should follow the previous links and read the original work.
A formula for the number of imputations
Traditionally, practitioners used 5 or 10 multiple imputations to perform a missing value analysis. As mentioned in the SAS documentation of the MI procedure, "Von Hippel (2009, p. 278) shows that with a small number of imputations, only the point estimates are reliable. That is, the point estimates will not change much if the missing values are imputed again. For other statistics (such as standard error and p-value) to be reliable," you must use more imputations. This was the main reason that the default value for the NIMPUTE= option in PROC MI was changed from NIMPUTE=5 to NIMPUTE=25 in SAS/STAT 14.1.
In his 2018 paper, von Hippel addressed the question of "how many imputations are enough" by proposing a quadratic formula, along with a SAS macro that applies the formula to your data.
There is a clever trick in von Hippel's method. The formula needs a certain statistic (called the "fraction of missing information" or FMI) in order to estimate the number of imputations. However, you can't get an estimate for the FMI until you have performed imputations and a subsequent analysis! The solution to this "Catch-22" is to use the same idea that is used in power and sample size computations. First, you perform a small "pilot study" to estimate the FMI. Then you use that estimate in the formula to obtain the number of imputations that are needed for the full study.
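To make the two-stage idea concrete, here is a minimal sketch in Python. It is not von Hippel's macro. It assumes the quadratic rule as I understand it from the 2018 paper, M = 1 + (1/2)(FMI / CV)^2, where CV is the coefficient of variation that you are willing to tolerate in the standard-error estimate. It also assumes that the pilot FMI is replaced by the upper limit of a confidence interval computed on the logit scale with standard error sqrt(2/M_pilot). The function names, the default CV of 0.05, and the 95% confidence level are my own choices for illustration; check the paper for the exact recommendations.

```python
import math


def imputations_needed(fmi, cv_se=0.05):
    """Quadratic rule (assumed form): M = 1 + 0.5 * (FMI / CV)^2, rounded up.

    fmi   : fraction of missing information, in [0, 1)
    cv_se : tolerated coefficient of variation of the SE estimate (assumed default)
    """
    if not 0.0 <= fmi < 1.0:
        raise ValueError("fmi must be in [0, 1)")
    if cv_se <= 0.0:
        raise ValueError("cv_se must be positive")
    return math.ceil(1.0 + 0.5 * (fmi / cv_se) ** 2)


def fmi_upper_bound(fmi_pilot, m_pilot, z=1.96):
    """Upper confidence limit for the FMI, computed on the logit scale.

    Assumes logit(FMI estimate) is roughly normal with SE sqrt(2 / m_pilot).
    """
    if not 0.0 < fmi_pilot < 1.0:
        raise ValueError("fmi_pilot must be in (0, 1)")
    if m_pilot < 2:
        raise ValueError("m_pilot must be at least 2")
    logit = math.log(fmi_pilot / (1.0 - fmi_pilot))
    upper = logit + z * math.sqrt(2.0 / m_pilot)
    return 1.0 / (1.0 + math.exp(-upper))


def two_stage_imputations(fmi_pilot, m_pilot, cv_se=0.05, z=1.96):
    """Stage 2: plug the conservative FMI into the quadratic rule.

    Never returns fewer imputations than the pilot already used.
    """
    conservative_fmi = fmi_upper_bound(fmi_pilot, m_pilot, z)
    return max(m_pilot, imputations_needed(conservative_fmi, cv_se))


if __name__ == "__main__":
    # Hypothetical pilot: 20 imputations gave an FMI estimate of 0.30.
    print(two_stage_imputations(fmi_pilot=0.30, m_pilot=20))
```

Because the rule is quadratic in the FMI, the required number of imputations grows quickly when a lot of information is missing, which is why the sketch uses a conservative (upper-limit) FMI rather than the pilot's point estimate.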
By the way, you can use this clever trick in other resampling methods. In my book Simulating Data with SAS (Wicklin, 2013, p. 317), I discuss how to run a "small-scale simulation on a coarse grid" first, and then "refine the grid of parameter values" only where necessary.
A SAS macro for the number of imputations
Paul von Hippel provides a link to a SAS macro that he wrote that implements his two-stage method. In the file "two_stage example.sas," von Hippel shows how to automate his two-stage method for choosing an appropriate number of imputations. Again, I am not the expert, so please direct your questions to von Hippel or to an online SAS Support Community.
In summary, the purpose of this blog post is to make you aware of von Hippel's blog post, paper, and macro. The macro sounds like a nice tool for analysts who use PROC MI in SAS for missing value imputations. Check it out!
1 Comment
Thanks! Glad you liked it. Would it be possible to bake this into a future version of PROC MIANALYZE?