In a previous article, I discussed random jittering as a technique to reduce overplotting in scatter plots. The example used data that are rounded to the nearest unit, although the idea applies equally well to ordinal data in general.

The act of jittering (adding random noise to data) is a statistical irony: statisticians spend most of their day trying to "remove" noise from data, but jittering puts noise back in!

Personally, I rarely jitter data. I prefer to visualize the data as they are, but I acknowledge that there are situations in which jittering gives a better "feel" for the data. To help you decide whether to jitter, here are some pros and cons.

### Arguments in Favor of Jittering

• Jittering reduces overplotting in ordinal data or data that are rounded.
• Jittering helps you to better visualize the density of the data and the relationship between variables.
• Jittering can help you to find clusters in the data. (Use a small scale parameter for this case.)
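
To make the idea concrete, here is a minimal sketch of uniform jittering for data that are rounded to the nearest unit. It is written in Python with NumPy rather than in the software used for the earlier example, and the function name `jitter`, the `width` argument (the scale parameter), and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def jitter(values, width=1.0, rng=None):
    """Add uniform noise on [-width/2, width/2) to each value.

    `width` plays the role of the scale parameter: for data rounded to
    the nearest unit, width=1 spreads each point over its rounding
    interval, and a smaller width keeps points closer to the recorded value.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    return values + rng.uniform(-width / 2, width / 2, size=values.shape)

# Hypothetical data: measurements rounded to the nearest unit
rng = np.random.default_rng(1)
x = np.round(rng.normal(loc=5, scale=2, size=200))

xj = jitter(x, width=1.0, rng=rng)

# Many observations share the same rounded value, so they overplot;
# after jittering, the plotted positions no longer coincide.
n_unique_before = len(np.unique(x))
n_unique_after = len(np.unique(xj))
```

Plotting `xj` instead of `x` spreads out the points that were stacked on top of each other, while each jittered value stays within half a unit of the recorded value.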

### Arguments Against Jittering

• Jittering adds random components to variables, which means that there is not a unique way to jitter.
• The size of the random component is not easy to choose automatically; it requires domain-specific knowledge. For example, are data recorded to the nearest unit or the nearest half-unit?
• The distribution of the random component is not always clear. In the iris data, I jittered by using random variables from a uniform distribution. But suppose a variable records the Richter scale intensity of earthquakes (rounded to the nearest 0.1). Should you use a uniform distribution to jitter these data? Probably not, because the Richter scale is a logarithmic scale, and because earthquakes with lower intensities occur more frequently than earthquakes with higher intensities.
• If the X and Y variables are related (for example, highly correlated), jittering each variable independently might result in a graph in which the visual impact of the relationship is less apparent. Think about the extreme case where X and Y are exactly linearly related: adding independent noise to each variable results in a graph in which the linear relationship is not as strong.
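
The last argument can be checked numerically. The following sketch (Python with NumPy; the simulated data, the slope and intercept, and the jitter width are assumptions chosen only for illustration) starts with X and Y that are exactly linearly related and compares the sample correlation before and after independent uniform jittering of each variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: X is recorded as an integer, and Y is an exact
# linear function of X, so the correlation is 1 before jittering.
x = rng.integers(0, 10, size=500).astype(float)
y = 2 * x + 1
r_before = np.corrcoef(x, y)[0, 1]

# Jitter each variable independently with uniform noise on [-0.5, 0.5)
xj = x + rng.uniform(-0.5, 0.5, size=x.size)
yj = y + rng.uniform(-0.5, 0.5, size=y.size)
r_after = np.corrcoef(xj, yj)[0, 1]

print(f"correlation before jittering: {r_before:.4f}")
print(f"correlation after jittering:  {r_after:.4f}")
```

The jittered points no longer fall on a line, so the sample correlation drops below 1. The size of the drop depends on how large the jitter is relative to the spread of the data.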

Can you think of any arguments that I did not mention? Do you think the arguments in favor of jittering outweigh the arguments against it? Are there other visualization techniques that you prefer instead of jittering? Weigh in on this issue by posting a comment.


Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

1. Here's an idea: instead of jittering, plot the data with letters instead of dots. An 'A' could represent one data point, a 'B' would represent two data points, etc. That would be an excellent way to convey information in a plot. : )

In all seriousness, I'm grateful for the improvements in SAS plots over the years!

2. I am much more pro-jittering than you are, I think. One interesting alternative was presented recently at NESUG and SESUG by Perry Watts

3. I'm not an expert in statistics, but when you want to use estimators that are defined with the rank of the variables, jittering seems to be necessary, no?

• I don't know what you mean by "estimators that are defined with the rank of the variables." Maybe you are asking whether it is possible to solve a least-squares regression if the explanatory variables are linearly dependent (so X`X is not full rank)? In that situation, you do not need to jitter. You can use a "generalized inverse" to obtain a solution. The solution will not be unique.
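
As a rough illustration of the generalized-inverse approach, the sketch below uses the Moore-Penrose pseudoinverse, which is one particular choice of generalized inverse, as computed by `numpy.linalg.pinv`. The design matrix, the response, and the variable names are hypothetical and were made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix whose third column is twice the second,
# so the columns are linearly dependent and X'X is singular.
n = 20
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, 2 * x1])
y = 1 + 3 * x1 + rng.normal(scale=0.1, size=n)

rank = np.linalg.matrix_rank(X)  # 2, although X has 3 columns

# One least-squares solution: the minimum-norm solution from the
# Moore-Penrose pseudoinverse. No jittering is needed.
b_min = np.linalg.pinv(X) @ y

# The solution is not unique: adding any vector in the null space of X
# gives another solution with exactly the same fitted values.
null_vec = np.array([0.0, 2.0, -1.0])  # X @ null_vec = 2*x1 - 2*x1 = 0
b_alt = b_min + null_vec

same_fit = np.allclose(X @ b_min, X @ b_alt)
```

Both `b_min` and `b_alt` minimize the sum of squared residuals. They differ only in how the effect is split between the two collinear columns.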

4. Could one use jittering to overcome non-positivity in propensity score estimation, i.e. complete separation in logistic regression?

I stumbled on "jittering" when struggling with logistic treatment model results (to estimate propensity scores) that ended quickly in complete separation with very few variables. They yielded propensity scores that almost failed to discriminate between treatment group and control group. The included variables (only some of them categorical) predicted the grouping "too well," even though the study units had been group-randomized to intervention versus none before the baseline data collection that preceded the start of the intervention.

I found that jittering has been proposed for discontinuity correction, smoothing, and better confidence intervals. My idea would be to add random bits to candidate independent variables that are likely to predict treatment assignment, so as to weaken their relationship with treatment assignment and thus avoid separation; perhaps repeatedly, as is done in multiple imputation for missing data. I searched but found little or nothing in the literature for this kind of use of jittering.

BTW, I found your argument for context-sensitive jittering (limits, shapes, etc.) very relevant.

Thanks in advance, even for a very spontaneous / ad hoc guess on this question.

• No, I do not think you should try to use jittering to address complete separation in logistic regression. Read Derr (2019) to get a better understanding of the geometric interpretation of separation.

Jittering changes the data, so it is not appropriate for overcoming a degeneracy in a statistical analysis. (One exception: bootstrap resampling.) The degeneracy is telling you something about the data. It might be saying that the model does not fit the data or that the data are not a representative sample from the population.