Jittering. To a statistician, it is more than what happens when you drink too much coffee. Jittering is the act of adding random noise to data in order to prevent overplotting in statistical graphs. Overplotting can occur when a continuous measurement is rounded to some convenient unit. This has the
Tag: Statistical Graphics
In my statistical analysis of coupons article, I presented a scatter plot that includes the identity line, y=x. This post describes how to write a general program that uses the SGPLOT procedure in SAS 9.2. By a "general program," I mean that the program produces the result based on the
A reader commented to me that he wants to use the HISTOGRAM statement of the SGPLOT procedure to overlay two histograms on a single plot. He could do it, but unfortunately SAS was choosing a large bin width for one of the variables and a small bin width for the
Many people know that the SGPLOT procedure in SAS 9.2 can create a large number of interesting graphs. Some people also know how to create a panel of graphs (all of the same type) by using the SGPANEL procedure. But did you know that you can also create a panel
Do you have many points in your scatter plots that overlap each other? If so, your graph exhibits overplotting. Overplotting occurs when many points have similar coordinates. For example, the following scatter plot (which is produced by using the ODS statistical graphics procedure, SGPLOT) displays 12,000 points, many of which
The Flowing Data blog posted some data about how much TV actors get paid per episode. About a dozen folks have created various visualizations of the data (see the comments in the Flowing Data blog), several of them very glitzy and fancy. One variable in the data is a categorical
A colleague posted some data on his internal SAS blog about key trends in the US mobile phone industry, as reported by comScore. He graciously shared the data so that I could create a graph that visualizes the trends. The plot visualizes trends in the data: the Android phone is
The Junk Chart blog discusses problems with a chart that (poorly) presents statistics on the prevalence of shark attacks by different species. Here is the same data, presented as two overlaid bar charts created with the SGPLOT procedure. I think this approach works well because the number of deaths is
My last post was a criticism of a statistical graph that appeared in Bloomberg Businessweek. Criticism is easy. Analysis is harder. In this post I re-analyze the data to present two graphics that I think should have replaced the one graphic in Businessweek. You can download the SAS program that
Recently I read a blog that advertised a data visualization competition. Under the heading "What Are We Looking For?" is a link to a 2007 Bloomberg Businessweek graph that visualizes how participation in online social media activities varies across age groups. The graph is reproduced below at a smaller scale:
The Junk Chart blog discusses a potential problem that can arise in grouped bar charts when the two groups have vastly different ranges. One possible solution (which is discussed at the Junk Chart sister blog, Numbers Rule Your World) is to present the data back-to-back in what is sometimes called