Proportionally sized bubble plots


Bubble plots are often used to display social and economic data, as Gapminder does effectively. With the addition of the BUBBLEPLOT statement to the Graph Template Language (GTL) in SAS 9.3, it is now possible to create bubble plots in SAS with a few lines of code:

proc template;
  define statgraph bplot;
    begingraph;
      entrytitle 'Bubble Plot of Literacy in 2008';
      layout overlay;
        bubbleplot x=gdp y=life_expectancy size=literacy / dataskin=matte;
        scatterplot x=gdp y=life_expectancy /
                    datalabel=label markerattrs=(size=0)
                    datalabelattrs=(size=8) datalabelposition=auto;
      endlayout;
      entryfootnote halign=left 'Data source: Gapminder';
    endgraph;
  end;
run;
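
The template above only defines the graph; it still has to be rendered against a data set. A minimal sketch, assuming the data sit in a data set called work.gapminder2008 (a hypothetical name, not taken from the original program) that holds the gdp, life_expectancy, literacy and label variables:

/* Render the bplot template defined above.                         */
/* work.gapminder2008 is a hypothetical data set name; it is        */
/* assumed to contain gdp, life_expectancy, literacy, and label.    */
proc sgrender data=work.gapminder2008 template=bplot;
run;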

However, in some cases, as in the example above, the representation can be misleading. For instance, the area of the Bangladesh bubble may lead readers to think that Bangladesh has a literacy rate half that of India, and that India has a literacy rate half that of the USA. As the data labels show, that is not the case. In situations like this, it is important to scale the bubble sizes proportionally. This can be achieved by setting BUBBLERADIUSMIN=0 and adding a "fake" observation to the data in which the variable mapped to size (literacy in this case) is 0. This results in the graph below, where the sizes represent the relative literacy values more accurately.
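
A rough sketch of that approach follows. The data set names work.gapminder2008 and work.gapminder_scaled and the template name bplot_scaled are hypothetical, chosen only for illustration. The sketch also assumes that the anchor observation should keep valid x and y values (copied here from the last real row) so that it is not dropped from the plot; with a minimum radius of 0 and a blank label it should not be visible.

/* Append one "fake" observation whose size variable is 0.          */
/* work.gapminder2008 is a hypothetical input data set.             */
data work.gapminder_scaled;
  set work.gapminder2008 end=last;
  output;                        /* keep every real observation     */
  if last then do;
    literacy = 0;                /* anchors the size scale at zero  */
    call missing(label);         /* blank label for the anchor row  */
    output;                      /* x and y reuse the last real row */
  end;
run;

proc template;
  define statgraph bplot_scaled;
    begingraph;
      entrytitle 'Bubble Plot of Literacy in 2008';
      layout overlay;
        /* bubbleradiusmin=0 lets the zero-valued row map to radius 0 */
        bubbleplot x=gdp y=life_expectancy size=literacy /
                   bubbleradiusmin=0 dataskin=matte;
        scatterplot x=gdp y=life_expectancy /
                    datalabel=label markerattrs=(size=0)
                    datalabelattrs=(size=8) datalabelposition=auto;
      endlayout;
      entryfootnote halign=left 'Data source: Gapminder';
    endgraph;
  end;
run;

proc sgrender data=work.gapminder_scaled template=bplot_scaled;
run;

Adding the anchor row in a separate data set leaves the original data untouched, so the same source can still feed other graphs.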

Complete SAS program

In situations where the data may have both negative and positive values, it is desirable to represent both the magnitude and the sign of each value. This can be done by using a pair of eval expressions to derive two columns from the data column: one that contains the absolute value and one that contains the sign. The absolute value can then be mapped to the bubble size, while the sign can be used to color the bubbles.

proc template;
  define statgraph bplot1;
    begingraph / designheight=480 designwidth=640;
      entrytitle 'Bubble Plot of Human Development Indicators and Population Growth Rates in 2008';
      layout overlay;
        /* size:  magnitude of the growth rate (absolute value)     */
        /* group: sign of the growth rate, used to color the bubbles */
        bubbleplot x=gdp y=life_expectancy size=eval(abs(pop_growth_rate)) /
                   bubbleradiusmin=0
                   dataskin=matte name="bubble"
                   datalabelattrs=(size=8) datalabel=country
                   datalabelposition=bottom
                   group=eval(ifc(pop_growth_rate<0,'Negative','Positive','Missing'));
        discretelegend "bubble" / title="Population Growth Rate" pad=5;
      endlayout;
      entryfootnote halign=left 'Data source: Gapminder';
    endgraph;
  end;
run;

This results in the graph below, from which one can more easily conclude that Japan, Germany, Russia, Ukraine, Romania and Hungary had negative population growth rates in 2008.

Complete SAS program

 


About Author

Pratik Phadke

Software Developer

Pratik Phadke is a Senior Developer in the Data Visualization group at SAS Institute. He has worked on interactive visualization components used in various SAS products, including Enterprise Miner and Forecast Studio as well as ODS Graphics. He received a master's degree in Computer Science from the University of Maryland, Baltimore County.


5 Comments

  1. Very interesting! Nice code.

    I think your correction ought to be the default, or, at least, an option.

    Something else interesting - if you are doing a bubble plot of a proportion where most of the proportions are very high (as above) then you will get a very different impression if you bubble plot the other side of the proportion (e.g. above, illiteracy rates). Then India (literacy = 66% so illiteracy = 34%) is 34 times the size of the US (literacy = 99% so illiteracy = 1%).

    • Pratik Phadke on

      Thanks for the feedback!

      Yes, I agree. In fact, we are discussing adding support for the proportional behavior as an option in a future release.

  2. Peter Lancashire on

    If I recall correctly, the old GPLOT statement automatically coloured positive and negative bubbles differently. Are there plans to add this convenience feature to GTL-based graphics?

    • Sanjay Matange on

      It is certainly worth considering, especially for the SGPLOT use case. For GTL, the eval syntax provides more flexibility. For example, color change and magnitude computation for temperature, with 32°F as the inflection point, are straightforward with this feature.
