Bubble plots are often used to display social and economic data, as Gapminder does so effectively. With the addition of the BUBBLEPLOT statement in SAS 9.3, you can now create bubble plots in SAS with a few lines of code:
proc template;
  define statgraph bplot;
    begingraph;
      entrytitle 'Bubble Plot of Literacy in 2008';
      layout overlay;
        bubbleplot x=gdp y=life_expectancy size=literacy / dataskin=matte;
        scatterplot x=gdp y=life_expectancy / datalabel=label
                    markerattrs=(size=0) datalabelattrs=(size=8)
                    datalabelposition=auto;
      endlayout;
      entryfootnote halign=left 'Data source: Gapminder';
    endgraph;
  end;
run;
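The template above only defines the graph; it still has to be rendered against a data set. A minimal sketch of that step, assuming the Gapminder data sits in a data set named work.gapminder (a hypothetical name, not given in this post) with the variables gdp, life_expectancy, literacy and label:

```sas
/* Render the bplot template defined above.                  */
/* work.gapminder is an assumed data set name; replace it    */
/* with wherever your copy of the Gapminder data is stored.  */
proc sgrender data=work.gapminder template=bplot;
run;
```

The same call with template=bplot1 would render the second template defined further down.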
In some cases, however, as in the example above, the representation can be misleading. For instance, the area of the Bangladesh bubble may lead readers to think that Bangladesh has a literacy rate half that of India, and that India has a literacy rate half that of the USA. As the data labels show, that is not the case. The distortion arises because the smallest value in the data is drawn at the minimum bubble radius rather than in proportion to zero. In situations like this, it is important to scale the bubble sizes proportionately. This can be achieved by setting BUBBLERADIUSMIN=0 and adding a "fake" observation to the data in which the variable mapped to size (literacy in this case) is set to 0. This results in the graph below, where the sizes represent the relative literacy values more accurately.
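The "fake" observation can be appended with a short DATA step. The following is only a sketch under stated assumptions: the data set names work.gapminder and work.gapminder_scaled are hypothetical, and the extra row reuses the gdp and life_expectancy values of the last real observation so that the axis ranges do not change.

```sas
/* Append one extra observation whose size variable is 0.        */
/* work.gapminder and work.gapminder_scaled are assumed names.   */
data work.gapminder_scaled;
  set work.gapminder end=last;
  output;                    /* keep every real observation       */
  if last then do;
    literacy = 0;            /* anchors the size scale at zero    */
    call missing(label);     /* blank label for the extra row     */
    output;                  /* write the extra observation       */
  end;
run;

/* In the template, the BUBBLEPLOT statement would then read:    */
/*   bubbleplot x=gdp y=life_expectancy size=literacy /          */
/*              bubbleradiusmin=0 dataskin=matte;                 */
```

With BUBBLERADIUSMIN=0, the extra observation should be drawn with a zero radius and so should not be visible, while the remaining bubbles are sized relative to zero instead of relative to the smallest literacy value in the data.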
When the data may contain both negative and positive values, it is desirable to represent both the magnitude and the sign of each value. This can be done by using a pair of eval functions to transform the data column into two columns: one that contains the absolute value and another that contains the sign. The absolute value can then be mapped to the bubble size, while the sign can be used to color the bubbles.
proc template;
  define statgraph bplot1;
    begingraph / designheight=480 designwidth=640;
      entrytitle 'Bubble Plot of Human Development Indicators and Population Growth Rates in 2008';
      layout overlay;
        bubbleplot x=gdp y=life_expectancy size=eval(abs(pop_growth_rate)) /
                   bubbleradiusmin=0 dataskin=matte name="bubble"
                   datalabelattrs=(size=8) datalabel=country
                   datalabelposition=bottom
                   group=eval(ifc(pop_growth_rate<0,'Negative','Positive','Missing'));
        discretelegend "bubble" / title="Population Growth Rate" pad=5;
      endlayout;
      entryfootnote halign=left 'Data source: Gapminder';
    endgraph;
  end;
run;
This results in the graph below, from which one can more easily conclude that Japan, Germany, Russia, Ukraine, Romania and Hungary had negative population growth rates in 2008.