You've probably seen a population pyramid, such as this one I showed in a previous blog post. But let's look at population pyramids a bit more closely, with an eye on special features that can make them even more useful!
I was inspired to give population trees a second look by this example I saw on reddit. Below is a snapshot of one frame, out of their animation over several years. It had a few nice features I hadn't seen before, but it was a little "rough around the edges." I wondered if I could polish it up a bit!
I downloaded the raw data from the statistikdatabasen.scb.se website, as an Excel spreadsheet, and then imported it using the following code. In a highly customized/specific graph like this, that's about as far as the simple/standard coding goes! I had to use Proc Transpose twice, as well as manipulate the data using data steps and Proc SQL, to prepare the data to create this graph.
proc import datafile="sweden_population.xlsx" out=my_data dbms=xlsx replace;
run;
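For readers who haven't used Proc Transpose much, here's a minimal sketch of the kind of reshaping involved. It is not the code from this post: the toy `wide` data set, its column names (`age`, `sex`, `y1923`, `y1924`), and the numbers are all made up for illustration, and the real spreadsheet layout may well differ.

```sas
/* Made-up toy data in a "wide" layout: one column per year (assumption) */
data wide;
   input age sex $ y1923 y1924;
   datalines;
0 men 100 110
0 women 95 105
1 men 98 99
1 women 96 97
;
run;

proc sort data=wide; by age sex; run;

/* Turn the year columns into rows: one row per age, sex, and year */
proc transpose data=wide out=long(rename=(col1=people)) name=year_var;
   by age sex;
   var y1923 y1924;
run;

/* Pull the numeric year out of the old column name, e.g. y1924 -> 1924 */
data long;
   set long;
   year = input(substr(year_var, 2), 4.);
   drop year_var;
run;
```

The resulting `long` data set has one row per age, sex, and year, which is a convenient shape for plotting one year at a time.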
My Graph (Spoiler Alert!)
Here's my final graph (or at least the frame for year 1924). And in the sections that follow, I'll describe the customizations and special features.
As I approach retirement age, I've become aware that there are more older women than men (which affects dating possibilities, by the way!). Another way of saying this is "there is a surplus of women in my age group."
Most population pyramids have one bar segment to the left, and a corresponding one on the right, for each age. But this chart has (up to) two bar segments on each side. The second/darker bar segment shows the surplus of men (dark blue) or women (dark red) at that age. And I display the colored bar segments as a stacked bar chart.
To get these stacked bar segments, I had to split up my data just right, and then use SGplot hbarparm's group= and groupdisplay=stack options.
hbarparm category=age response=people /
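To make the "split up my data just right" step more concrete, here's a rough sketch of one way it could be done. The data set names (`counts`, `stacked`), the variables `matched` and `segment`, and the numbers are hypothetical and only for illustration; the actual code for the graph is more involved.

```sas
/* Made-up toy counts per age (hypothetical numbers, for illustration only) */
data counts;
   input age men women;
   datalines;
60 500 520
70 400 450
80 200 300
;
run;

/* Split each side into a "matched" part and a "surplus" part.       */
/* Men are stored as negative values so their bars extend leftward.  */
data stacked;
   set counts;
   length segment $20;
   matched = min(men, women);
   segment = 'Men';           people = -matched;          output;
   segment = 'Surplus men';   people = -(men - matched);  output;
   segment = 'Women';         people = matched;           output;
   segment = 'Surplus women'; people = women - matched;   output;
   keep age segment people;
run;

proc sgplot data=stacked;
   hbarparm category=age response=people /
      group=segment groupdisplay=stack grouporder=data;
   yaxis reverse;
run;
```

Writing the matched segment before the surplus segment, together with grouporder=data, is meant to keep the surplus at the outer end of each bar. Colors are left at their defaults here; an attribute map or styleattrs statement could assign the blue and red shades.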
"Born On" Labels
In the original graph, there were 'temporary' labels for certain things that could affect population (such as World War I and the Spanish Flu). I liked these labels, because they helped put the data into context. But in the original graph animation, those labels go away after a few years. In my version, I make the labels permanent, and my labels move up the graph (following the bar segment they were "born on") as the animation proceeds through the years.
There are several different ways to add text labels. I chose to use annotate, so that I would have total control.
data anno_year_born; set anno_year_born;
length label $100 anchor x1space y1space $50;
function="text"; textcolor="gray44"; textsize=9; textweight='normal';
if label='1867' then label='Famine';
if label='1914' then label='WW-I';
if label='1918' then label='Spanish flu';
if label='1938' then label='WW-II'; /* fudged the year a little, to get label to fit better */
if mod(year_born,10)=0 or label^=trim(left(year_born)) then output;
run;
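The part that makes a label "move up" with its bar is simply that the label's position on the age axis is the frame's year minus the birth year. Here's a small self-contained sketch of that idea. The macro variable `frame_year`, the data set name `anno_events`, and the x placement are assumptions for illustration, not the code behind the graph.

```sas
%let frame_year = 1924;   /* hypothetical macro variable: the year being drawn */

data anno_events;
   length function $8 label $100 anchor x1space y1space $50 textcolor $20 textweight $10;
   input year_born label $ 6-30;
   function = "text"; textcolor = "gray44"; textsize = 9; textweight = "normal";
   x1space = "wallpercent"; x1 = 99; anchor = "right";     /* near the right edge (assumed placement) */
   y1space = "datavalue";   y1 = &frame_year - year_born;  /* age of that birth cohort in this frame */
   if y1 >= 0 then output;  /* skip events that have not happened yet */
   datalines;
1867 Famine
1914 WW-I
1918 Spanish flu
1938 WW-II
;
run;
```

A data set like this would be passed to the plot with the sganno= option on the Proc SGplot statement. Using y1space="datavalue" assumes the age axis is a linear numeric axis.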
The shaded areas in the graph correspond to the "born on" labels. I think these shaded areas provide a good visual cue, and give your eyes something to follow in the animation. Note that I'm shading the same years as the original graph, but I'm not sure what criteria they used to decide which years to shade. I create these shaded areas by overlaying a second bar chart with partially transparent black fill color.
hbarparm category=age response=shadow_people /
The original graph had horizontal reference lines. But I also like to have vertical reference lines - these help you easily see whether the bars are increasing or decreasing slightly, or whether they have crossed some threshold. Also, rather than making the reference lines solid, I made them dotted (this helps distinguish them from the bar segment outlines).
yaxis display=(noline noticks)
   values=(0 to 110 by 10) reverse type=linear
   grid gridattrs=(pattern=dot color=gray55);
xaxis /* statement keyword assumed; any other xaxis options are not shown here */
   values=(-100000 to 100000 by 20000)
   grid gridattrs=(pattern=dot color=gray55);
The original graph labeled the vertical (age) axis in the middle of the graph (somewhat between the male & female bars), where it is difficult to read the dark text against a somewhat dark background. I placed my age axis along the left side of the graph (in its traditional location). I think it's much cleaner, and easier to read there.
I followed suit with the original graph, and added a large 'year' label in the top right of the graph. It would be programmatically easier to put the year in the title ... but I much prefer having it in the graph area like this. I again used annotation for this text.
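As a rough idea of what such a year annotation could look like, here's a short sketch. The macro variable `frame_year`, the data set name `anno_year`, and the position and size values are assumptions chosen for illustration, not the values used in the graph.

```sas
%let frame_year = 1924;   /* hypothetical macro variable holding the frame's year */

data anno_year;
   length function $8 label $100 anchor x1space y1space $50
          textcolor $20 textweight $10 widthunit $10;
   function = "text";
   label = "&frame_year";
   x1space = "wallpercent"; y1space = "wallpercent";
   x1 = 97; y1 = 95;                     /* near the top-right corner of the plot area (assumed) */
   anchor = "topright";
   textsize = 28; textweight = "bold"; textcolor = "gray44";
   width = 30; widthunit = "percent";    /* leave room so the text does not wrap */
run;
```

Because the position is given in percent of the plot wall rather than in data values, the label stays in the same spot in every frame. It can be concatenated with any other annotation data set before being passed to sganno=.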
Here are some of the options I used to create my gif animation. The animduration option controls the time delay between each yearly frame. I created my animation with a similar speed to the original, but with a longer delay/pause at the end, before it restarts. I created this pause by repeating the final frame 20 times. The pause on the final frame gives the user more time to see the final graph, and digest the information.
options dev=sasprtc printerpath=gif animation=start animduration=.20 animloop=yes;
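Here's a sketch of how those options might fit into a complete animation loop, including the trick of repeating the final frame. The macro names (`one_frame`, `make_animation`), their parameters, the output file name, and the year range are all hypothetical, and sashelp.class is only a stand-in so the example runs on its own.

```sas
/* Hypothetical macro that draws one frame; a real version would run the  */
/* full Proc SGplot for the given year. sashelp.class is only a stand-in. */
%macro one_frame(year);
   title "Frame for year &year";
   proc sgplot data=sashelp.class;
      hbarparm category=name response=height;
   run;
%mend one_frame;

%macro make_animation(first_year=, last_year=, pause_frames=20);
   %local year i;
   options dev=sasprtc printerpath=gif animation=start animduration=.20 animloop=yes;
   ods printer file="animation.gif";   /* assumed output file name */

   %do year = &first_year %to &last_year;
      %one_frame(&year)
   %end;

   /* Repeat the last frame to create a pause before the loop restarts */
   %do i = 1 %to &pause_frames;
      %one_frame(&last_year)
   %end;

   options printerpath=gif animation=stop;
   ods printer close;
%mend make_animation;

%make_animation(first_year=1920, last_year=1924)
```

With animduration=.20 and 20 repeated frames, the pause at the end works out to roughly four seconds.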
There's a 1 MB file size limit here in the blogs, so I was only able to include a summarized version of the animation below (with one frame every 10 years). But here's a link to see my full animation, with one frame per year (it provides a much smoother visual experience). And also, here is a link to my full SAS code, if you'd like to see all the details!
Does the animation show anything interesting (expected, or unexpected) about Sweden's population? Do you have insight/expertise to share, that might help interpret the animation?
- How did some of the bars actually grow longer in more recent years, rather than shrinking?
- Did the population shrink, or grow, during WW-I and WW-II?
- Did you notice how the age where the female surplus started has shifted over the years?
- What other labels/shadows would be interesting to add to the graph?