Analyzing wait times at VA health care facilities

Data about the monthly wait times at VA facilities in the US are now available, but it's a bit overwhelming to try to analyze them in tabular form - plotting the data on a map made it a lot easier!...

Here in the US, when our soldiers finish their commitment in the military (retire, or are honorably discharged), they are allowed to utilize the VA health care facilities. But the VA facilities have been under a lot of scrutiny lately - in particular for long wait times.

A recent article in our local news mentioned that the worst VA wait times are in the South. The article mentioned several specific examples, but being a data person, I wanted to see the actual data. I looked around a bit and found the actual data for February 2015. Here's a screen-capture of a portion of the table:


Unfortunately the data are in a table in a pdf file, which makes it quite cumbersome to work with. I ended up copying and pasting it one line at a time into a simple text file I could import into SAS. I got all major rows for each group of facilities (rather than trying to get each individual facility). I then used Proc Geocode to estimate a lat/long for each facility, and annotated them as markers on a map, color-coded based on the number of appointments completed in under 30 days. (Click the map below to see the interactive version, with html hover-text for each marker.)
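The geocode-and-color-code step could look roughly like the sketch below. It is not the exact code behind the map: the facility rows and percentages are made-up placeholders, the color cut-offs are arbitrary, and the dataset and variable names (facilities, pct_under_30, facilities_geo) are assumptions. It relies on the sashelp.zipcode lookup table that ships with SAS.

```sas
/* Placeholder rows - NOT real VA data */
data facilities;
length city $30 state $2;
input city $ state $ pct_under_30;
datalines;
Durham NC 90
Atlanta GA 80
Boston MA 97
;
run;

/* City-level geocoding: looks up CITY + STATE and adds X (longitude) and Y (latitude) */
proc geocode method=city data=facilities out=facilities_geo
 lookupcity=sashelp.zipcode;
run;

/* Assign a marker color from arbitrary cut-offs */
data facilities_geo; set facilities_geo;
length color $8;
if pct_under_30>=95 then color='green';
else if pct_under_30>=85 then color='yellow';
else color='red';
run;
```

From there, each row has a location and a color, which is what an annotate dataset needs in order to draw the markers on the map.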


At this level of aggregation, it does appear that the South might be doing a bit worse than the Northeast, and my state (North Carolina) has some red, orange, and yellow markers (which will hopefully be improving). But rather than trying to compare all the facilities across the nation, I liked that the map allowed me to see where the facilities are located, and hover over them to see their data.

My next step would be to plot all the individual facilities (instead of the aggregate data) - and it would be *great* to find a more convenient version of the data (maybe a spreadsheet or csv file). If anybody knows of a better data source, let me know (hint, hint!)

And to close this blog post, here's a picture of my friend Trena's husband, proudly serving his country - hopefully by the time he's out of the military, we'll have all the facilities running like well-oiled machines, with short wait times and good service!



A custom map to help track the flu

Has this year's flu been better or worse than you thought it would be?

There are a lot of factors that help determine whether or not you're likely to get the flu. Is there a bad strain going around? Did the flu vaccine target the right strain? Did you get the flu shot? Has the weather been cold & wet? Has your health been poor in general? Have you had to care for family members who had the flu? Etc, etc, etc.

And I guess a lot of flu-factors get rolled into geography - if the flu is "going around" in your area, then you're probably more likely to get it. Which is why I was happy to find the CDC's flu map! It shows all the US states (and a few other areas) color-coded by the prevalence of the flu! Here's a screen-capture of their flu map:


Of course, any time I see a nice map, I naturally want to try to create it in SAS. The CDC map only had 2 challenging aspects that I didn't know the exact code for, right off the top of my head. The first was the cross-hatch patterns - I knew SAS/Graph could do them, but I didn't know the exact syntax. After a quick visit to the pattern statement help page, I determined that the 2 special map patterns could be coded as m4x45 and m4n90. The second challenge was including the territories (such as Guam, US Virgin Islands, and Puerto Rico) in the US map. I decided to subset them out of the world map, re-size & re-scale the x/y coordinates, and then combine them with the US map. Here's a link if you'd like to see the exact SAS code that was used.
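For readers who haven't used the map patterns before, here is a small sketch of how m4x45 and m4n90 can be applied in a choropleth map. The data rows are made up just to exercise three patterns, the colors are arbitrary, and it assumes the traditional maps.us map dataset is available (the territories step is not shown).

```sas
/* Made-up levels for three states, purely to show the patterns */
data flu_levels;
length statecode $2 level $12;
input statecode $ level $;
datalines;
NC Widespread
VA Regional
SC Local
;
run;

/* Patterns are assigned in sorted order of the response values */
pattern1 value=msolid color=cxfdd49e; /* solid fill                        */
pattern2 value=m4x45  color=gray55;   /* crosshatch, density 4, 45 degrees */
pattern3 value=m4n90  color=gray55;   /* parallel lines, density 4, 90 deg */

/* ALL also draws the states that have no response data */
proc gmap data=flu_levels map=maps.us all;
id statecode;
choro level / discrete coutline=gray77;
run; quit;
```

In the pattern name, M marks a map pattern, the digit is the line density, X or N selects crosshatch or parallel lines, and the trailing number is the angle.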

The results came out looking very close to the original (see below). And one extra bonus feature of my map is that I added html hover-text for each state - this can be helpful to anyone who is analyzing the data, but in particular allows vision-impaired people to explore the map using Voice-over technology (as they hover over each state, the state name and flu prevalence are read out loud). Click the map snapshot below, to see the interactive version with hover-text.
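A minimal sketch of the hover-text technique follows. It is an illustration rather than the code behind the map: the two data rows are made up, my_html is just a variable name chosen here, and maps.us is assumed to be available. The idea is to build a title= tag per state and hand it to the html= option of the choro statement.

```sas
/* Made-up levels for two states, only to demonstrate the hover-text */
data flu_levels;
length statecode $2 level $12 my_html $200;
input statecode $ level $;
/* title= becomes the hover-text; stfips() and fipnamel() turn the postal code into the state name */
my_html='title='||quote(trim(fipnamel(stfips(statecode)))||': '||trim(level));
datalines;
NC Widespread
VA Regional
;
run;

ods html;
proc gmap data=flu_levels map=maps.us all;
id statecode;
choro level / discrete html=my_html;
run; quit;
ods html close;
```

The hover-text only shows up in the html output, since it is stored in the image map that ODS writes alongside the picture.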

Landing a SAS Certification

Lauren Guevara

After working as a flight attendant for more than 20 years, Lauren Guevara was ready for a new adventure.

The inspiration for her journey came from an article she read in CNN’s Money magazine that highlighted the earning potential of a SAS Certification. Also having earned a Master of Science in e-commerce years earlier, she naturally gravitated toward the computer industry.

“My mom was the one who encouraged me to read Money magazine,” said Guevara. “The article mentioned career advancements you can make by becoming a data miner and getting certified in SAS.”

After reading the article Guevara started researching SAS online and also purchased the book, Learning SAS by Example: A Programmer’s Guide. That book started traveling the world with her. She devoted her downtime during layovers and breaks to reading. What she learned led her to a unique decision: become a SAS programmer.

Her first step was signing up for online e-learning courses in SAS Programming 1 and Programming 2. “I worked through both e-lessons and tried to learn everything before setting foot in a classroom,” said Guevara.

Eventually she felt ready for the classroom and attended SAS Programming 1 in the Charlotte, NC training center. The classroom training reinforced what she was introduced to in the e-courses and gave her an opportunity to ask more detailed questions. “Coming into this with no experience, classroom and e-learning together was the best way for me to learn it,” said Guevara. “I did a lot of fine tuning in the classroom.”

Guevara wanted to earn the SAS Certified Base Programmer Credential as a way to boost her credibility to potential employers.

“I noticed in the classroom that everyone had computer jobs or worked in the industry,” said Guevara. “Since I didn’t have that same experience, I felt it was necessary to have the credentials to back up my skills. SAS certifications are respected in the industry.”

Guevara purchased the base programming certification package offered by SAS, which included a training course, prep exam and certification exam voucher at a discounted price to help her prepare.

Another study tip she shared was reading the SAS Certification Prep Guide: Base Programming for SAS 9.

Guevara was a bit embarrassed to share that she didn’t pass the exam on her first attempt. However, she realized that it might be inspiring for others to know that it’s possible to fail and still achieve your goals. “The first time I took the exam, I wasn’t ready,” said Guevara, “but I wasn’t giving up. I went back and really started to understand the language better. You really have to know this stuff. It’s hard, but it’s possible.”

With her relentless determination, Guevara passed the base programmer exam and is working to earn the SAS Certified Advanced Programmer Credential by the end of the year. In the meantime, she’s going to attend the annual PharmaSUG event in Orlando to network with other SAS programmers.

Guevara eventually sees herself doing part-time contract work as a programmer, while still flying part time for the airline.

Who knew some simple motherly advice would lead Guevara on this life-changing path? Mom, of course! And she couldn’t be prouder of her daughter. “She sang a song when I finally passed the exam. She’s so happy.”

Learn more and start your SAS Certification journey.


Applying the KISS principle to maps: An analysis of breastfeeding prevalence

How simple is too simple, when it comes to analyzing data on a map?

The KISS principle can be applied to many things, including graphs and maps. What is the KISS principle, you might ask? Well, it's not the rock band that my friend Patricia (pictured below) has been known to dress like. Instead, it is the principle of "Keep it Simple" (or one of the several variations of the wording). I think KISS is a good goal in general, but should it be applied to the actual geometry of a map? Let's experiment and find out...


I recently saw a map on dadaviz that represents each state in the US as an equal-sized colored square. I thought it was an interesting approach (as it helps eliminate area size bias), and therefore I wanted to see if I could create a similar map with SAS. But as I was creating this simplified map, I noticed that many of the states were not in their proper position relative to other states (for example, Virginia was to the west of North Carolina instead of to the north, and South Dakota was to the east of North Dakota instead of to the south). And the states were also difficult to recognize, without their familiar shapes. Well, anyway, here's my SAS version of their map:
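For anyone curious how a squares map can be built, here is a rough sketch (not the code used for the map in this post). The grid positions cover only four states and are hypothetical, as is the value column. Each grid cell is expanded into the four corners of a square, and the result is used as a custom map dataset in Proc Gmap.

```sas
/* Hypothetical grid positions and made-up values for four states */
data grid;
length statecode $2;
input statecode $ col row value;
datalines;
KY 1 2 10
VA 2 2 20
TN 1 1 30
NC 2 1 40
;
run;

/* Expand each grid cell into a square polygon (4 corner points), leaving a small gap */
data squares; set grid;
x=col;     y=row;     output;
x=col+0.9; y=row;     output;
x=col+0.9; y=row+0.9; output;
x=col;     y=row+0.9; output;
keep statecode x y;
run;

/* The squares dataset plays the role of the map; grid supplies the response data */
proc gmap data=grid map=squares;
id statecode;
choro value / discrete;
run; quit;
```

Because the positions are typed in by hand, it is easy to see how a state can end up east or west of where it belongs once all 50 have to fit on a tidy grid.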


So, although their simplified map design is interesting, perhaps it takes KISS a bit too far? I wondered if it might be better to use a slightly less simplified map, such as the ones promoted by Mark Monmonier in his books How to Lie With Maps and Mapping It Out. So I created a custom US map where the states are shaped like Mark's map, and plotted the same data on it. In this map, the states are in the correct relative position, and the somewhat correct size, and therefore much easier to recognize than the squares in the previous map. But I still find it a bit difficult to recognize some of the states, and I wonder what is the benefit of the simplified shapes?


Finally, I plotted the data on a traditional US map. And personally, I prefer this one over the two simplified versions.



So, what's your opinion - which map do you prefer? What are the pros & cons of each map?



UK General Election 2015: using PROC MAPIMPORT to visualise the election

Election fever has hit the United Kingdom as the days count down to 7th May 2015.  This is likely to be one of the most uncertain elections in recent memory, with nearly 10 parties struggling for votes across England, Scotland, Wales and Northern Ireland.  Results night will be tense, with the different TV channels competing for the most engaging visualisation and graphics. Gone are the days of the simple 'swingometer' which showed the shift between the most traditionally popular Conservative and Labour parties.

In my earlier blog, I looked at ways analytics could be used to forecast results.  But what is the best way to display them?  My esteemed colleague, Robert Allison, is working on how best to do this and will share his results in his forthcoming blog (stay tuned).  However, for a starter for 10, here is how you could produce a map using SAS.

Luckily, the Ordnance Survey provide open source data for electoral boundaries in the UK, in the form of 'shape' (SHP) and response (DBF) files.  You can download it here.   It's a simple matter for SAS to read in this data.

PROC IMPORT out= work.westminster_const_region datafile= " … westminster_const_region.dbf" dbms=DBF replace;
run;

PROC MAPIMPORT datafile=" … westminster_const_region.shp" out=work.westminster_const_region_map contents;
id polygon_id;
run;

You can combine this with open source results data available on sites including Electoral Calculus to plot results to your heart's content.  I created a dataset called 'combined' and plotted the Green Party's results in 2010 in the London region.  In this 'choropleth' map, the greener the area, the more votes the Green Party got in 2010.  To do this, I had to create a 'colour ramp' ranging from very green to white using PROC TEMPLATE.

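proc template;
define style styles.greenramp; /* style name is a placeholder; apply it with ods html style=styles.greenramp */
parent=styles.default;
style twocolorramp / startcolor=white endcolor=green;
style graphdata1 from graphdata1 / color=white;
style graphdata2 from graphdata2 / color=green;
end;
run;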

Finally, I can plot the results using PROC GMAP.

ods html;
PROC GMAP map=westminster_const_region_map data=combined;
id polygon_id;
choro grn;
where region='London';
run; quit;

In the meantime, if you’re keen to find out more about government data and how analytics is shaping the future of our political thinking, check out our research with Civil Service World on Big Data in the public sector.


Variations on a stickman graph: Analyzing the Twitter minions

One of our customers asked if I could show him how to reproduce a stickman graph that David McCandless (ala, Information is Beautiful) had created. I'm not sure that it's the best kind of graph for the occasion, but of course SAS can be used to create it! ...

David's graph uses 100 stickmen to represent all the Twitter users, and divides them into 5 categories. Each category is represented by a color. In the SAS dataset, I represent each stickman by an X and Y pair (for the position on the grid), and Color_value (1-5, for the 5 color categories), using the following code:

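data my_data;
retain x y;
input color_value @@;
/* start on the top row (y=5) */
if x=. then y=5;
/* move one column to the right for each value read */
x=sum(x,1);
/* after 20 columns, wrap to the start of the next row down */
if x=21 then do;
 x=1; y=y-1;
 end;
datalines;
1 1 1 1 2 2 2 2 2 2 2 2 2 2 4 4 4 4 4 5
1 1 1 1 2 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5
1 1 1 1 2 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5
1 1 1 1 2 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5
1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 5
;
run;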

I create a user-defined format so that the numeric Color_values (1-5) print in the graph legend as the desired text descriptions, and then plot the points with SAS/Graph Proc Gplot using plot y*x=color_value, and the following symbol statements (the '80'x character of the Webdings font is the stickman figure).

symbol1 font='webdings' value='80'x height=15 color=cxec008c;
symbol2 font='webdings' value='80'x height=15 color=cx8cc63f;
symbol3 font='webdings' value='80'x height=15 color=cx662d91;
symbol4 font='webdings' value='80'x height=15 color=cx00aeef;
symbol5 font='webdings' value='80'x height=15 color=cx939598;
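The format and plot step described above can be sketched as follows. This is an illustration under stated assumptions: grid_data is a stand-in dataset generated with arbitrary category assignments, catfmt is a format name chosen here, and the legend labels are placeholders rather than the real category descriptions.

```sas
/* Stand-in data: a 20x5 grid with categories assigned arbitrarily */
data grid_data;
do y=1 to 5;
 do x=1 to 20;
  color_value=ceil(x/4);
  output;
 end;
end;
run;

/* Placeholder legend text - substitute the real category descriptions */
proc format;
value catfmt
 1='Category 1'
 2='Category 2'
 3='Category 3'
 4='Category 4'
 5='Category 5';
run;

/* One plot symbol per color_value; the format supplies the legend text */
proc gplot data=grid_data;
format color_value catfmt.;
plot y*x=color_value;
run; quit;
```

Symbol statements are global, so if the five symbol statements above have been run in the same session, symbol1 through symbol5 are picked up for color_value 1 through 5.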

The graph came out like this, which is almost what we want:


I then used axis statements to suppress the axis tick marks, numeric values, and lines (axis1 label=none style=0 value=none major=none minor=none), and have the following plot which is a much 'cleaner' version:

At this point, I would have "called it a day" and been done. But McCandless' version was a little more "politically correct" and had both stickmen and stickwomen ... which makes creating the graph a bit more complex. The technique I was using only allows you to have 1 color and marker shape per each category. Therefore I changed techniques, and used gplot to create a blank graph, and then used annotate to programmatically draw the stickpeople (annotate always gives you total control).

data anno_markers; set my_data;
length function color $8;
xsys='2'; ysys='2'; hsys='3'; when='a';
function='label'; style='webdings'; position='+'; size=15;
if color_value=1 then color='cxec008c';
if color_value=2 then color='cx8cc63f';
if color_value=3 then color='cx662d91';
if color_value=4 then color='cx00aeef';
if color_value=5 then color='cx939598';
/* '80'x is the male stick-figure and '81'x the female, so even x-values are male and odd ones female */
if mod(x,2)=0 then text='80'x;
else text='81'x;
run;

Add to that a few carefully placed (annotated) text labels that explain what the colors mean, and we have a graph very much like McCandless' beautiful version:


What are the pros & cons of these stickman graphs, and what other graph might better represent this data?


Where are the world's best roller coasters?

Would you rather see a list of the world's 50 best roller coasters, or an interactive map? (... how about both!)

Before we get started on this ride, here's a picture of my friend Jennifer's daughter, getting an early/young start on riding roller coasters (she's in the back, with her hands up)...


Now, on to the 'analytics' part!  My buddy Reggie is a big fan of roller coasters. We grew up in the same hometown, and it was close enough to the Carowinds amusement park that the groups we were in (Boy Scouts, 4-H, school, etc) frequently took day trips there. Back then, their big rides were Thunder Road and White Lightnin' and that's probably where his appetite for coasters formed. And speaking of Carowinds, they just unveiled the Fury 325, the tallest/fastest giga coaster in the world (325-ft tall, 95mph)!

This got me thinking that it might be useful for Reggie to have a list of the best roller coasters, so he could visit as many as possible when he's traveling. I found a list of the world's 50 best roller coasters, but it was cumbersome to try to determine which coasters were geographically close to where Reggie might be traveling. (The 2011/2012 page with their list is gone now - here's a link to the archived version on the wayback machine.)

Of course what some people might see as a data problem, I see as an analytics opportunity!

I got all the data from the list and entered it into a SAS dataset, and then used Proc Geocode to estimate the latitude/longitude of each amusement park and plotted them as markers on a map. The markers have hover-text to show the coaster names, and if the park has multiple roller coasters, then I include all of them in the html hover-text (using somewhat clever code to build up the html tags and text). If you click on the map markers, they launch a Google image search for pictures of that roller coaster. Click the snapshot of the map below to see the interactive version:
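The multi-coaster hover-text can be built with by-group processing, along the lines of the sketch below. It is not the code behind the map: the park and coaster names are placeholders, the variable names (names, my_html) are assumptions, and the Google image search URL pattern is one that works at the time of writing but is not guaranteed to stay the same.

```sas
/* Placeholder rows standing in for the top-50 list */
data coasters;
length park $40 coaster $40;
infile datalines dsd;
input park $ coaster $;
datalines;
Park A,Coaster One
Park A,Coaster Two
Park B,Coaster Three
;
run;

proc sort data=coasters; by park; run;

/* Collapse to one row per park, accumulating the coaster names */
data park_markers; set coasters; by park;
length names $300 my_html $600;
retain names;
if first.park then names=coaster;
else names=catx('0d'x, names, coaster); /* '0d'x starts a new line in the hover-text */
if last.park then do;
 my_html='title='||quote(trim(park)||'0d'x||trim(names))||
  ' href='||quote('https://www.google.com/search?tbm=isch&q='||
  urlencode(trim(park)||' roller coaster'));
 output;
end;
keep park names my_html;
run;
```

Each park ends up with a single my_html value containing both the title= hover-text and the href= drill-down link, ready to attach to that park's marker.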


Below the map, there's an interactive text list of all 50 roller coasters, with html links on each coaster name to launch a Google image search for pictures of that specific coaster (these links provide a bit more granularity than the 1-link-per-park links in the map). Here is a screen-capture of part of the list -- click the list to see the actual page with the interactive map & list:


Have you been on any of the roller coasters on the Top 50 list? How did you like it?


Drilling down on fracking graphs

The topic of fracking has been in the news a lot lately - this blog post explores some of the finer points of plotting opinion data related to fracking ...

I recently saw the following graph on dadaviz. It showed some interesting data, and presented the data in a way that I almost approved of. Give it a quick look, and see if you can guess the things that bothered me (before I elaborate on them):

Here is a list of the things I didn't like about their graph:

  • The long all-caps title was a bit overbearing.
  • The source of the data wasn't listed in the graph itself (you have to look in the dadaviz side-bar).
  • The bar segments aren't in a logical order (I think neutral should be in the middle).

As you've seen in the past, I don't complain about a graph without trying my hand at creating a better one. I found the Gallup article and entered the data into a SAS dataset. I noticed that the article also listed the 'Overall' opinion statistics - I think that's very important to help understand the data, so I added that to the graph. I used Proc Transpose to get the data structured in a way that it could be easily plotted by Proc Gchart, and then created a grouped horizontal bar chart (similar to the original one, but with an extra group to show the overall statistics), with the 'neutral' segment in the middle (the Gallup article called it 'no opinion' therefore that's what I called it). I made the title mixed-case, and added a footnote at the bottom of the graph to let readers know that it was based on Gallup data.
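The transpose-then-chart approach can be sketched as below. The percentages are placeholders and NOT the Gallup figures, the group names are generic stand-ins, and the chart is a simplified stacked bar rather than a reproduction of the finished graph.

```sas
/* Placeholder percentages - NOT the Gallup figures */
data wide;
length group $12;
input group $ favor no_opinion oppose;
datalines;
Overall 40 20 40
GroupA 60 20 20
GroupB 30 20 50
;
run;

/* One row per group/opinion combination, which is the shape Gchart wants */
proc transpose data=wide out=long(rename=(_name_=opinion col1=pct));
by group notsorted;
var favor no_opinion oppose;
run;

/* One horizontal bar per group, with a segment for each opinion */
proc gchart data=long;
hbar group / type=sum sumvar=pct subgroup=opinion nostats;
run; quit;
```

With these variable names the subgroup values sort as favor, no_opinion, oppose, which conveniently puts the 'no opinion' segment in the middle.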


The Gallup article also had fracking opinion data by age group, so I created a graph for that as well:

Let's not get into a discussion about whether fracking is good or bad. But I invite you to leave a comment on why you think the opinions differ by political party and age group.


SAS finds evidence for extraterrestrial life?

Just this morning, the course leader at our newly created SAS Space & Astronomy School told me that they picked up a broadcast signal from outer space. By analysing all the data they have been collecting, they were able to quickly spot a spike in the trend pattern, which helped them pinpoint the exact location of the signal.

Within a few hours, they established a conversation channel and are in the process of exchanging information. Scientists are hoping this may enable them to paint a picture of what could be a living environment and home to extraterrestrial inhabitants.

Picking up mystery signals from outer space is not unusual. Astronomers monitor for pulses from deep space all the time.  But the lack of similar findings by facilities other than the Parkes radio telescope in Australia has left scientists baffled by what this could mean.

We are waiting with bated breath for more developments but it could be an earth-shattering breakthrough. We only recently set up the SAS Space & Astronomy School to help train and develop future astronauts, in response to various space programmes.

Sounds like science fiction? Quite possibly…it is April Fools’ Day after all!

But returning to reality, the Mars One mission to send humans to colonise the Red Planet by the mid-2020s has raised a few eyebrows. The feasibility of the project’s selection process, timelines and budget are just a few of the recent concerns. Yet, the one-way trip to the red planet hasn’t deterred willing volunteers putting themselves forward for the mission.

Of course, it will be one of the biggest technical, physical and psychological challenges of our time. To have humans living on Mars will be more breathtaking than seeing Armstrong becoming the first man to walk on the moon.

Becoming an astronaut doesn’t happen overnight. It takes many years of education and experience to meet the basic qualifications. Only a fraction of one per cent ever make it into space training programmes. That’s why we need talented scientists who can make the grade, and make the most of the available technology.

Using SAS advanced analytics, we are analysing and extracting insights from data that will be open to astronauts and space scientists from around the world. We have the technology to analyse a huge amount of structured and unstructured data, and we want to find the analytical talent to drive this forward.

SAS is actually involved in the search for intelligent life somewhere in our universe. An amateur astronomer in Chicago named Robert H. Gray, has been using SAS in his search for radio signals from other worlds. You can read about his story and his ‘wow’ discovery here.

For other real life examples of how predictive analytics supports marketing, risk, operations and more to create competitive advantage, download our report.

You can also check out where Gartner ranked SAS for advanced and predictive analytics earlier this year in its 2015 Magic Quadrant for Advanced Analytics Platforms report.


Paint it black: A song choice or a graph background?

This blog post discusses the use of a black background in a graph. But before we get started, I invite you to have a listen to one of my favorite songs - "Paint it Black" by the Rolling Stones. Perhaps this song subliminally persuaded people to use black backgrounds in their graphs? (just one of my conspiracy theories!)

A twitter post recently took me to a tumblr blog about a map that Seth Kadish had created. The coloring of the counties in the map represented the percent of the workers in that county who commute to another state to work. As expected, counties closer to the state borders generally have more of their workers commute to another state.

Here's a screen capture of his map (below). As you can see, it's very difficult to see the map against the black background:


Hahaha - ok, that's not actually it, but I invite you to click this link to have a look at his map! It's visually captivating (it certainly caught my attention), but once I started really trying to understand the data, I found that it was somewhat lacking in that area. The main problem is that the black background is visually very similar to the color representing the maximum commuting values. And therefore, it is very difficult to tell the two apart. For example, that round dark area in southern Florida might at first look like a county with a very high commuting value ... but it is actually a 'hole' in the map for Lake Okeechobee (showing the black background behind the map). It's very difficult to tell whether the dark areas in the map are counties with high values, or holes/lakes/bays/etc in the map.

So, naturally, I downloaded the data (table S0801) from the Census website, imported the CSV file into SAS, and plotted it on my own map ... with a white background. Now the dark/black counties stand out more, and you can easily tell the difference between the map background and the data. And if you click the thumbnail below, you will even see that my SAS version has html hover-text for each of the counties.



Technical Discussion:

Why do people use black backgrounds behind graphs? What are some of the arguments for and against? I'm no expert in this area, and have not done any controlled experiments or surveys, but here are my personal thoughts...

For Black Backgrounds:

One argument for black backgrounds is that it looks better than white when projected on a screen during a presentation. In my experience, this could be true, especially if bright/vibrant colors are used along with the black background. But this probably depends on what kind of presentation you're giving, and who your audience is.

Along those same lines, sometimes you might use a black background in a graph in order to make it stand out from all the typical (white background) graphs. For example, when creating a 'headline' graph that's meant to catch people's attention, rather than being used for nitty-gritty analytics. Even then, I would guesstimate that less than 1 out of 20 of my fancy presentation graphs have a black background.

Another argument is that with some display devices (maybe some CRTs and maybe some LED screens), it consumes less energy to display 'black' pixels. Therefore using a black background could save energy, which could be an advantage in mobile devices. I did some web searches on this topic, and couldn't find any definitive studies in this area. At the very least, from what I gather, this is not the general case (especially with displays that use a 'backlight'), and should probably not be a factor for your graphics design.


Against Black Backgrounds:

Many people find graphs with black backgrounds annoying, and difficult to read. For example, how did you like the black background in the text items above? :-)

It has also been my personal experience that when using a dark background and light text in a graph, I have to make the text bolder/thicker in order to be readable. And if you resize/shrink the graph, sometimes the text becomes un-readable. This problem seldom comes up when using dark text on a white background.

When you use a black background, this makes it very 'expensive' to print. Whereas white backgrounds would use no ink to print on white paper, black backgrounds use a lot of ink to print.

My personal opinion is that black backgrounds are occasionally good when you want a certain artistic effect, but should not be used for graphs in general.  Or in layman's terms ... black backgrounds are for velvet paintings.
