Do you trust statistics?

One of my favorite quotes is: "You can't believe everything you read on the Internet" - Abe Lincoln, 1868.

And that is especially true when it comes to graphs and statistics. Hardly a day goes by without me seeing a bad graph that misrepresents the data (either intentionally or unintentionally). Here is a recent bad example I was surprised to find on Statpedia ...


At first glance the graph seemed like a reasonable way to plot the data, but upon closer examination I found a terrible problem that compromises the data integrity! ... They have plotted the survey results all evenly-spaced (probably as character values), even though the surveys were not performed at evenly-spaced date intervals! This seriously misrepresents the data, especially towards the left side of the graph, when the surveys were performed much less frequently. Also, after examining the source data, I found that they had left out the value for the first/oldest survey. Read More »
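To illustrate the spacing problem, here is a minimal SAS sketch. The dates and values are made up for illustration and are not the survey data discussed above. It assumes the dates are stored as numeric SAS date values, so a time axis can space the points in proportion to elapsed time.

```sas
/* Hypothetical survey dates and values, for illustration only */
data surveys;
   input survey_date :date9. pct;
   format survey_date date9.;
   datalines;
01JAN2005 40
01JAN2010 45
01JAN2014 52
01JUL2014 53
01JAN2015 55
01JUL2015 57
;
run;

title "Hypothetical survey results on a proportional date axis";
proc sgplot data=surveys;
   /* Numeric dates on a time axis keep the gaps between surveys to scale */
   series x=survey_date y=pct / markers;
   xaxis type=time label="Survey date";
   yaxis label="Percent";
run;
```

If the date were stored as a character value instead, the axis would be discrete and every survey would get the same spacing, no matter how far apart the surveys were.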


Everything you need to know about the new Analytics Experience event

SAS is kicking off a new event Sept. 12-14 in Las Vegas called the Analytics Experience. It’s actually not completely new. In the past, we’ve held our annual Analytics and Premier Business Leadership Series conferences in the fall. But as the industry landscape continues to change, SAS continues to prove it’s leading the way.

What does that mean? Today there is no separation between the analytical guru and the company thought leader. Analytics Experience will allow attendees to wear both hats at one event, offering the perfect mix of thought leadership, analytics, strategies and connections.


Kelly Check

I had the chance to interview SAS’ event organizer and marketer, Kelly Check, about the new changes and what attendees can expect at Analytics Experience 2016.

  1. What is new with the Analytics Experience? (Besides the name change)

Aside from the change in timing – the conference is now in September instead of October – the new twist is combining practitioners and executives into one event. There’s no longer a boundary between business leaders and analytics pros! And with the new combination, there is a new pricing structure: conference fees are waived for executives (director+) as well as students, and discounts are available to practitioners, faculty, partners and others.

  2. Why combine the Analytics and Premier Business Leadership Series conferences?

Former attendees shared feedback that they wanted more choices; they didn’t want to be limited by their title or organization. They wanted to network with analytics gurus of different backgrounds. So while executives can listen to detailed case studies, students and practitioners can hear big ideas from thought leaders. Every attendee can still customize their agenda for the most beneficial insights.

  3. Who should attend?

Anyone who wants to embrace the latest technology and find value in their data – no matter their title. Read More »


How to use SAS software GMap procedure to capture Pokémon!

With the Pokémon Go craze sweeping the world, techies and programmers are looking to apply their skills to gain an advantage over the average user. In this blog post, I show how to use some of SAS' geospatial analytics capabilities to capture a Pikachu.

Let's say you know of a building that has an active Pokéstop with verified Pikachu sightings. First, you'll want to obtain (or create) a floor plan, and save it in an image file (png, jpg, gif, etc.).


Next, you'll want to come up with a convenient coordinate system, and create a grid of unit cells by looping through the grid values in a SAS DATA step, outputting the 4 corner coordinates and an id variable for each cell. You can then use Proc GMap to draw the grid, and annotate the image of the floor plan behind the grid (here's my code).
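The grid-building step can be sketched roughly as follows. This is a minimal sketch and not the code linked above: the grid size, the data set names (grid, cells), and the placeholder response variable (value) are all assumptions, and annotating the floor plan image is left out.

```sas
/* Assumption: a 10 x 8 grid of unit cells; adjust to fit the floor plan */
%let ncols = 10;
%let nrows = 8;

/* Map data set: 4 corner points and a unique id for each cell */
data grid;
   do row = 1 to &nrows;
      do col = 1 to &ncols;
         id + 1;                                /* running cell id */
         x = col - 1;  y = row - 1;  output;    /* lower left  */
         x = col;      y = row - 1;  output;    /* lower right */
         x = col;      y = row;      output;    /* upper right */
         x = col - 1;  y = row;      output;    /* upper left  */
      end;
   end;
run;

/* Response data set: one row per cell, with a placeholder value */
data cells;
   set grid;
   by id;
   if first.id;
   value = 0;
   keep id value;
run;

/* Draw every cell as an outlined polygon */
proc gmap data=cells map=grid;
   id id;
   choro value / discrete nolegend coutline=gray;
run;
quit;
```

Because each cell has its own id, coloring a cell later only requires changing that cell's value in the response data set.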


Now you'll need to start collecting geospatial data that you can plot as colored areas on the grid. Here, I have determined the x/y grid locations of lures attached to this Pokéstop, and plotted them as dark brown areas on the grid. Can you detect any clustering or trends here? (Note that my friend Kenny, who was a professional/paid gamer, helped me with the finer details of this analysis.) Read More »


When I grow up, I want to be a data scientist

When I was growing up, the term “data science” didn’t even exist, let alone dedicated “data scientist” roles. My friends and colleagues might argue that is because I have yet to grow up (!), but do not let this ruin my lead-in to the fact that data science as a field and data scientist as a job title are very recent smash hits, even though many were doing most of what we call data science decades ago.

Assuming you have read widely on what data science is, beyond the ability to type data scientist into LinkedIn, you know that there are massive opportunities in the field and that these skills are in high demand. I have seen firsthand how hard it is to find these skills in the market. Becoming a data scientist can increase your paycheck and set you up for a challenging and rewarding career.

On top of this, a recent Money magazine and study showed that SAS skills are the biggest pay differentiator in the market. SAS and data science together could set you well on your path, so if you are sitting here wondering, “How do I work through the multitude of learning options available to me?” I am here to try to help. Below I have covered three approaches to set you on your path to becoming a data scientist, ideally a SAS skilled one as well!

Read More »


Jedi SAS Tricks: Explicit SQL Pass-through in DS2

One of the things I’ve come to love most about DS2 is the tight integration with SQL which makes so many data prep chores so much less onerous. An example is DATA program BY group processing. With a traditional DATA step, you must first sort or index the source data before using a BY statement. For example, this program produces an error, because the data has not been properly sorted:

data new;
   set sashelp.cars;   /* assumed source table; the excerpt omits it */
   keep make model cylinders;
   by descending cylinders make model;
   if _n_=10 then stop;
run;

But, because DS2 always retrieves your data with an implicit SQL query, there is no need to pre-sort the data. This program runs just fine:

proc ds2;
title 'DS2 BY processing with base SAS datasets' ;
title2'BY descending cylinders make model';
data;
   keep make model cylinders;
   method run();
      set sashelp.cars;   /* assumed source table; the excerpt omits it */
      by descending cylinders make model;
      if _n_=10 then stop;
   end;
enddata;
run;
quit;

Results of DS2 BY processing without pre-sorting the data
And it makes no difference if the source data resides in SAS or in a Relational Database Management System (RDBMS) like Oracle. Read More »


Putting the US in the EU ... bucking the Brexit trend!

What would it be like if the US was in the EU? I don't know how that would work out politically, but this map shows how it might look geographically (if the US was literally picked up and moved to Europe!)

My buddy Rick Langston is a bit of a map guy, and occasionally sends me cool examples. He recently sent me one he had seen on Twitter (attributed to Randy Olson) that shows the continental United States overlaid on Europe at the geographically correct latitude. I was a bit surprised to see that the entire UK, and several other countries, were farther north than the US!


I'm not sure of any practical purpose for such a map, but I immediately knew I had to create one with SAS software! :)

First, I used Proc Gproject to clip the rectangular region of Europe out of the world map. Then I subset the continental US out of the world map, added an offset of 114 degrees to the longitude, and combined it with the Europe map. I plotted the combined map, and used a transparent color for the US.

I always try to make a few improvements when I imitate a graph, and here are the things I (hopefully) improved in this one: Read More »


Pokémon: Gotta graph 'em all!

So, how many different Pokémon have you caught - and more importantly, how many different kinds are still out there that you haven't caught yet? I've created some graphs that might help you figure it out!

I think my previous blog post might have irritated some of the hardcore Pokémon players out there (based on their comments), by claiming that the only important Pokémon data is Nintendo's stock price. I enjoyed poking my Pokémon-obsessed friends ... but I'll try to make up for it. This time I'm plotting actual Pokémon data, which players might actually find interesting and useful!

To get you in the mood for a blog with real Pokémon data, here's a picture of my friend Jenni with a Spearow on her shoulder:


When the most recent Pokémon game went viral, I started searching the Web to see what it was all about. One of the articles I found had some graphs in it, which of course caught my attention. Read More »


Graphical analysis of all the important Pokémon data!

Are you caught up in the recent Pokémon Go craze? Or maybe just trying to figure out what all the fuss is about? In this blog post, I try to analyze all the important Pokémon-related data in one graph!

When the original Pokémon game first came out in 1996, you needed a pair of Nintendo Game Boys to play it. The most recent release is played on a smartphone (which just about everybody has), and utilizes features of the phone such as GPS location, being connected to the Internet, the phone's camera, and the ability to combine pictures of the Pokémon creatures with real world images (augmented reality).

My friend Kara has a piece of her artwork on public display, and it has become a Pokéstop for a Paras Pokémon. Here's the augmented reality picture:



And now, on to the analytics... Read More »


What to include in your website

What information should you make easily available from the top page of your website? This Venn diagram might help you decide!

Have you ever gone to a website to try to find some information, and had an (expletive) difficult time trying to find that info? I think there is often a disconnect between people designing websites and the people using them - designers seem to be mainly concerned with having a certain look and using the latest technology to display a slideshow, whereas users just want to be able to find the information quickly and easily.

I found the following graphic that demonstrates this pretty well. It was designed by Randall Munroe, and his site describes itself as "A webcomic of romance, sarcasm, math, and language." I assume that being a webcomic is the reason he uses all upper case letters, but with this amount of text, I think that makes it a bit difficult to read. Also, those not familiar with Venn diagrams might not get what it's saying.


So I decided to create my own version using SAS, and make it a bit more professional, easier to read, and more intuitive. I used annotate to draw the circles, and filled them with transparent blue and yellow, so that the combined area in the middle is green (I think just about everyone will understand that green is the combination of blue and yellow, which will make the Venn diagram concept more obvious). And I used mixed case text, so it is easier to read. Since there is no built-in SAS procedure to create this plot, I hard-coded the x/y positions for each piece of text, and then annotated the text on top of the colored circles. Read More »
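For readers who want to experiment with the idea, here is a rough sketch of two overlapping transparent circles with hard-coded text positions. Note that it swaps in a different technique: it uses the ELLIPSEPARM and TEXT statements of Proc SGplot (which assumes SAS 9.4M2 or later) rather than the annotate approach described above. The data set name, the label text, and all positions are placeholders.

```sas
/* Hard-coded label positions; the label text is a placeholder */
data labels;
   length label $20;
   x = -1.5; y = 0; label = "Left only";  output;
   x =  0;   y = 0; label = "Both";       output;
   x =  1.5; y = 0; label = "Right only"; output;
run;

/* aspect=0.6 matches the axis ranges (3.6 / 6), so the circles look round */
proc sgplot data=labels noautolegend aspect=0.6;
   /* Two overlapping circles: ellipses with equal semi-axes, half transparent */
   ellipseparm semimajor=1.5 semiminor=1.5 /
      xorigin=-1 yorigin=0 fill fillattrs=(color=blue) transparency=0.5;
   ellipseparm semimajor=1.5 semiminor=1.5 /
      xorigin=1 yorigin=0 fill fillattrs=(color=yellow) transparency=0.5;
   /* Text drawn on top of the circles at the hard-coded positions */
   text x=x y=y text=label;
   xaxis display=none min=-3 max=3 offsetmin=0 offsetmax=0;
   yaxis display=none min=-1.8 max=1.8 offsetmin=0 offsetmax=0;
run;
```

Keeping the text positions in a data set makes it easy to nudge each label until it sits cleanly inside its region.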


Apparently the cool kids don't smoke any more

I've noticed fewer and fewer people smoking these days, and was wondering who the last holdouts are. Let's run some numbers and find out...

Back in the 1950s, 60s, and 70s it seems like almost everyone smoked. You hardly ever saw the "cool kids" such as James Dean without a cigarette - and entertainers, celebrities, and hosts frequently even smoked on TV (which is now taboo). My buddy Reggie sells antiques, and here are a couple of vintage cigarette lighters he has in his inventory, from the golden age of smoking. Nice lighters were probably common back in the day, but are rare and sought-after collectibles now:


I hardly see anyone smoking these days. The SAS headquarters here in Cary is a smoke-free campus. And North Carolina, which is one of the major tobacco-growing states, recently passed a law which bans smoking in restaurants. So, statistically speaking, who are the few remaining smokers?...

I found an article with this exact kind of information. It shows the smoking prevalence, broken down by several different demographic categories. Here's their graph of smoking by income range. It's a fairly clean and straightforward graph, but I think it would be a little more intuitive to reverse the order of the income axis (put the higher income at the top, and lower income at the bottom). Another drawback is that you have to see the rest of the page to know what year the data is from, and what the colors stand for.


I downloaded the data from the CDC, and created my own graph, to see if I could do a better job. I sorted the income axis so the higher values are at the top, and I added labels above each bar color (whereas the original graphs only showed the color legend info on their first graph). I also included the data source and year in a footnote, and changed the title text a bit to make the graph more stand-alone. Read More »
