Beyond the Credential: What’s up with the test before the test?


If you have taken a SAS Deployment exam in the past 9-12 months, you may have wondered why there are pre-test questions. While it might seem like I am providing warmer-upper questions out of the goodness of my heart, these pre-test questions are actually called candidate surveys. Their purpose? To measure the correlation between a candidate’s perceived level of preparedness and how well the candidate actually does on the exam.

Pre-test surveys are a fairly common practice, and of course here at SAS we jump at the chance to draw statistical correlations. So what does the data tell us? Let’s look at two examples*.

Survey Question #1: How much experience do you have working with the SAS software covered on this exam?

The Experience Level chart indicates the more experience a candidate has, the greater the likelihood of passing this exam.



Survey Question #2: On a scale from 1-5**, how would you rate your current level of expertise with this software?

(**Scale details are provided to candidates.)


What you see is what you get ... maybe!

When using visual analytics, it's important to realize that WYSIWYG (what you see is what you get) is not always true. By recognizing a few optical illusions and tricks, you might become a better data analyst ... and have some fun along the way.

Let's start with the Koffka Rings Illusion. In the image below, both the left and right sides of the 'ring' are exactly the same shade of gray in all three pictures, but your brain probably perceives them as different shades because of the shades around them. This is one reason it is very important to pick colors and shades that are easily discernible for your graphs. You can click the image below to see the interactive version; each of the colored areas has mouse-over text showing the exact hex code of the color used, so you can verify that the left and right halves of the rings are indeed the same shade of gray (cxA095A2).


This graph (like all the examples in this blog post) was created with SAS software. Here is a link to the SAS code used to create it.

Visual analytics can often help you see patterns that might otherwise be hidden in the data. If you're the one creating the visualization, you want to help the user see the patterns as easily as possible. This fun example goes to the opposite extreme - how many of you can see the hidden pattern in this image? If I truly wanted to convey this message, I should probably just use text instead of graphics, eh?!? (code)


Sometimes when analyzing data, most of the values are the same, and you want to make it easy to find the outliers. You can help users identify the outliers by graphing them, or by showing them in a contrasting color, for example. Wouldn't it be much easier to find the '6' in the following grid of numbers if it were in a different color from the 8's? (code)
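The "contrasting color" idea above is easy to sketch in SAS. This is a minimal, hypothetical example (the dataset, variable names, and cutoff are all made up for illustration): flag the outliers in a data step, and let the group= option on a scatter plot draw them in a different color.

```
/* Hypothetical sketch: flag outliers, then color by the flag */
data flagged;
   set mydata;               /* assumed dataset with id and value */
   length status $ 7;
   if value > 100 then status = 'Outlier';   /* cutoff is illustrative */
   else status = 'Normal';
run;

proc sgplot data=flagged;
   scatter x=id y=value / group=status;      /* outliers get their own color */
run;
```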


A closer look at the teacher salary graph

Teacher salaries have been a hotly debated issue in our state for the past few years. In this blog post, I examine a graph that recently appeared in our local news, point out some deficiencies, and create an alternate graph.

But before we get started, here's a picture of my friend Jennifer teaching her students how to make biodiesel.  If we had had teachers like Jenni when I was in school, I would have probably paid more attention!  ;-)


So, here's the graph I saw in our local NC news recently. The graph caught my attention for two reasons ... it contained data about a topic I was interested in (teacher salaries), and it was a simple and 'pretty' graph.


But after examining the graph for a while, I noticed a few problems. A good graph often makes you ask questions ... but this particular graph didn't provide a way to answer those questions. Here are a few questions this graph brought to my mind:

  • Were the salary numbers in current dollars, or constant dollars?
  • Was the average US salary weighted by the number of teachers in each state?
  • What were the salaries before 1999?
  • Were the salary numbers for calendar year, or for school year?
  • Did they choose a y-axis scale that spread (or squished) the values?
  • Did 'Source' mean the data source, or also the graph source?
  • Could I see the actual data?

Since my blog is about creating graphs, I'm sure you guessed my next move ... creating my own version of the graph. I searched for quite a while, but was unable to find the exact yearly data that was used in WRAL's graph. I was, however, able to find some similar data on the National Center for Education Statistics website - it was more geared toward 10-year increments, and went back farther in time. I downloaded their Excel spreadsheet, imported it into SAS, transposed the data to convert the yearly columns into values (so they could more easily be plotted), and created the following graph using SAS/Graph's Proc Gplot:
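The import/transpose/plot workflow described above can be sketched roughly as follows. This is a hedged outline, not my exact code: the file, sheet, and variable names here are hypothetical.

```
/* Rough sketch of the workflow -- names are illustrative, not the real ones */
proc import out=salaries
      datafile="teacher_salaries.xlsx"
      dbms=xlsx replace;
run;

proc sort data=salaries;
   by state;
run;

/* Convert the yearly columns into rows, so each year/salary
   pair becomes one observation that can be plotted */
proc transpose data=salaries
      out=salaries_long (rename=(_name_=year col1=salary));
   by state;
run;

proc gplot data=salaries_long;
   plot salary*year=state;
run;
```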


Here are some features of my graph:


Are you a SAS Jedi? Why not share your knowledge?

Have you been kicking around ideas for a great SAS book, but don’t know how to get started? Don’t wait another day – connect with the Publish with SAS community and let those amazing folks at SAS Press help you get your project rolling. You won’t regret it! I’m SO excited to have published my first book, “Mastering the SAS® DS2 Procedure: Advanced Data Wrangling Techniques” – the support and encouragement provided by my SAS Press editor, Brenna Leath, and the rest of the SAS Press team was phenomenal. When I had trouble identifying data I could use for the programs in my book, they helped me negotiate a solution. When I suffered the inevitable writer’s block, Brenna counseled, cajoled and inspired me to establish a disciplined schedule that got me moving again. The SAS Press team provided several cool cover art selections, and even prepped my manuscript for publishing in both hard copy and e-book formats. My Mom and Dad are not geeky, but they sure were beaming when I showed them the finished product!

Lessons learned as a first-time author:


How does your favorite US national park stack up?

The US has almost 400 national parks, and has been keeping attendance data for about 100 years. With a holiday weekend coming up, I thought this would be a good time to crunch those numbers, and see how each park compares to all the others...

I first saw the tip of this iceberg in an interesting article ranking the US national parks by their number of recreational visitors per year. Their graph caught my attention ... but upon closer examination, I noticed that it only showed the top ~50 of the ~400 parks, and only 10 of them were labeled in the graph. This seemed a bit limiting to me.


So, of course I had to download the data for all ~400 parks and try to find a way to show it all in a graph ... and also be able to identify how each park ranked over the years. I almost bit off more than I could chew. "Almost" being the operative word! :)

What I finally ended up doing was creating a separate graph for each national park - showing all the parks in gray, and the park-of-interest in red. For example, here's the graph of the Wright Brothers Memorial in my state, North Carolina:


Picking a surreal vacation destination in the US

Do you need help picking a summer vacation destination - one that's not just great, but surreal? If so, this blog's for you!

It's that time of the year again - I've got gobs of vacation time saved up, but I was drawing a blank on where to go... So I turned to my trusty pal Google, did a few searches, and found an interesting article about the 18 most surreal places in the US. A veritable bucket list of summer destinations that could satisfy my vacation appetite for years to come!

The article listed each location, gave a short description, and included cool photos. This was great for getting started, but it just wasn't quite the interface to the data that I wanted:


Being one of those geographically challenged Americans, I didn't really have a concept of exactly where these places were located, just based on their names. I had a general idea (say, within 500 miles), but I'm more of a "GPS Person" when it comes to locating a place I haven't been to before.

So I created a SAS dataset with the lat/long location of each of these destinations, and plotted the locations as markers on a map. I set up HTML tags so you can hover your mouse over each marker to see the name, and click on it for more info.
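For readers curious how markers with mouse-over text are typically done in SAS/GRAPH, here is a hedged sketch of the annotate-dataset approach. The dataset, map, and variable names are all made up for illustration, and the lat/long values would need to be projected into the same coordinate system as the map (e.g., with Proc Gproject) before this would line up:

```
/* Hypothetical sketch: build an annotate dataset of map markers,
   each carrying an html= string for hover text and a drill-down link */
data dest_anno;
   set destinations;                 /* assumed: lat, long, name, url */
   length html $ 250;
   xsys='2'; ysys='2'; hsys='3'; when='a';
   x=long; y=lat;                    /* must match the map's projection */
   function='pie'; rotate=360; size=1.5; style='psolid'; color='red';
   html='title="'||trim(name)||'" href="'||trim(url)||'"';
run;

proc gmap data=my_map map=my_map anno=dest_anno;
   id state;
   choro state / nolegend;
run;
```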

I don't usually let you see the intermediate (bad, or less good) versions of my maps, but I'm including this one so you can see the difference between it and the final map. There were a couple of destinations in Alaska, so I included Alaska and Hawaii in the bottom corner of the map. This is the traditional place to put these two states so they fit conveniently onto the page, but it just didn't feel right clicking on something that was visually south of California (on the page layout, at least) to see images of the Northern Lights.


You don't know squat about the Las Vegas housing market

How hard did the recent recession hit the Las Vegas housing market, and what lingering side effects are still being felt? If you don't know squat about real estate housing markets, then this blog post is for you! It takes a simple graphical look at some data that helps explain the basics ...

I recently read an interesting article in our local news about a real estate scam. Apparently someone had surreptitiously gotten access to a vacant house, and listed it for sale on Craigslist. The new 'owner' paid a $3,000 deposit, and was unpleasantly surprised when the police showed up and told him he was trespassing and had to leave. This got me wondering about all those foreclosed houses that are still vacant in the aftermath of the housing bubble and recent recession. I did a few Web searches on the topic, and found that this is an especially big issue for Las Vegas right now.

I wondered how much of a drop the Las Vegas housing market had experienced (compared to other big cities), so I located the historical home price index data and downloaded the spreadsheet. It was good to have my hands on the data, but I found it a bit difficult to get a good mental picture of it, just by looking at hundreds of numbers...




What we can learn from popular science communicators


I recently had a chance to hear Dr. Karl Kruszelnicki on triple j radio in Australia. Known simply as Dr. Karl, he has a weekly national show answering science questions on an alternative rock radio station. Yes, science on rock radio. Yes, national. Yes, Thursday morning – when people are listening. Frankly, he is good -- really good and really interesting, with a continuous stream of callers wanting answers to science questions.

This got me thinking: what can we learn from the top popular science communicators of our time? So I spent some time listening to and reading some of their past work to determine what I could take away. Make no mistake about it: learning to communicate technical information is a work in progress for all of us. Learning from these masters can help. Let's start with Dr. Karl.

Dr. Karl

Dr. Karl covers a remarkable range of facts and background materials. One of his common methods of relaying information is a form of storytelling. Importantly, the stories are related back to the listener’s personal experiences. Psychology researchers Schank and Abelson have a term for this: they call it “mapping the speaker’s stories onto the listener’s stories.” Consider the following examples from a recent show.

A listener was wondering why ice evaporates if left for a long period in a freezer. Dr. Karl responds with, “Think about water in a little puddle in the road. It does not get above 100 degrees C, but it evaporates…” The reason, he explains, is that some of the molecules in the puddle do get to 100 degrees. He goes on to explain that something similar happens in the freezer, but at a much slower rate.

Of interest is how he explained this. Clearly, it was easier to “map” this story onto the image of a puddle on the road than onto the conditions inside a freezer.

Another listener had a question about GMO crops. To give some background, he starts with an image for the listener: “Have you ever been walking along a field of long grass, and you see those little tiny grass seeds that are about the size of a match, or smaller? That’s corn.” He goes on to discuss how grass seeds were bred into corn over thousands of years. Again, he maps to a familiar image of walking in a field. Then, after that background, he discusses the modern approach to producing GMO crops.

Dr. Karl's weekly shows are available as podcasts. There is a lot to learn from him, about both science and the art of presenting information. Additionally, during winter in the Northern hemisphere, you can almost hear the sunshine through the broadcast from down under.


Is that a CSV file ... or have you been drinking?

In this blog post I explore some of the open data police incident reports for Raleigh and Cary, while showing you the easy way to handle various types of CSV files.

In recent years, many cities have set up open data websites, to share various kinds of data about their city. I decided to download the Raleigh and Cary police incident reports, and set up some examples that might be a good starting point for others wishing to analyze the open data for their city.

I found the Raleigh, NC open data page for their police incidents, clicked the 'Export' button and downloaded the CSV file. Their data is stored in a very simple/traditional comma-separated-value format, and I was therefore able to use SAS' Proc Import with the traditional dbms=csv:

PROC IMPORT OUT=raleigh_data
     /* the file name below is illustrative -- use the CSV you downloaded */
     DATAFILE="Police_Incidents.csv"
     DBMS=csv REPLACE;
     GETNAMES=yes;
RUN;

Here's what the imported data looks like:


Similarly, I found the Cary, NC page, and downloaded their data in the csv format. I naively tried the traditional SAS code to import the csv:


But the data came out in (mainly) two big character variables, with multiple fields all globbed together.


Upon further investigation of the cpd-incidents.csv file, I found that the values were separated by semicolons rather than commas. And I thought to myself, "That's not a comma separated value file - what have these people been drinking!?!" But upon further investigation of the definition of a csv file, I found that it has "records divided into fields separated by delimiters (typically a single reserved character such as comma, semicolon, or tab; sometimes the delimiter may include optional spaces)." I guess a semicolon is actually fair game ... so I put on my big-boy pants, and used some slightly different Proc Import code that allows me to specify the delimiter:
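A delimiter-aware version of the import looks roughly like this (the output dataset name is my own choice; the file name comes from the Cary download described above):

```
/* Import a semicolon-delimited file by using dbms=dlm
   and specifying the delimiter explicitly */
proc import out=cary_data
      datafile="cpd-incidents.csv"
      dbms=dlm replace;
   delimiter=';';
   getnames=yes;
run;
```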


And now the data imports much more cleanly, with each field in a separate variable. Here are a few of the many variables in the data:


Jedi SAS Tricks: DIY Tasks in SAS Studio

In my previous post, Jedi SAS Tricks - Make This a Button in Base SAS, I demonstrated running a SAS program from a toolbar button in the SAS windowing environment. The program we executed is the macro from a previous post, Jedi SAS Tricks: The DATA to DATA Step Macro. The wily Chris Hemedinger commented that he had written a blog post about a custom task for this function in Enterprise Guide, which is an excellent solution for our Enterprise Guide users. Just so our SAS Studio users don't feel left out, I'm going to show you how to make a custom SAS Studio task that runs the Data2DataStep macro for you. Because this post is a bit longer and more complex than usual, I'm going to include a link to a ZIP file containing a PDF of the instructions and a copy of the XML code for the Data2DataStep task.

SAS Studio tasks are written in XML, so they are very easy to create, copy and modify. In fact, SAS Studio allows you to edit your tasks right in the browser! It's a very useful exercise to copy a few tasks and experiment with the XML to get a feel for what each section does. After poking around, I chose to start my new task by right-clicking in the Tasks window and choosing New Task from the pop-up menu. This creates a blank template we can use to create our task. Let’s go through the template section by section and create a task that will collect input, then execute the Data2DataStep macro for us.

The first section is the Registration section. First, note the GUID (Globally Unique Identifier). This 128-bit integer uniquely identifies each task; SAS Studio provides you a new GUID every time you copy or create a new task. Next, we’ll edit the Name and Description tags for our task and add a Category tag to help keep our tasks organized. You can also add links to SAS documentation which might be of interest to the user, if desired. The information we enter here will appear on the INFORMATION tab of our task:

      <description>Generates a DATA step to recreate a few observations of a data set.</description>
      <procedures>DATA step</procedures>
         <link href="">SAS Studio User's Guide</link>
         <link href="">Base SAS Statements</link>
         <link href="">SAS Tutorials</link>

