Does your county have a Mine-Resistant Ambush Protected vehicle?

There's been a lot of debate lately about whether it's good or bad for the local law enforcement agencies to receive military surplus items. The data has recently become available, but it is just a big long text list of almost a quarter million items. Therefore I thought I would use SAS software to analyze the list, and determine which counties received one of the biggest and most visible items, Mine-Resistant Ambush Protected vehicles (MRAPs).

After analyzing the data, I found that only 5 of the 100 counties in my state (North Carolina) got an MRAP, and the county where the SAS headquarters is located (Wake County) did not receive one. Wake County, by comparison, went the other route - rather than using a large intimidating "shock & awe" military vehicle, Wake County has a Z06 Corvette patrol car :)


But enough about my county ... what about your county! The question at hand - did your county get one of the MRAPs?? Here's my SAS map that will help you find out! Click the snapshot below to see the full-size map, with html hover-text so you can see the county names:


So, did your county get an MRAP? Have you seen it in person yet? Would you have known which counties in your state have them, without looking at this SAS map? Sometimes I think I make this data analysis stuff too easy on you!  LOL

Post a Comment

The growing role of analytics in management education


Owen P. Hall, Jr., P.E., Ph.D.
Pepperdine University

This guest post was written by Owen P. Hall, Jr., Professor of Decision Sciences at Pepperdine University’s Graziadio School of Business and Management. He has more than 35 years of academic and industry experience in mobile learning technologies and business analytics. He also holds the Julian Virtue Professorship and is a Rothschild Applied Research Fellow.

Dr. Hall will also be presenting on The Growing Role of M-Learning in the Analytics Revolution: Application to Management Education at the Analytics 2014 conference in Las Vegas, Oct. 20-21.

Management education has come a long way since Sir Isaac Pitman initiated the first correspondence course in the early 1840s. Today, business schools are under growing pressure to engage in significant reforms due to the impacts of globalization, new learning technologies, and unprecedented economic uncertainty.

The increasing use of Analytics in business and government to improve efficiency and performance suggests similar opportunities for schools of business. A recent study, sponsored by the Association to Advance Collegiate Schools of Business (AACSB), revealed a wide gap between the changing needs of the business community and the programs being offered by the business management community.

The Analytics paradigm can be used to enhance management education and close the gap with the business community in the following ways:

1) Provide a conceptual setting for expanding student managerial decision-making expertise,

2) Assess student performance and identify appropriate additional learning resources via intelligent tutors, and

3) Offer business school administrators the capability to optimize operational effectiveness.

Identifying the “best” approach for teaching students and training professionals in modern decision-making and problem-solving is at the heart of the Analytics movement in academe. The learning goal should be to develop general analytic and quantitative problem-solving skills, which allow the graduate to transition seamlessly into the business universe.

The curriculum should be designed in such a way as to create an environment where the student becomes comfortable in using a wide variety of decision support tools. This is where mobile learning can play an important role. Typically, mobile learning (m-learning) is defined as the acquisition of knowledge through conversations across multiple contexts via interactive technologies. Mobile learning is designed to significantly alter the three pillars of traditional management instruction—fixed time, fixed location, and fixed learning pace—with a more flexible and customized learning environment.

The pairing of social media based mobile learning with the Analytics paradigm provides a vehicle for effectively integrating analytical based decision-making and problem solving into the curriculum at a time and place convenient to the student. This is a particularly important feature for working adults and those with disabilities.

The Analytics paradigm also provides business school administrators with the capability to better manage the institution’s strategic and tactical resources by reforming the traditional decision-making process.

Today, most business schools are facing intense competition and demanding students.  These forces tend to drive up the cost of student acquisition and retention. The emergence of the Internet generation as the new student body, who are web savvy and heavily engaged in social media, requires institutions to develop a more robust, nimble, and real-time response capability. Properly aligned with the school’s mission, the Analytics paradigm offers the promise of strengthening student learning and employment opportunities as well as improving institutional operational efficiencies.

... Custer at Little Big Horn -- I'm not saying that I don't accept some responsibility for what's happened here today. But if our data-mining techniques had been a little more sophisticated, we would have known that this was a bad idea from the get-go ... (WSJ, 11/24/12).

Post a Comment

Analyzing the Atlantic Coast Pipeline

Admit it - you were fascinated by the 3D pipes screensaver! How many CPU cycles did we let our machines burn up, drawing and re-drawing those silly pipes!?!  What was your personal favorite CPU-hungry screensaver? (let us know in a comment!)


Well, speaking of pipes, the proposed Atlantic Coast Pipeline (for natural gas) has been a big item in the news here in North Carolina lately. It would start in West Virginia, go through Virginia, and end at the southern side of North Carolina near Lumberton.

There's been a lot of interest lately from SAS users wanting to perform analytics on utility, road, railroad, and other similar data, and then visualize the data on a map. This pipeline seemed like a great opportunity to create a proof-of-concept example and show some of the things SAS can do with this kind of data!

So I went to Dominion's website and looked at their official maps, and estimated the latitude/longitude coordinates along the proposed route. I then used Proc Gmap to create a map of the 3 states of interest, and annotated the proposed route as a line on the map:


Next, I used Proc Ginside to determine which counties the lat/long points along the proposed route went through, and shaded those counties a different color on the map:



I then took a list of all the cities in these three states with population of 18,000 or more, and used Proc SQL to create a dataset with pairings of every city to every point along the pipeline.  I then used the geodist() function to calculate the distance between each city and each point along the pipeline, and then selected just the pairing with the minimum distance (ie, the minimum distance from the city to the pipeline). I then annotated a marker and label for each city within 30 miles of the pipeline.



And, if you click the map above, you can see the interactive version, with html hover-text for the counties and cities along the proposed route.

Now that you know what's possible in SAS, do you have any data you'd like to plot using this technique? Feel free to leave some comments with your ideas! And here's the sas code if you'd like to experiment with it!


Post a Comment

Jedi SAS Tricks - These aren't the labels...

In the first Star Wars movie, Obi-wan uses Jedi mind tricks to convince the stormtroopers that the droids they see are not the droids they're looking for.  A colleague at SAS passed along a question from a SAS user where the column labels they were seeing were NOT the labels they were looking for:

Does anybody have code to modify SAS column labels so that the variable name and label are simultaneously displayed when viewing the table?  For example, we would like to create a new label that has the value 'VarName: Label for VarName' instead of just 'Label for VarName'.

I started thinking about how I would customize the labels using SAS code. I'd want the process to be as automated as possible, and I'd want to accomplish the task without rewriting all of the data in the table just to change the labels. I settled on using a little PROC SQL with its dictionary table magic, a pinch of SAS macro, and finally the raw power of PROC DATASETS to do the deed.  Let's start by creating a dataset to play with:

data haircuts;
  input Name $ Amt;
  label name='Person' Amt='Price of Last Haircut';
Luke 25
Obi-wan 15
Darth 0

The dataset (with labels displayed) looks like this:


Now, we'll use PROC SQL and the dictionary.columns table to construct the ATTRIB statements we'll deploy in PROC DATASETS:

proc sql noprint;
select cat('attrib ', strip(Name), ' Label="',catx(': ',Name,Label),'";')
  into :code separated by ' '
  from dictionary.columns
  where memname='HAIRCUTS' and LIBNAME='WORK'

The macro variable CODE now holds all of the ATTRIB statements we will need in the subsequent PROC DATASETS step:

attrib Name Label="Name: Person"; 
attrib Amt Label="Amt: Price of Last Haircut";

Finally, we'll submit the PROC DATASETS code to make the changes to our dataset descriptor:

proc datasets lib=work memtype=data nolist;
  modify haircuts;

And violá! Labels just like the customer ordered:


So now, these are the labels we were looking for and my work here is done (for today, at any rate).

Until next time - may the SAS be with you!

Post a Comment

US public schools projected to be majority-minority in 2014

Being that today is a Throwback Thursday, I decided to re-create one of my favorite things from elementary school -- show-and-tell! And of course I'll be showing and telling you about a graph ... using school data! :)


OK - that SAS graph certainly is "Throwback" but it's not the one I'm going to show-and-tell today! LOL

The end of August is the traditional time for all the kids to head back to school in the US, and I saw an interesting graph in a Pew Research article about the racial/ethnic composition of US public schools. This fall, for the first time, the combined minority students will outnumber white students.

Pew's graph was interesting, but there were a few things about it that I didn't like. For example, they only showed bars for ever other year (even numbered years), except for the first two bars (1997 and 1998). Also, the "percent total minority enrollment" numbers were placed on the "black" bar segment, which could be misleading (unless you carefully read the labeling). There were also a lot of labels on the graph - for example they scattered labels around the graph showing what colors represented what racial/ethnic group, rather than using a color legend.

I decided to try to create my own version of the graph using SAS, to see if I could "clean it up" and make things a little more straightforward.

I did some checking and found a table of the data (p. 33) in a paper published by the National Center for Education Statistics (NCES). I copy-n-pasted it into the datalines section of my SAS program, and then used Proc Transpose to rearrange the data so it would be easier to work with. I used Proc Gchart's subgroup option so that each race was represented by a bar segment with a different color (using the exact same colors as the Pew graph). The bar segments would normally be ordered alphabetically, therefore I manually assigned numeric values in the data to produce the desired order, and then created a user-defined-format so that the numbers would print in the legend as the desired text.


I made several changes from the original. Since there was data for every year, I plotted all the data (rather than just the even years). I show the total majority and minority percent values to the extreme left and right, rather than in the middle (highlighting the 2014 values in red, since they are the important ones). I added reference lines at 50% since that is the important/transition value the graph was created to show. And if you click my graph (above) to see the full-size interactive version, you'll notice it has html hover-text for each bar segment, in case you want to see the exact values.

What do you think of the new graph? Any suggestions to make it even better? :)

Post a Comment

Turnout vs popular party in European elections

In the last edition of the European elections, the main question was not only “Who is going to win the elections?” but also “What is going to be the turnout?” As it has taken a month and half to establish the real figure (the previous figure was based on exit polls so was preliminary) we can finally dive deeper into it.

The relationship between voter participation and party fortunes has received much attention in the voting behavior literature. Specifically, a number of studies on the advanced industrial democracies postulate that left-of-center parties benefit from higher turnout.

Turnout is seen as a critical test for European democracy, and it has fallen steadily and consistently since 1979, from 62% in the first election in 1979 to 43% in the 2009 election.The final turnout figure for this year’s European Parliament elections has been revealed as 42.54% – the lowest-ever turnout and lower than the number communicated immediately after the elections. This means that the eighth legislature of the European Parliament will not be more representative for the European population than the previous one.

In statistics we make a difference between the population and a sample. The population is the entire collection of individual members of a group of interest. When the entire population can’t be measured, we make use of a sample. A sample is a subset of a population that is drawn to enable inferences to the population. A sample should be representative of the population, meaning that the sample’s characteristics are similar to the population’s characteristics. A representative sample should be an unbiased indication of what the population is like.

Four EU countries have a compulsory voting law for the European Parliament elections; Belgium, Luxembourg, Greece and Cyprus. Theoretically the votes in these countries would reflect the entire population. While Belgium and Luxembourg show a turnout of above 85%, the figure is still low in the two other countries (57% in Greece and 42% in Cyprus).

Will the eighth legislature of the European Parliament be representative for the entire population? This is a question that we can’t answer, however statistics allow us to calculate if there is an association between the turnout in the different member states and the outcomes of the elections.

A commonly used test that examines whether there is an association between two categorical variables is the Pearson chi-square test. According to this method an association exists between two categorical variables if the distribution of one variable changes when the value of the other variable changes. If there is no association, the distribution of the first variable is the same regardless of the level of the other variable. In other words, the chi-square test measures the difference between the observed cell frequencies and the cell frequencies that are expected if there is no association between the variables. If you have a significant chi-square statistic, there is strong evidence that an association exists between your variables.

In order to perform the Pearson chi-square test I combined the turnout data with the winning European party per state and I created a variable ‘Turnout category’.

Country Winning party Turnout Turnout category
Slovakia EPP 13.0% 0-30%
Czech Republic EPP 19.5% 0-30%
Poland EPP 23.8% 0-30%
Croatia EPP 24.3% 0-30%
Slovenia EPP 24.5% 0-30%
Hungary EPP 29.2% 0-30%
Latvia EPP 30.0% 0-30%
Portugal S&D 34.5% 30%-40%
Romania S&D 34.7% 30%-40%
UK EFD 36.0% 30%-40%
Estonia ALDE 36.4% 30%-40%
Netherlands ALDE 37.0% 30%-40%
Bulgaria EPP 40.2% 40%-50%
Finland ALDE 40.9% 40%-50%
Cyprus EPP 42.4% 40%-50%
France NI 42.4% 40%-50%
Spain EPP 44.7% 40%-50%
Austria EPP 45.0% 40%-50%
Lithuania ALDE 47.3% 40%-50%
Germany EPP 47.9% 40%-50%
Sweden S&D 51.0% 50%-60%
Ireland EPP 51.2% 50%-60%
Denmark EFD 55.0% 50%-60%
Italy S&D 57.2% 50%-60%
Greece GUE/NGL 57.4% 50%-60%
Malta EPP 74.8% 60%-100%
Luxembourg EPP 85.5% 60%-100%
Belgium G/EFA 90.0% 60%-100%

Next step was to turn this table into a crosstab. Therefore I used the Table Analysis task in SAS Enterprise Guide. Within the Table Analysis task the Chi-Square tests can be calculated.


If we look at the rows of the crosstab below we see that European People’s Party (EPP) is the most popular party in 15 of the 28 European countries. If we look at the columns of the crosstab we conclude that in all 7 countries with a turnout between 0 and 30% European People’s Party was elected. This was not the case in any of the 5 countries with a turnout between 30% and 40%. This is an illustration of the fact that there is an association between the turnout and the winning party.This conclusion is confirmed by the Pearson chi-square as the outcome of the test was significant (χ2=38.09 and Exact P=0.013).


Based on this we could presume that a right party benefits from the lowest turnout but keep in mind that an association does not necessarily imply causation!

Post a Comment

Will California dry up and fall off?

When the recent earthquake hit California, one of my friends commented that "California is so dry right now, I'm surprised it didn't just break off and fall into the ocean!"

It got me wondering, just how bad and widespread is California's drought? I did a few Web searches and found lots of dramatic before-and-after pictures of their lakes, which are down to about 35% of capacity statewide. Some of the pictures made me think of Southern Peru, and how huge bays and bodies of water there had turned into deserts (millions of years ago), leaving behind fossils of whales which I got to see a few years ago. Here's a picture I took of my buddy Joe taking a picture of one of those whale fossils...


Of course, we're all glad California didn't fall into the ocean, and I'm sure they'll make a comeback from this drought ... but it got me wondering what data is available, and how I might visually analyze it.

I found that the site provides shapefiles containing precipitation data. I downloaded the current data for the last 60 days, showing the rainfall in each area of the US as a percent of the average rainfall for that time period. I used Proc Mapimport to get the data into SAS, and found that it was point data (rather than polygons), so I couldn't use Proc Gmap directly on the data. But I was able to use annotate to color-code the points and place them on a blank Gmap. The points are so dense that they have no space between them, and produce what looks like a contour map.

And here is the map! It definitely shows that most of California has had dramatically less rainfall during the past 60 days than the historical average. Here's a link to the code, if you want to try creating your own maps.



Post a Comment

How to show recessions (or other ranges) on a time series plot

It's easy to plot events that happened at a certain time, but what about events that extended over a range of dates, such as recessions? ... This blog post teaches you a nice trick to use for that!

Let's say you have a plot of the labor force participation rate over time, and you want to also see when the recessions started and ended. Here's the way I would plot it:


How do you do that in SAS? If you've seen any of my previous blogs, you probably know that the trick I use to add things to graphs is almost always annotate. Annotate gives you a way to programmatically 'draw' extra things on a graph. And in this case, I use it to draw a colored box behind the graph, extending in the left/right direction from the start date to the end date of each recession, and extending in the up/down direction from the bottom of the axis to the top.

Here's a short example that shows how to code it. Notice that I use the data coordinate system for the x direction (xsys='2') so I can specify the dates, and I use a %-based coordinate system for the y direction (ysys='1') so I can simply specify the values 0 and 100%. The when='b' draws the solid-colored bar before (ie, behind) so it doesn't obscure the plot lines.

data anno_recessions;
xsys='2'; ysys='1'; when='b';
x='15apr1960'd; y=0; function='move'; output;
x='15feb1961'd; y=100; function='bar'; color='cxf0f1e6'; style='solid'; output;

What about events other than recessions? - Certainly! It could be any event with a start date and end date. And if it's not obvious what the event is, just stick a label on it!


Or what about which president was in office? Yep - you can use the same trick to show that!


What about "graying out" a special region of a Salmonella chart? Sure! With a slight modification, you can use an alpha-transparent gray color, and annotate it after the graph (when='a' so it is drawn on top).


What about showing a season of the year, such as 'winter'? ... You betcha!


You can even use this technique to simply alternate a background color (like the old green bar paper) instead of using reference lines.



I've set up each of the graphs above so you can click on them to see the 'info' page which has a link to the SAS code used to create the graph. Feel free to download the code, and modify it to work with your data.

Now it's your turn - do you have any tricks that you'd like to share for enhancing time series plots? If so, leave a comment below!

Post a Comment

Mapping 206 hospitals in 29 states!

If you're a really big company, with many locations around the country, how do you keep track of all that? ... With a great map, of course!

I recently read a CNN article about the Community Health Systems network being hacked - exposing the names, Social Security numbers, physical addresses, birthdays and telephone numbers of 4.5 million patients. That's mind boggling, isn't it?!?

I'm always looking for different ways to use SAS, and I thought maybe we could use it to plot the locations of their 206 hospitals on a map, so people could easily see if their hospital is one of those impacted by the hackers.

The article included a map, but it was small and didn't really provide enough detail to be very useful. My new/improved SAS version is below - you can click it to see the full size map, and it will let you hover your mouse over each marker to see the hospital name. I also included the full text list of all the hospitals (below the map), in case you want to search by city name, etc.


That's a nice looking SAS map, eh?!? You're probably wondering "How'd he do that?!?" ...

Well, I started with a list of all the hospitals, and the city & state where each was located. I used Proc Geocode to find the latitude & longitude of the cities, and then plotted a marker at each of those locations using annotate. I used Proc SQL to count the number of hospitals and states, and saved the results into macro variables which I also annotated on the map. ODS HTML allowed me embed html tags so that each marker has hover-text with the name of the hospitals. And, most importantly, it's all data driven (no hard-coding of marker locations or numbers) - therefore to change what I'm plotting on the map, all I have to do is change the data!

Have you got some data that you might want to plot on a map like this? Here's a link to the SAS code, in case you want to try your hand at re-using it. Feel free to leave a comment about this map, or a map you might create with your own data!


Post a Comment

What's trending in your neck of the woods?

"Trending" is one of those words that's trending these days, as we verbify everything. And if you understood that, then you'll probably love this blog! :-)

For those of you not hip to the lingo ('hip' meaning 'knowing about and following the newest styles, fashions, etc.'), defines trending as "to emerge as a popular trend; be currently popular."

I recently ran across an article describing a new map that shows which YouTube videos are trending in various parts of the US. The map interface allows you to select things like gender and age, to see what videos are most popular (or trending) with that demographic. It's really cool - give it a try! ... but please come back to read the rest of the blog, so you can learn how to create such maps in SAS!


 Like I do with most cool maps & graphs, I decided to come up with a way to create a similar map using SAS! I don't have access to the YouTube viewership data, so I took the liberty of creating some 'plausibly real' data showing what SAS-related YouTube videos are trending in various cities. And whereas the original map appears to use pre-determined cities (arranged in neat rows and columns by hand), I will set mine up so the city locations are data-driven.

I used Proc Geocode to determine the latitude and longitude of each city (note that if you had IP addresses of the viewers rather than city, Proc Geocode can also be used on IP addresses!) I then created an annotate dataset encoding commands to move to each lat/long coordinate, and then switch to a relative-percent coordinate system and then move a certain percentage down and to the left of the center coordinate, and then draw an image icon that extends the same percentage distance up and to the right of the center coordinate. Here's a link to the actual code.

And here's my SAS version of the map! Note that the data is fabricated (not real), but the videos that the icons link to are real.


Now that you know how to create such a map in SAS, what data might you have that would be interesting to plot in this way, with little image/icons? Share your ideas in the comments section!


Post a Comment