Thanks a Million

In July, we trained our 1 millionth user. It’s a significant milestone for SAS.

In celebration, we want to recognize our customers and say “thank you” for making a commitment to us for your learning.

Now through Aug. 31, we are offering a special Buy One, Get One 50-percent off promotion for public classroom or Live Web courses.

To receive the discount, you must register for both courses by Aug. 31. You can choose from any of the hundreds of courses we offer between now and the end of the year.

When registering online, include both courses in the same shopping cart and type MILLION into the promo box for both courses. The 50-percent discount will be applied to the lower-priced course.

You may also register and receive the discount by phone at 1-800-333-7660. Just mention the promotion when registering.

Visit our special “Thanks a million!” web page for all of the details and instructions on the promotion.

It’s a rewarding time for us here at SAS Training, and we take great pride in the fact that our customers consistently rank our training as excellent. But we’re not resting on our laurels. We remain committed to helping you learn SAS by offering a number of programs and services designed to get you the training you need, when you need it.

Thank you again for continuing your SAS education with us. We hope to train a million more.

Post a Comment

There’s no ‘I’ in analytics

A few years ago I discussed the idea of analytic resources as ‘all-stars’ rather than ‘rock stars.’

While this previous blog certainly touched on the team aspect of analytic work, recently I’ve been thinking about just how much teamwork is required to make an analytics project successful.

From extracting data, transforming data, loading data (or loading then transforming for the ELT crowd), analyzing data, examining the results, sharing results, taking action up on those results and feeding those results back into data - a large network of individuals working as a team is required for a project to be a success.

Similar to a baseball team – and the motto "there is no ‘I’ in team" - if the teammates on an analytics project don’t work together the results will most likely be disappointing. Wanting to work as a team doesn’t always guarantees success. I’m sure you’ve seen your favorite athlete drop a ball or make a bad pass. But the willingness to be open and accept the idea that you are the member of a team on an analytic project will go a long way toward success.

The data experts need the input of the analytic experts, just as the analytic experts need the input and feedback from business. Let’s follow the chain – if the data person doesn’t know or understand the data requirements the analytic resources might be left with good old ‘garbage in, garbage out.’ If the analytic resources don’t understand the business needs, they may get data in great shape, come up with the most excellent of models that don’t tell the business *anything* and the results are never acted upon. Dollars are wasted by the business. Similar to a baseball team whose roster may include the highest paid, most fit and strongest athletes, but if they don’t work well together, they will not achieve successful results.

If you would like to learn more about building your analytics team and analytic teamwork, you can attend the Analytics 2014 conference in Las Vegas Oct. 20-21. Many speakers, including myself, will be presenting on ways to maximize your analytics talent. Also, pre-conference training is offered on October 19, and post conference training October 22-24, that will help your analytic teams excel!

Post a Comment

SQL Joins in SAS University Edition

Probably the most important thing you can learn in the free SAS University Edition is how to work with data. And one of the most powerful tools for working with data is Proc SQL ...

I've used Proc SQL in some of my previous blog posts for simple tasks (such as subsetting data), but this time we'll go a bit deeper and use it for something a bit more powerful - joining tables.

It is often the case that we maintain a data table with all the information about people (students, employees, customers, etc), and then in our daily transaction data we simply refer to them by some id number. That way we only have to maintain one copy of the people-data (name, address, age, gender, etc), we don't have to enter the same data multiple times (just the id number), and we don't have to store all that information for each transaction (only the id number).

In this example, I'm keeping the data very short and simple. We'll have a school class with 5 students, and the only data for each student is their name. Copy-n-paste the code into SAS University Edition and run it:

data students;
input idnum $ 1-5 name $ 7-50;
datalines;
id001 John Doe
id002 Jane Doe
id003 Raj Patel
id004 Tran Park
id005 Jet Lee
;
run;

Do you remember taking tests on Opscan sheets, with #2 pencils? For no particular reason, here's a visualization of one I created with SAS/Graph. This has nothing to do with the example, and is just here to jazz-things-up with a bit of color :)

test_cap

Now, let's assume we have a table of grades. For this very simple example, we'll say the students have only had one test so far. Notice in this table we only store the student id number (not the full name).

data grades;
input idnum $ 1-5 test1;
datalines;
id001 88
id002 95
id003 93
id004 99
id005 95
;
run;

If we want to see a bar chart of the grades, we can use the following simple code ... but it is difficult to tell which student is which, with only the student id numbers labeling each bar:

proc sgplot data=grades;
hbar idnum / response=test1;
run;

idnum

And this is where the SQL join comes into play... You can use the following code to add the student name to the grades table. And while we're at it, let's order the data by the test1 score, so we can have the bars in ascending order:

proc sql;
create table plotdata as
select unique grades.*, students.name
from grades left join students
on grades.idnum=students.idnum
order by test1;
quit; run;

Now when we plot the data, we can label each bar with the student name, and order the bars by the data-order:

proc sgplot data=plotdata;
hbar name / response=test1;
yaxis discreteorder=data;
run;

test_name

Remember - this is a simplified example, just to demonstrate the technique of SQL joins. Now, use your imagination and come up with ways to apply this technique to other data you might have, and you will soon become a highly paid SQL expert! :)

Post a Comment

When did 'your music' become 'classic rock'

In this blog post, I put some classic rock song data under the SAS Analytics microscope, to see if I could get a better picture of exactly what is considered 'classic rock' these days...

Michael Raithel recently pointed me to an interesting article/study about 'classic rock' music, and invited (or is that challenged?) me to see what I could do with this data using SAS graphics. Being a graph guy *and* a DJ, how could I turn down such an opportunity!?!

Here's a picture of my DJ setup. I've played quite a bit of classic rock, so hopefully I qualify as a subject matter expert (SME) in this area, LOL!

dj_setup

The first question that popped into my head was "where are these 25 radio stations located?" I used Proc Geocode to determine the latitude/longitude centroid of each city, and plotted them on a map. It looks like the stations are pretty well spread out across the US, but not too many from what I consider "the deep south" - therefore the results might not have as much 'southern rock' as I would have liked. While I was creating this map, I decided to add html hover-text, so you can see the "top 10" most frequently played songs for each station (click the snapshot image below to see the interactive map with the hover-text):

Classic Rock Stations map

Since the data had a timestamp of when the songs were played, I thought it might be interesting to see if certain songs were played at certain times, etc. But after plotting the data on a timeline, I found that the timestamps were not consistent enough for such a study. Some stations had the song timestamps down to the hour or minute, while others appeared to just have a daily summery (one timestamp per day).

classic_rock_playtime

The original article had a nice histogram, showing the distribution of the songs by their release year. I decided to create a similar histogram, but in mine the height of the bars represent the frequency of how many times the songs were played, and I show visible dividers between each song (so you can 'see' which songs were played more than others), and I add html hover text so you can see the names of the songs (click the snapshot image below to see the interactive graph with the hover-text).

classic_rock_release_year

And for my final visualization, I decided to come up with a totally different chart (not in the original article). I calculated what were the 20 most-played artists overall, and then created a bar chart showing how often (% of time) each of those artists was played at each of the 25 stations. I wanted to see if a small number of artists was played a 'majority' of the time (which is what it seems like, when I listen to classic rock stations). And, sure enough, one of the stations actually did play the top 20 artists over 50% of the time! Click the snapshot image below to see the interactive graph with the hover-text and drilldown links (be sure to try the drilldowns - on the bar segments, and the bar labels!)

classic_rock_music

 Did you 'discover' anything interesting in these graphs? What's your favorite "classic rock" song?

Post a Comment

I see spots ... sunspots!

The sun has gone eerily quiet, in the middle of what should be the height of the 11-year sunspot cycle...

Here's a superb photo of some sunspots that Stephen A. Carr posted to the Telescope Addicts Facebook group - a group which I follow with great interest. (Thanks for allowing me to use the photo Stephen!)

Stephen A Carr's sunspot photo

But Stephen would not have been able to take such a picture yesterday ... A physics.org article pointed out that NASA's Solar Dynamics Observatory had just recorded an All Quiet Event (no sunspot activity). And being in the middle of what should be the height of the 11-year sunspot cycle (ie, the solar maximum), that does seem a bit odd. It also seems like something I could see  better with some SAS graphs!

When it comes to visualizing the 11-year sunspot cycle, the graph shown on Wikipedia is probably the most famous. But the data in that graph only goes a few years past year 2000 (up through the previous 11-year cycle's minimum). I wanted to see the current maximum, in context of all the other data on the graph.

I located a source for the monthly mean sunspot numbers, and set up some SAS code to import them directly from the Web. I then transposes it so that the monthly columns became one long series that I could plot across a time axis (red and blue portions of graph). I used Proc Expand to calculate the moving average and overlaid it on the same plot (black line). And with the addition of a few annotated labels, I now have a graph almost exactly like the original, but also including the latest/greatest monthly sunspot numbers :)

Graph of 400 Years of Sunspot Observations

Plotting this much data on 1 screen makes it a little difficult to see exactly what the most recent cycle is doing. Therefore I created an additional plot, just showing the most recent ~11-year cycle (plotted against the same y-axis scale). It does indeed show a very wimpy number of sunspots, and also seems to indicate that we have perhaps crossed the peak of the cycle, and are on our way back down.

Most recent sunspot cycle

So, are any of my blog readers astronomers? What are your theories or observations on sunspots and solar cycles?

Post a Comment

Hot hot heat map

Although I’m not particularly excited about football (I admit, I don’t completely understand what offside means), I did follow the last World Cup with more than average attention. Not only for the handsome players, but especially for all the fascinating statistics that appeared. It struck me that heat maps popped up everywhere: on Twitter, in newspaper articles, in talk shows, … so I tried to find out why heat maps have become so popular.

What are the origins of a heat map?

heatmap1A heat map is any data visualization which uses color to represent data values in a two-dimensional image.

The term "Heat map" was originally coined and trademarked by software designer Cormac Kinney in 1991, to describe a 2D display depicting real time financial market  information. Heat maps actually originated already in the seventies as 2D displays of the values in a data matrix. Larger values were represented by small dark gray or black  squares (pixels) and smaller values by lighter squares. Nowadays we’re no longer stuck to those fifty shades of grey as there are many different color schemes that can be used to illustrate the heat map.

Heat maps have gained importance in the new era of big data. While in the past scatter plots were used on smaller datasets to discover trends and outliers that remain hidden on traditional charts and spreadsheets, heat maps have taken over. By applying color, numerous observations can be visualized together as the color is indicating the frequency of the pattern.

Different heat maps for different purposes

There are many different types of heat maps used in different disciplines, each referred to by the term “heat map”, even though they use different visualization techniques.

In PC gaming, heat maps can be used for testing game maps, to measure the feasibility of a map, which areas are being used more by the players, are there areas which are being overused or underused. This will help the developers to create a spatially optimized map.

In Web analytics, heat maps are used to see where the users of the websites are actually pointing their cursors and which part of the website are they spending most of their time.

In Biology, heat maps are typically used in molecular biology to represent the level of expression of many genes across a number of comparable samples (e.g. cells in different states, samples from different patients) as they are obtained from DNA microarrays.

Heat maps in football or any other sport are used to identify the frequency of events spread in a given particular area. Specifically for football, heat maps are an indicator of effectiveness of a player in different parts of the pitch.

Heat maps of the World Cup

A very appealing heat map from last World Cup, was the one of the match between Germany and Brazil. The visualization can help in explaining the 7-1 result. This view visualizes the on-ball actions of the teams. The color indicates the volume of events, with few events being highlighted by blue, and lots of events by red.

The map gets heated up in areas where the player has had more control of the ball and does most of his work, it turns redder as the player's presence in a particular area increases.

heatmap2

What can business users learn from this?

Heat maps can be produced very easily with SAS Visual Analytics. SAS customer Orlando Magic, an NBA team, is using heat maps for price and packaging planning of their ticket sales. Heat maps are also very popular in the banking sector. According to Michael Bryan, CIO of Bank of North Carolina, senior executives and board members particularly like heat maps when analyzing loan data. "You can draw 300 fields, and the heat map can tell you in different colors what components of that data are strong and which are weak," he says. "If you wanted to look at a pool of delinquent loans, you could see what they have in common."

If you want to learn more about heat maps, have a look at these posts by SAS colleague Rick Wicklin, statistical graph expert.

Visualize a matrix in SAS by using a discrete heat map

A Christmas tree matrix

Post a Comment

Use SAS to help plan your next vacation!

SAS is great at helping make important business decisions - how about helping decide where to take your next vacation?...

Here's a picture from one of my favorite vacations with my buddy Joe. As you can see, I like "nature vacations." Can you guess where this one was? (leave a comment with your best guess!)

dunes

I was looking around for potential vacation spots to visit this summer, and came across a cool article on the CNN website that listed one great natural wonder in each of the 50 US states. The article was laid out on a single page, and it took a lot of scrolling to find the states I was interested in. So I thought to myself, "Self - why don't you plot this data on a SAS map, and set up drilldowns in the map that jump to the desired state in the article?"

And that's exactly what I did! Click the thumbnail below to see the full-size interactive SAS map. Hover over the markers to see summary info, and click the markers (or the states) to jump to that state's section of the CNN article. It's a nice example of visual (and geographical) analytics, to show the power of SAS!

us_natural_wonders

 

Post a Comment

Getting things in order with PROC SQL, macro and a SAS function

Order must be the most frequent cry for help in the SAS classroom. “HELP,” said my student in the classroom. “I work with messy health data. My users want to see data in this order.”

T1.col1, t1.col2, t1.col3, t1.col4, t2.col5, t1.col6 and list the remaining columns in column position from table 1.

My 2 tables, t1 and t2, have hundreds of columns. They have a common column for me to join the tables on and I really don’t want to spend manually typing each and every column name in the SELECT statement. That’s way too much time on manual work, time that could be spent doing more productive work. Can you help make this process dynamic?

How was I going to help this customer?  I knew I could rely on PROC SQL dictionary tables to grab metadata, in this case the column names without hardcoding them. I also knew that the column names that dictionary tables return could be stored in a macro. Making it easy for me to write a SELECT.

Given that I only had the problem but no data to work with, I turned to the SASHELP library (PDF). It’s a great repository of over 200 sample datasets. I went ahead and submitted this code to query the dictionary tables called columns to find out which datasets I could use to answer the customer question. I wanted datasets that had the same name and type to perform the join.

*Step 1 : locating columns that have the same name & type to perform the equijoin;
proc sql;
select name, memname, type, length
from dictionary.columns
where libname ='SASHELP'
group by name
having count(name) > 1 AND COUNT(DISTINCT TYPE) =1
order by name;
quit;

date columns

I started writing code to store the column names from the 2 tables in the desired order in macro variables:
Read More »

Post a Comment

How to turn rows into columns in SAS University Edition

What happens when your data isn't arranged in rows & columns the way you need it? ... Well if you haven't guessed by now - SAS Software is the king of restructuring data!

Sunspot and solar flare activity are cyclical, and peak about every 11 years. Currently we are near the high point of the cycle, as shown in this SAS graph I created:

sunspot

Would you like to know how to download the data, and create your own plot? ... I knew you would! :)

This example is a little more complex than the previous ones, but it teaches some really useful techniques you'll use over and over in your data munging career. Just take it one step at a time, and you should be fine! :)

First you'll need to download the raw data file table_international-sunspot-numbers_monthly.txt from the noaa.gov web site, and save it into C:\\SASUniversityEdition\\myfolders\\, so that you can get to it via the path /folders/myfolders/ in your SAS program. (See my previous blog post with details about how to set up this shared folder, if you haven't already done so.) While you're saving it, I recommend shortening the name to sunspots.txt.

The text file is structured with each year on a single line, and a column for each month's sunspot frequency count:

sunspot_data

 You can import the text file into SAS by running the following code:

data sunspots_raw;
infile "/folders/myfolders/sunspots.txt" dlm=' ' firstobs=5;
input year jan feb mar apr may jun jul aug sep oct nov dec;
if (year ge 1749) then output;
run;

You've now got the data in a SAS data set, but each month is in a separate column (or variable). You could plot 12 overlapping lines against year, but we really want them all together as 1 long time-series line. So let's transpose the data, and make the months become values, rather than variables.

proc transpose data=sunspots_raw out=sunspots_tran;
by year;
run;

Here's what your SAS dataset looks like now - we're almost there!

sunspot_tran

One last little detail - we need to combine the year & month into a single value. I chose to use an actual date value at the 15th of each month. To do this, I put together a string value on-the-fly (such as '15jan1749') and input that string using the date9. SAS format.

data sunspots_tran; set sunspots_tran;
format date date9.;
date=input('15'||trim(left(_name_))||trim(left(year)),date9.);
run;

Your data now has an official date column, and looks like this:

sunspot_date

Your data is now ready, and all you have to do is run the following code to plot it:

proc sgplot data=sunspots_tran;
label date='Year' col1='Monthly Sunspots';
series x=date y=col1;
run;

sunspot_plot

 What other data can you find that might need to be transposed in order to plot it? If you find some, try importing and transposing it, and leave a comment letting everyone know how it turns out!

 

Post a Comment

Live long and prosper ... but how long?

I bet a lot of SAS users know where the phrase "Live Long and Prosper" comes from ... but can we use SAS software to also help analyze 'how long'? This blog posting shows several different kinds of SAS graphs that can help answer such a question!

Let's start with a map. I recently found an interesting map in an article about overeating. The map showed changes in life expectancy, by geographical region. I created my own version of the map, with a few changes (hopefully improvements!).

In the original map, I saw a lot of comments complaining about the 'regional/area names' that were used, therefore I decided to just leave those names out (and let the coloring of the grouped areas suffice). I also used simpler (just red & green, for bad & good) colors for the selected countries with detailed labels. My version also has mouse-over text for the country names. Click the snapshot below to see my full-size map, with hover-text:

global_life_expectancy

I found another visualization in a New York Times article that used a grid of bubbles to represent life expectancy data. After we added support for alpha-transparent colors in SAS, I was able to create my own version:

death_odds

And last but not least, what about a bar chart? ...

life

As you can see, there are many different ways of visually analyzing the same (or in this case similar) data. Each different type of chart, and even subtle variations of the same chart, can help give you more/different insight into your data, and thereby help you understand it better. When you're exploring your data, I encourage you to visualize it in several different ways. And when you want to answer a specific question, select the type of chart that best answers the question.

Post a Comment