When did 'your music' become 'classic rock'

In this blog post, I put some classic rock song data under the SAS Analytics microscope, to see if I could get a better picture of exactly what is considered 'classic rock' these days...

Michael Raithel recently pointed me to an interesting article/study about 'classic rock' music, and invited (or is that challenged?) me to see what I could do with this data using SAS graphics. Being a graph guy *and* a DJ, how could I turn down such an opportunity!?!

Here's a picture of my DJ setup. I've played quite a bit of classic rock, so hopefully I qualify as a subject matter expert (SME) in this area, LOL!


The first question that popped into my head was "where are these 25 radio stations located?" I used Proc Geocode to determine the latitude/longitude centroid of each city, and plotted them on a map. It looks like the stations are pretty well spread out across the US, but not too many from what I consider "the deep south" - therefore the results might not have as much 'southern rock' as I would have liked. While I was creating this map, I decided to add html hover-text, so you can see the "top 10" most frequently played songs for each station (click the snapshot image below to see the interactive map with the hover-text):

Classic Rock Stations map

Since the data had a timestamp of when the songs were played, I thought it might be interesting to see if certain songs were played at certain times, etc. But after plotting the data on a timeline, I found that the timestamps were not consistent enough for such a study. Some stations had the song timestamps down to the hour or minute, while others appeared to just have a daily summery (one timestamp per day).


The original article had a nice histogram, showing the distribution of the songs by their release year. I decided to create a similar histogram, but in mine the height of the bars represent the frequency of how many times the songs were played, and I show visible dividers between each song (so you can 'see' which songs were played more than others), and I add html hover text so you can see the names of the songs (click the snapshot image below to see the interactive graph with the hover-text).


And for my final visualization, I decided to come up with a totally different chart (not in the original article). I calculated what were the 20 most-played artists overall, and then created a bar chart showing how often (% of time) each of those artists was played at each of the 25 stations. I wanted to see if a small number of artists was played a 'majority' of the time (which is what it seems like, when I listen to classic rock stations). And, sure enough, one of the stations actually did play the top 20 artists over 50% of the time! Click the snapshot image below to see the interactive graph with the hover-text and drilldown links (be sure to try the drilldowns - on the bar segments, and the bar labels!)


 Did you 'discover' anything interesting in these graphs? What's your favorite "classic rock" song?

Post a Comment

I see spots ... sunspots!

The sun has gone eerily quiet, in the middle of what should be the height of the 11-year sunspot cycle...

Here's a superb photo of some sunspots that Stephen A. Carr posted to the Telescope Addicts Facebook group - a group which I follow with great interest. (Thanks for allowing me to use the photo Stephen!)

Stephen A Carr's sunspot photo

But Stephen would not have been able to take such a picture yesterday ... A physics.org article pointed out that NASA's Solar Dynamics Observatory had just recorded an All Quiet Event (no sunspot activity). And being in the middle of what should be the height of the 11-year sunspot cycle (ie, the solar maximum), that does seem a bit odd. It also seems like something I could see  better with some SAS graphs!

When it comes to visualizing the 11-year sunspot cycle, the graph shown on Wikipedia is probably the most famous. But the data in that graph only goes a few years past year 2000 (up through the previous 11-year cycle's minimum). I wanted to see the current maximum, in context of all the other data on the graph.

I located a source for the monthly mean sunspot numbers, and set up some SAS code to import them directly from the Web. I then transposes it so that the monthly columns became one long series that I could plot across a time axis (red and blue portions of graph). I used Proc Expand to calculate the moving average and overlaid it on the same plot (black line). And with the addition of a few annotated labels, I now have a graph almost exactly like the original, but also including the latest/greatest monthly sunspot numbers :)

Graph of 400 Years of Sunspot Observations

Plotting this much data on 1 screen makes it a little difficult to see exactly what the most recent cycle is doing. Therefore I created an additional plot, just showing the most recent ~11-year cycle (plotted against the same y-axis scale). It does indeed show a very wimpy number of sunspots, and also seems to indicate that we have perhaps crossed the peak of the cycle, and are on our way back down.

Most recent sunspot cycle

So, are any of my blog readers astronomers? What are your theories or observations on sunspots and solar cycles?

Post a Comment

Hot hot heat map

Although I’m not particularly excited about football (I admit, I don’t completely understand what offside means), I did follow the last World Cup with more than average attention. Not only for the handsome players, but especially for all the fascinating statistics that appeared. It struck me that heat maps popped up everywhere: on Twitter, in newspaper articles, in talk shows, … so I tried to find out why heat maps have become so popular.

What are the origins of a heat map?

heatmap1A heat map is any data visualization which uses color to represent data values in a two-dimensional image.

The term "Heat map" was originally coined and trademarked by software designer Cormac Kinney in 1991, to describe a 2D display depicting real time financial market  information. Heat maps actually originated already in the seventies as 2D displays of the values in a data matrix. Larger values were represented by small dark gray or black  squares (pixels) and smaller values by lighter squares. Nowadays we’re no longer stuck to those fifty shades of grey as there are many different color schemes that can be used to illustrate the heat map.

Heat maps have gained importance in the new era of big data. While in the past scatter plots were used on smaller datasets to discover trends and outliers that remain hidden on traditional charts and spreadsheets, heat maps have taken over. By applying color, numerous observations can be visualized together as the color is indicating the frequency of the pattern.

Different heat maps for different purposes

There are many different types of heat maps used in different disciplines, each referred to by the term “heat map”, even though they use different visualization techniques.

In PC gaming, heat maps can be used for testing game maps, to measure the feasibility of a map, which areas are being used more by the players, are there areas which are being overused or underused. This will help the developers to create a spatially optimized map.

In Web analytics, heat maps are used to see where the users of the websites are actually pointing their cursors and which part of the website are they spending most of their time.

In Biology, heat maps are typically used in molecular biology to represent the level of expression of many genes across a number of comparable samples (e.g. cells in different states, samples from different patients) as they are obtained from DNA microarrays.

Heat maps in football or any other sport are used to identify the frequency of events spread in a given particular area. Specifically for football, heat maps are an indicator of effectiveness of a player in different parts of the pitch.

Heat maps of the World Cup

A very appealing heat map from last World Cup, was the one of the match between Germany and Brazil. The visualization can help in explaining the 7-1 result. This view visualizes the on-ball actions of the teams. The color indicates the volume of events, with few events being highlighted by blue, and lots of events by red.

The map gets heated up in areas where the player has had more control of the ball and does most of his work, it turns redder as the player's presence in a particular area increases.


What can business users learn from this?

Heat maps can be produced very easily with SAS Visual Analytics. SAS customer Orlando Magic, an NBA team, is using heat maps for price and packaging planning of their ticket sales. Heat maps are also very popular in the banking sector. According to Michael Bryan, CIO of Bank of North Carolina, senior executives and board members particularly like heat maps when analyzing loan data. "You can draw 300 fields, and the heat map can tell you in different colors what components of that data are strong and which are weak," he says. "If you wanted to look at a pool of delinquent loans, you could see what they have in common."

If you want to learn more about heat maps, have a look at these posts by SAS colleague Rick Wicklin, statistical graph expert.

Visualize a matrix in SAS by using a discrete heat map

A Christmas tree matrix

Post a Comment

Use SAS to help plan your next vacation!

SAS is great at helping make important business decisions - how about helping decide where to take your next vacation?...

Here's a picture from one of my favorite vacations with my buddy Joe. As you can see, I like "nature vacations." Can you guess where this one was? (leave a comment with your best guess!)


I was looking around for potential vacation spots to visit this summer, and came across a cool article on the CNN website that listed one great natural wonder in each of the 50 US states. The article was laid out on a single page, and it took a lot of scrolling to find the states I was interested in. So I thought to myself, "Self - why don't you plot this data on a SAS map, and set up drilldowns in the map that jump to the desired state in the article?"

And that's exactly what I did! Click the thumbnail below to see the full-size interactive SAS map. Hover over the markers to see summary info, and click the markers (or the states) to jump to that state's section of the CNN article. It's a nice example of visual (and geographical) analytics, to show the power of SAS!



Post a Comment

Getting things in order with PROC SQL, macro and a SAS function

Order must be the most frequent cry for help in the SAS classroom. “HELP,” said my student in the classroom. “I work with messy health data. My users want to see data in this order.”

T1.col1, t1.col2, t1.col3, t1.col4, t2.col5, t1.col6 and list the remaining columns in column position from table 1.

My 2 tables, t1 and t2, have hundreds of columns. They have a common column for me to join the tables on and I really don’t want to spend manually typing each and every column name in the SELECT statement. That’s way too much time on manual work, time that could be spent doing more productive work. Can you help make this process dynamic?

How was I going to help this customer?  I knew I could rely on PROC SQL dictionary tables to grab metadata, in this case the column names without hardcoding them. I also knew that the column names that dictionary tables return could be stored in a macro. Making it easy for me to write a SELECT.

Given that I only had the problem but no data to work with, I turned to the SASHELP library (PDF). It’s a great repository of over 200 sample datasets. I went ahead and submitted this code to query the dictionary tables called columns to find out which datasets I could use to answer the customer question. I wanted datasets that had the same name and type to perform the join.

*Step 1 : locating columns that have the same name & type to perform the equijoin;
proc sql;
select name, memname, type, length
from dictionary.columns
where libname ='SASHELP'
group by name
having count(name) > 1 AND COUNT(DISTINCT TYPE) =1
order by name;

date columns

I started writing code to store the column names from the 2 tables in the desired order in macro variables:
Read More »

Post a Comment

How to turn rows into columns in SAS University Edition

What happens when your data isn't arranged in rows & columns the way you need it? ... Well if you haven't guessed by now - SAS Software is the king of restructuring data!

Sunspot and solar flare activity are cyclical, and peak about every 11 years. Currently we are near the high point of the cycle, as shown in this SAS graph I created:


Would you like to know how to download the data, and create your own plot? ... I knew you would! :)

This example is a little more complex than the previous ones, but it teaches some really useful techniques you'll use over and over in your data munging career. Just take it one step at a time, and you should be fine! :)

First you'll need to download the raw data file table_international-sunspot-numbers_monthly.txt from the noaa.gov web site, and save it into C:\\SASUniversityEdition\\myfolders\\, so that you can get to it via the path /folders/myfolders/ in your SAS program. (See my previous blog post with details about how to set up this shared folder, if you haven't already done so.) While you're saving it, I recommend shortening the name to sunspots.txt.

The text file is structured with each year on a single line, and a column for each month's sunspot frequency count:


 You can import the text file into SAS by running the following code:

data sunspots_raw;
infile "/folders/myfolders/sunspots.txt" dlm=' ' firstobs=5;
input year jan feb mar apr may jun jul aug sep oct nov dec;
if (year ge 1749) then output;

You've now got the data in a SAS data set, but each month is in a separate column (or variable). You could plot 12 overlapping lines against year, but we really want them all together as 1 long time-series line. So let's transpose the data, and make the months become values, rather than variables.

proc transpose data=sunspots_raw out=sunspots_tran;
by year;

Here's what your SAS dataset looks like now - we're almost there!


One last little detail - we need to combine the year & month into a single value. I chose to use an actual date value at the 15th of each month. To do this, I put together a string value on-the-fly (such as '15jan1749') and input that string using the date9. SAS format.

data sunspots_tran; set sunspots_tran;
format date date9.;

Your data now has an official date column, and looks like this:


Your data is now ready, and all you have to do is run the following code to plot it:

proc sgplot data=sunspots_tran;
label date='Year' col1='Monthly Sunspots';
series x=date y=col1;


 What other data can you find that might need to be transposed in order to plot it? If you find some, try importing and transposing it, and leave a comment letting everyone know how it turns out!


Post a Comment

Live long and prosper ... but how long?

I bet a lot of SAS users know where the phrase "Live Long and Prosper" comes from ... but can we use SAS software to also help analyze 'how long'? This blog posting shows several different kinds of SAS graphs that can help answer such a question!

Let's start with a map. I recently found an interesting map in an article about overeating. The map showed changes in life expectancy, by geographical region. I created my own version of the map, with a few changes (hopefully improvements!).

In the original map, I saw a lot of comments complaining about the 'regional/area names' that were used, therefore I decided to just leave those names out (and let the coloring of the grouped areas suffice). I also used simpler (just red & green, for bad & good) colors for the selected countries with detailed labels. My version also has mouse-over text for the country names. Click the snapshot below to see my full-size map, with hover-text:


I found another visualization in a New York Times article that used a grid of bubbles to represent life expectancy data. After we added support for alpha-transparent colors in SAS, I was able to create my own version:


And last but not least, what about a bar chart? ...


As you can see, there are many different ways of visually analyzing the same (or in this case similar) data. Each different type of chart, and even subtle variations of the same chart, can help give you more/different insight into your data, and thereby help you understand it better. When you're exploring your data, I encourage you to visualize it in several different ways. And when you want to answer a specific question, select the type of chart that best answers the question.

Post a Comment

Where will the next earthquake strike?

This morning, a fairly strong earthquake struck southern Mexico - can SAS analyze past earthquake data to help predict future earthquakes?...

I don't have any pictures of earthquake damage, but here's one I took of the main house/prop from the Snow Camp Outdoor Theatre here in North Carolina - I imagine this might be what a lot of rural houses could look like after a strong earthquake!


As usual, the example in this blog post starts with a quest for data - earthquake data, to be specific. Luckily the NOAA's National Geophysical Data Center maintains quite an extensive database of earthquakes around the world. It not only lists the current ones, but even contains estimated data for earthquakes going back several thousand years.

I downloaded the entire database, saved it in a tab-delimited file, and then imported it into SAS using the following code:

PROC IMPORT OUT=quake_data DATAFILE="worldquakes.txt" DBMS=TAB REPLACE;

I then annotated markers onto a map at the lat/long locations where earthquakes have occurred, and used the size and color of the markers to represent the magnitude of the earthquakes. Here's the map for all earthquakes since 1975 (click the map to see it in full size):


And here's the map of all the earthquakes in the entire database, going back about 4000 years. Basically, the same regions that have been getting earthquakes the past ~40 years, are the same ones that have been getting earthquakes the past ~4000 years.

worldquakesSo, based strictly on a visual analysis, you can look at this map, and quickly tell where earthquakes are most likely to occur.

And here's one of my own earthquake theories ... in 2008 China filled the reservoir for the huge Three Gorges Dam. I notice in 2008 there were also a lot of small earthquakes in that area of China (see all the overlapping blue circles in the map below). I think the weight of all that water caused some of the land to 'settle', setting off a few small earthquake tremors. Does anyone know whether or not my dam theory holds water? ;)


Now it's your turn - what theories do you have about predicting earthquakes? Feel free to share your theory in a comment!

Post a Comment

Forecasting sharknadoes with SAS

Everyone is intrigued by natural disasters - can we predict the likelihood of multiple disasters happening simultaneously?...

My friend Rochelle is an avid SCUBA diver - here's a photo of a shark she got to 'smile for a picture' on a recent dive trip:


Rochelle was asking if I could use SAS to forecast the probability that hurricane Arthur would cause her upcoming dive trip off the North Carolina coast to be canceled. I'm not enough of a domain expert to forecast specific weather events, but I thought it would be interesting to look at the historical hurricane trends, and see what I could come up with...

Here in the US, we're right at the beginning of the hurricane season, and just finishing up the tornado season. Which got me wondering - is it possible that we could have both a hurricane and a tornado at the same time? And in the spirit of the 'B' movie Sharknado, I decided to call such combinations of multiple natural disasters "sharknadoes".

I first took the data for Atlantic hurricanes, and created a histogram showing the % of hurricanes which occurred during each month:


Next I took the data for US tornadoes, and created a similar histogram showing the % of tornadoes which occurred during each month:


And then, to determine when a hurricane/tornado sharknado might occur, I plotted both sets of data together using Proc Gplot with the needle interpolation and transparent colors. The tornadoes are red, the hurricanes are blue, and the time when they might both occur is purple.  Looks like August is the month with the highest probability of a hurricane/tornado sharknado! :)


So, have you ever experienced a natural disaster "sharknado"? What combination of natural disasters did you experience at the same time? (or close enough together that you were still recovering from the first, when the second hit)

Post a Comment

The driving force behind advanced analytics R&D

There are many factors that go into creating the next advanced analytics products at SAS, but Radhika Kulkarni, Vice President of Advanced Analytics told me there’s one main driving force.

I caught up with her at the Analytics 2014 conference to talk more about advanced analytics and what products you can expect to see next.


I also talked to Ken Sanford of SAS about why today’s analyst needs econometrics and the latest changes to the SAS/ETS portfolio.


You can watch more of my Inside Analytics videos from the conference on YouTube. Or see what it’s all about in person. Our next conference is set for Oct. 20-21 at the Bellagio in Las Vegas.

Post a Comment