Where will the next earthquake strike?

This morning, a fairly strong earthquake struck southern Mexico - can SAS analyze past earthquake data to help predict future earthquakes?...

I don't have any pictures of earthquake damage, but here's one I took of the main house/prop from the Snow Camp Outdoor Theatre here in North Carolina - I imagine this might be what a lot of rural houses could look like after a strong earthquake!


As usual, the example in this blog post starts with a quest for data - earthquake data, to be specific. Luckily the NOAA's National Geophysical Data Center maintains quite an extensive database of earthquakes around the world. It not only lists the current ones, but even contains estimated data for earthquakes going back several thousand years.

I downloaded the entire database, saved it in a tab-delimited file, and then imported it into SAS using the following code:

PROC IMPORT OUT=quake_data DATAFILE="worldquakes.txt" DBMS=TAB REPLACE;
RUN;

I then annotated markers onto a map at the lat/long locations where earthquakes have occurred, and used the size and color of the markers to represent the magnitude of the earthquakes. Here's the map for all earthquakes since 1975:


And here's the map of all the earthquakes in the entire database, going back about 4000 years. Basically, the regions that have been getting earthquakes over the past ~40 years are the same ones that have been getting them over the past ~4000 years.

So, based strictly on a visual analysis, you can look at this map, and quickly tell where earthquakes are most likely to occur.
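If you'd like to try a rough version of the marker map yourself, here is a minimal sketch. It is not the annotate code I used for the maps above - it swaps in a plain Proc Sgplot bubble plot of longitude vs latitude, with no map outline behind it. The variable names LATITUDE, LONGITUDE, YEAR, and MAGNITUDE are assumptions (they are not shown in this post), so run the Proc Contents first and substitute whatever names your imported file actually has.

```sas
/* Sketch only: LATITUDE, LONGITUDE, YEAR and MAGNITUDE are assumed */
/* variable names - check the Proc Contents output for the real ones */
proc contents data=quake_data;
run;

title "Earthquakes since 1975 (bubble size = magnitude)";
proc sgplot data=quake_data (where=(year >= 1975 and magnitude ne .));
 /* one semi-transparent bubble per earthquake, sized by magnitude */
 bubble x=longitude y=latitude size=magnitude / transparency=0.6;
 xaxis label='Longitude';
 yaxis label='Latitude';
run;
```

The transparency makes overlapping bubbles darker, so clusters of earthquakes stand out even without color-coding.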

And here's one of my own earthquake theories ... in 2008 China filled the reservoir for the huge Three Gorges Dam. I notice in 2008 there were also a lot of small earthquakes in that area of China (see all the overlapping blue circles in the map below). I think the weight of all that water caused some of the land to 'settle', setting off a few small earthquake tremors. Does anyone know whether or not my dam theory holds water? ;)


Now it's your turn - what theories do you have about predicting earthquakes? Feel free to share your theory in a comment!


Forecasting sharknadoes with SAS

Everyone is intrigued by natural disasters - can we predict the likelihood of multiple disasters happening simultaneously?...

My friend Rochelle is an avid SCUBA diver - here's a photo of a shark she got to 'smile for a picture' on a recent dive trip:


Rochelle was asking if I could use SAS to forecast the probability that hurricane Arthur would cause her upcoming dive trip off the North Carolina coast to be canceled. I'm not enough of a domain expert to forecast specific weather events, but I thought it would be interesting to look at the historical hurricane trends, and see what I could come up with...

Here in the US, we're right at the beginning of the hurricane season, and just finishing up the tornado season. Which got me wondering - is it possible that we could have both a hurricane and a tornado at the same time? And in the spirit of the 'B' movie Sharknado, I decided to call such combinations of multiple natural disasters "sharknadoes".

I first took the data for Atlantic hurricanes, and created a histogram showing the % of hurricanes which occurred during each month:


Next I took the data for US tornadoes, and created a similar histogram showing the % of tornadoes which occurred during each month:


And then, to determine when a hurricane/tornado sharknado might occur, I plotted both sets of data together using Proc Gplot with the needle interpolation and transparent colors. The tornadoes are red, the hurricanes are blue, and the time when they might both occur is purple.  Looks like August is the month with the highest probability of a hurricane/tornado sharknado! :)
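Here is a minimal sketch of that overlay idea. Note that it swaps in Proc Sgplot's needle statement in place of the Proc Gplot needle interpolation I used, and it assumes two hypothetical input data sets, HURRICANES and TORNADOES, each with one row per event and a numeric MONTH variable (1-12). Those names are placeholders for illustration, not the actual data I used.

```sas
/* Sketch only: HURRICANES, TORNADOES and MONTH are hypothetical names */

/* Percent of events in each month (Proc Freq writes COUNT and PERCENT) */
proc freq data=hurricanes noprint;
 tables month / out=hur_pct;
run;
proc freq data=tornadoes noprint;
 tables month / out=tor_pct;
run;

/* Stack the two summaries, and label which disaster each row is */
data both;
 length hazard $10;
 set hur_pct (in=from_hur) tor_pct;
 if from_hur then hazard='Hurricane';
 else hazard='Tornado';
run;

/* Thick semi-transparent needles: blue + red overlap looks purple */
title "Percent of Hurricanes and Tornadoes by Month";
proc sgplot data=both;
 styleattrs datacontrastcolors=(blue red);
 needle x=month y=percent / group=hazard transparency=0.5
  lineattrs=(thickness=12 pattern=solid);
 xaxis values=(1 to 12 by 1);
run;
```

Because the hurricane rows come first in the stacked data, the first color in the list (blue) goes to hurricanes and the second (red) to tornadoes.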


So, have you ever experienced a natural disaster "sharknado"? What combination of natural disasters did you experience at the same time? (or close enough together that you were still recovering from the first, when the second hit)


The driving force behind advanced analytics R&D

There are many factors that go into creating the next advanced analytics products at SAS, but Radhika Kulkarni, Vice President of Advanced Analytics told me there’s one main driving force.

I caught up with her at the Analytics 2014 conference to talk more about advanced analytics and what products you can expect to see next.


I also talked to Ken Sanford of SAS about why today’s analyst needs econometrics and the latest changes to the SAS/ETS portfolio.


You can watch more of my Inside Analytics videos from the conference on YouTube. Or see what it’s all about in person. Our next conference is set for Oct. 20-21 at the Bellagio in Las Vegas.


Analytics, and winning soccer (fútbol) teams!

The old saying KISS (Keep It Simple) can be applied to just about anything, including sports analytics. Here I use SAS to create some simple charts to analyze a winning soccer team...

To get you into the right mood, here is a picture of my friend Jennifer's dog, who loves soccer :)


In a previous blog post, I showed several fancy graphs you could use for sports analytics. And with the World Cup upon us, I thought it would be interesting to re-use some of those graphs to analyze soccer data. But the more I looked at prior World Cup soccer data, the more I noticed that the simple bar charts actually worked best.

Spain was the winner of the 2010 World Cup, so let's see what was so special about Spain in the tournament games.

First I assumed that the winning team probably scored the most goals. But they were actually in the middle when it came to goals per game:


How about the % of shots on target, or the % of their shots that scored goals? Nope! They were right in the middle in those areas too!



So, where did Spain "stand out" in the graphs? They were #1 in Overall Pass Completion:


They were #1 in Shots Excl Blocked Shots:


They were #2 for having the least number of Yellow Cards:


And they were #3 in having the lowest Average Goals Conceded per Game:


Ok you soccer players & fans - were there any surprises here?!? Can any of you SMEs (subject matter experts) help further explain the data - are these factors important, or just a coincidence? What's your theory on what it takes for a team to win the World Cup, and do you have any data & analytics to help support your theory? :)


How to read Excel spreadsheets with SAS University Edition

This blog post teaches you how to import an Excel spreadsheet into the free SAS University Edition, so you can further analyze and graph the data.

First, you need to create a folder on your local computer (such as C:\SASUniversityEdition\myfolders\ ), and then set that up as a Shared Folder so that Oracle VM VirtualBox can see it. SAS Support has set up a help page describing how to do that. Once you've got the shared folder set up, you can access it via the path /folders/myfolders/ in your SAS job (fyi - the emulator is running a flavor of Unix, therefore this is a Unix path).

Now when you place files (such as Excel spreadsheets) into this folder on your local computer, your SAS jobs (running in the VirtualBox emulator) will be able to see them. So let's download an interesting spreadsheet from the Web, and save it into that location...

The price of gasoline (petrol) here in the US has always been of interest to me. Possibly because I used to have a Ford F250 truck with a 6.8 liter V10 engine that got 8mpg in city driving (I now have a Prius! LOL)


Therefore I'm glad the US Energy Information Administration tracks the average price for a gallon of gasoline in the US, and makes it available on their Web site. Use your favorite browser on your local computer, and go to their page and click the Download Data (XLS File) link, located near the top of the page. Save the spreadsheet on your local computer into C:\SASUniversityEdition\myfolders\EMM_EPMR_PTE_NUS_DPGw.xls
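Before importing, it can be handy to confirm that SAS really can see the file through the shared folder. Here is a minimal sketch of such a check, assuming you saved the spreadsheet under the exact file name above and that your shared folder is mapped to /folders/myfolders/ as described earlier:

```sas
/* Sketch: write a message to the log saying whether SAS can see the file */
data _null_;
 length path $200;
 path = '/folders/myfolders/EMM_EPMR_PTE_NUS_DPGw.xls';
 if fileexist(path) then put 'NOTE: Found ' path;
 else put 'WARNING: SAS cannot see ' path;
run;
```

If you get the warning, the most likely culprits are the shared folder setup or a slightly different file name.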

Now you can run the following code in your SAS University Edition to import the spreadsheet data into a SAS dataset:

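PROC IMPORT DATAFILE="/folders/myfolders/EMM_EPMR_PTE_NUS_DPGw.xls" DBMS=xls OUT=eia (rename=(a=date b=price)) replace;
RANGE="Data 1$A4:B1500";
GETNAMES=no; /* no header row in this range, so the columns come in as A and B */
RUN;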

You can click the dataset in Folders->My Libraries->WORK->EIA and view the table.


But viewing the raw numbers only provides a certain amount of insight. If we could graph the data, then we could much more easily look for trends and such. Luckily you've got SAS University Edition - you can run the following code to create a graph of the data from year 2000 to present. Notice that I'm adding several 'extras' here, to show you the syntax for subsetting the data, adding custom labels to the axes, and formatting the price as US $.

proc sgplot data=eia (where=(date GE '01jan2000'd));
label date='Year' price='US Gasoline Price per Gallon';
series x=date y=price;
yaxis valuesformat=dollar10.2;
run;

I'm going to let you tell me what you see in the graph (in comments)! Can you identify any major events based on abrupt changes in the graph? Do any trends or cycles jump out at you? What are your theories on gasoline prices? :)





How to create fancy statistical graphs in SAS University Edition

If you want to become a 'data scientist' then you should probably learn SAS/STAT ... and this blog shows you the basics of how to run a statistical analysis in the free SAS University Edition.

In my previous blog posts, you learned how to install SAS University Edition, and how to create some basic graphs in SAS. But in order to become a highly paid data scientist, you need to know how to do more than simply graph the data - you need analytics. And the SAS/STAT product is one of the best tools for performing statistical analyses. In this blog I show you how easy it is to run data through a SAS/STAT procedure, and produce some really impressive graphical visualizations of the results.

First we need some (fake) sample data. In my previous blogs I showed you how to use sample data that was included with SAS. This time I'll show you how to create your own (random) sample data from scratch. In the code below, I loop through and create 1000 lines of data in a data step. Copy-n-paste the following into your CODE window, and run it (click the button with the little icon of a 'running man'):

data fakedata;
 do i = 1 to 1000;
  z1 = rannor(125);
  z2 = rannor(125);
  z3 = rannor(125);
  x = 3*z1+z2;
  y = 3*z1+z3;
  output; /* write one row per loop iteration */
 end;
run;

Once you have successfully run the code and created the random sample data, you can use Proc KDE to analyze it and generate some impressive graphics (the KDE procedure performs bivariate kernel density estimation).

proc kde data=fakedata;
 bivar x y / plots=(contour contourscatter histogram surface);
run;

And if you've done everything correctly, you'll get the following:





But let me leave you with a stern warning ... Please don't just blindly run the SAS/STAT procedures without understanding what they do. You need to understand the assumptions & requirements for the data, and have a good basic knowledge of what the analysis is doing, for each statistical analysis you perform. Just because a SAS statistical procedure can run against your data without producing any 'ERROR' messages, does not mean that statistical analysis was valid for that particular data.
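In that spirit, here is one small sanity check you could run on the fake data before trusting any fancier analysis - a sketch, assuming the FAKEDATA data set from above has been created. Since x and y both contain 3*z1, and z1, z2, z3 are independent standard normals, Var(x) = Var(y) = 9 + 1 = 10 and Cov(x,y) = 9, so the theoretical correlation is 9/10 = 0.9. The sample estimate from Proc Corr should land close to that (it won't be exact, since the data is random).

```sas
/* Sketch: compare the sample correlation of x and y with the */
/* theoretical value of 0.9 implied by how the data was built */
proc corr data=fakedata;
 var x y;
run;
```

If the estimate were far from 0.9, that would be a hint that the data step didn't do what you intended.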


How to create a bubble plot in SAS University Edition

Are you a fan of Hans Rosling's famous bubble plots? ... Then why not learn how to create your own bubble plots in SAS University Edition?!? :)

Perhaps you saw my SAS/GRAPH imitation of Hans Rosling's animation in a previous blog (see a snapshot of my graph below)? Or perhaps the SGPLOT version in Sanjay Matange's blog? Or maybe you're just a fan of bubble plots in general? Whatever the case, this blog will show you the basics of creating bubble plots in your free copy of SAS University Edition that you recently downloaded!


First, you'll need to have some data that makes sense to visualize with a bubble plot. You'll typically be representing 3 or 4 values with each bubble. Your X and Y variables will be represented by the position of the marker (like a regular scatter plot), and the size of the marker will represent the value of a 3rd variable. And you'll sometimes want to use a 4th variable to control the color of the bubbles.

Perhaps you already have the 'perfect' data for a bubble chart, but you'll often need to summarize your data first. There are several ways to do that in SAS - I'll show you the SQL way, since many of you are probably already familiar with SQL. Enter the following into the CODE tab of the Program 1 window, to summarize the data from the SASHELP.CARS data set (which ships with SAS). You can type the code by hand, or copy-n-paste it. Then click the Run button (icon of a little man running). Look at the log messages to make sure it ran correctly.

proc sql;
create table car_summary as
select unique origin, make,
 avg(horsepower) as hp,
 avg(mpg_city) as city,
 avg(mpg_highway) as highway
from sashelp.cars
where type='Truck'
group by origin, make;
quit; run;
proc print data=car_summary;
run;

If you entered & ran all the code correctly, the Proc Print should produce the following summarized table:
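Since I mentioned there are several ways to summarize data in SAS, here is a sketch of one alternative using Proc Means instead of SQL. The output data set name CAR_SUMMARY2 is just a name I picked for illustration, so it doesn't overwrite the SQL version:

```sas
/* Sketch: same averages as the SQL step, computed with Proc Means */
proc means data=sashelp.cars nway noprint;
 where type='Truck';
 class origin make;
 var horsepower mpg_city mpg_highway;
 /* mean= names the three averages in the same order as the VAR list */
 output out=car_summary2 (drop=_type_ _freq_) mean=hp city highway;
run;

proc print data=car_summary2;
run;
```

The NWAY option keeps only the rows for each Origin/Make combination, which matches the GROUP BY in the SQL version.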

Now enter & run the following code (in the CODE tab again) to create the bubble plot. The X/Y position of the bubbles will be determined by the Highway and City MPG, the size of the bubbles will represent the Horsepower, and the color will represent the country of Origin. If you're typing the code by hand, make sure to include all the quotes, slashes, and semicolons - they are important!

Title "Truck MPG and Horsepower Comparison";
proc sgplot data=car_summary;
bubble x=highway y=city size=hp /
 group=origin datalabel=make;
 keylegend / location=inside position=bottomright;
run;

And if you did everything just right, you should get a bubble plot that looks a lot like this ... and you're well on your way to becoming a SAS visualization expert! :)


Now that you're a bubble plot expert, what data would you like to use in your own bubble plot? (feel free to add your reply/answer in a comment)


SAS Programming is going on tour

One of my favorite bands, Kings of Leon, is touring again this year and making a stop in Raleigh.

I didn’t want to take any chances that I might miss them playing in my hometown so I bought tickets as soon as they went on sale.

As you might agree, music is best heard live, but sometimes your only chance to experience that is when the band goes on tour.

The same can be said for training. So that’s why we’re taking three of our most popular courses, SAS Programming 1, 2 and 3 on the road for a five-city tour.

Here are the upcoming cities and dates.

  • Richmond: Programming 1 - Sept. 3-5 and Programming 2 - Oct. 15-17
  • Miami: Programming 1 - Sept. 3-5 and Programming 2 - Oct. 15-17
  • Portland: Programming 1 - Sept. 23-25, Programming 2 - Oct. 15-17, Programming 3 - Nov. 4-6
  • Cleveland: Programming 1 - Sept. 3-5, Programming 2 - Oct. 15-17, Programming 3 - Nov. 12-14
  • San Jose: Programming 1 - Sept. 9-11, Programming 2 - Oct. 14-16, Programming 3 - Nov. 18-20

Get your tickets now for one of our stops. We even have best value bundles to help you save on training.

If you’re anything like me, you’ll get registered now – in case it sells out.


How to create a histogram in SAS University Edition

This is a simple tutorial showing how to use SQL to subset data, and then create a histogram using Proc Sgplot, in SAS University Edition.

So you've downloaded SAS University Edition, and you're wondering "What now?" -- I would recommend exploring some of the sample data, and creating some simple charts!

To explore the sample data, select 'Libraries' along the left side, and then expand the 'SASHELP' library. This will show you a list of all the sample data that is included with the SAS University Edition. Scroll through the list of datasets, and look for names that might interest you.


For this example, we'll be using the SASHELP.HEART dataset, which contains some heart-related data about several patients. Double-click SASHELP.HEART, and it will let you browse the data in a spreadsheet-like interface. Scroll left/right in the data, and notice there is a column for Sex, and columns for Diastolic and Systolic blood pressure (these are the values we'll be using in our graph).


We could easily plot all the data, but it is very useful to know how to plot just a subset. There are several ways to subset data in SAS, but I'm going to teach you how to do it with Proc SQL ... because SQL is a very versatile tool to use, and also because many of you might already be familiar with SQL if you've worked with databases.

The following SAS SQL code will create a new dataset called male_data, containing ... you guessed it! ... just the data for the males. Type this code into the CODE tab of the Program 1 window, and then click the Run button (icon of a little man running). Yeah, I know - I'm a meanie, making you type it in, rather than copy-n-paste -- but this is part of the learning process! :)
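As a minimal sketch of that step (assuming the Sex column holds the text value 'Male', which is how it appears when you browse SASHELP.HEART), the SQL could look like this:

```sas
/* Sketch: keep only the rows for male patients */
proc sql;
create table male_data as
select * from sashelp.heart
where sex='Male';
quit; run; /* the run is optional after quit, but harmless */
```

Note that the comparison is case-sensitive, so 'male' or 'MALE' would give you an empty table.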


Did your code run smoothly? If not, check to make sure you have a matching single-quote on both sides of 'Male', and make sure you have all four semicolons! Once you've got it running smoothly, then you can add the following Proc Sgplot code to create a histogram:
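Here is a sketch of what such a histogram step might look like, assuming you want the Diastolic and Systolic values overlaid in one chart. The title text and the transparency value are my own choices for illustration:

```sas
/* Sketch: overlay two semi-transparent histograms from male_data */
title "Blood Pressure of Male Patients";
proc sgplot data=male_data;
histogram diastolic / transparency=0.5;
histogram systolic / transparency=0.5;
run;
```

The transparency lets you see both distributions where they overlap, instead of one hiding the other.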


Double- and triple-check to make sure you have all the code typed in correctly, with all the quotes, slashes, and semicolons ... and then click the Run button. If you've done everything correctly, you should get a chart like the following:


So now you know how to view the sample data, manipulate it with SQL, and create a simple chart - you're well on your way to becoming a highly-paid SAS programmer! :)


Free SAS Software for students!

Remember the episode where Oprah gave a free car to everyone in her studio audience? - Well Jim Goodnight goes one better, and gives free SAS Software to all students in the world!

When I was in graduate school, I felt very fortunate to be at NC State University, because SAS let us use their software for free. I don't know how I could have done my data/graphics intensive research without it. And now SAS is making their software available (for free) for teaching, learning, and research in higher education all over the world, with the SAS University Edition!

SAS University Edition page


Here are the basic steps to install the software (do this once) ...

Download the free Oracle VirtualBox - this is the only thing that's really 'installed' on your computer.

Download the free SAS University Edition (this basically places a pre-installed copy of SAS in the VirtualBox environment).


Here are the basic steps to start up the software (do this each time you want to run a new SAS session) ...

Double-click the Oracle VM VirtualBox icon on your desktop.


You will then get a VirtualBox window, with SAS-University-Edition visible along the left side. Click the 'Start' button (green arrow).


You'll see this window as the SAS server starts in the background ...


And after about a minute (depending on the speed of your computer) you'll see the following VirtualBox window:




And here is how to use SAS - simply enter the following URL in a Web browser on your computer:



If you have run SAS in the past, you have probably used the Display Management System (DMS) as your user interface, which lets you edit and submit code, view your results, etc. The clever SAS developers have recently implemented a new interface called SAS Studio that is very much like DMS, but runs in a web browser. Here's what it looks like:


I plan to write several blog posts describing how to do various useful things in the University Edition, but I want to wrap up this blog with a simple graph, using sample data that is shipped with the software. Type the following into the CODE window, and then click the 'Run' button (picture of the little man running).
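A minimal sketch of such a first graph, assuming you use the SASHELP.CLASS sample data (a small table of students with Height, Weight, and Sex columns) - treat the choice of data set and chart as an illustration rather than the only way to do it:

```sas
/* Sketch: simple scatter plot from sample data shipped with SAS */
title "Height vs Weight in SASHELP.CLASS";
proc sgplot data=sashelp.class;
scatter x=height y=weight / group=sex;
run;
```

The group= option colors the markers by Sex and adds a legend automatically.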


And you get the following graph:



If you've got friends that are college students and/or faculty, be the first to tell them about this great news, and you'll be their "SAS Hero" :)
