Marriage and divorce in the US: What do the numbers say?

I've heard lots of people quote statistics about marriage & divorce, but the experts don't always agree on what the data means. So I decided to run the data through a SAS graphical analysis, and see what the numbers say ...

Before we get into the numbers though, let's have a little non-scientific fun. I asked my friends to submit their 'interesting' wedding photos, and I have selected 2 to include in my blog. Based on your keen powers of observation, which wedding do you think ended in divorce, and in which one do you think the couple is still happily married?!? (Thanks Holly & Patricia, for providing these great photos!)

wedding_holly

wedding_patricia

And now, on with the simpler task - analyzing the numeric data!

I spent some time on Google searching for graphs and analyses that were already out there.

Creating a better graph to show trade deficit

I recently saw a cool graph showing the US import/export trade deficit. But after studying it a bit, I realized I was perceiving it wrong. Follow along in this blog, to find out what the problem was, and how I redesigned the graph to avoid it.

I was looking through dadaviz.com and happened upon a cool graph showing the US imports & exports over time, and the deficit between the two. I didn't like that they used an animation to alternate between the two graphs, but I did like what the data was showing.

I decided to create my own version of their graph, and make a few little changes to improve it. I found the data on the US Census website, saved it in Excel spreadsheet format, and imported it into SAS. I then used PROC SQL to merge the import and export data into a single dataset, so I could graph them together.
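Here is a minimal sketch of what a PROC SQL merge like that could look like. The dataset names, variable names, and numbers below are made up for illustration; they are not the actual Census data or the code behind the graphs.

```sas
/* Hypothetical monthly import values (made-up numbers, illustration only) */
data work.imports;
   input month :yymmn6. import_value;
   format month yymmd7.;
   datalines;
201501 100
201502 110
201503 120
;
run;

/* Hypothetical monthly export values (made-up numbers, illustration only) */
data work.exports;
   input month :yymmn6. export_value;
   format month yymmd7.;
   datalines;
201501 80
201502 85
201503 95
;
run;

/* Join on month so each row carries both series plus the trade balance */
proc sql;
   create table work.trade as
   select i.month,
          i.import_value,
          e.export_value,
          e.export_value - i.import_value as balance
   from work.imports as i
        inner join work.exports as e
        on i.month = e.month
   order by i.month;
quit;
```

With both series in one dataset, a single plotting step can draw imports and exports against the same time axis.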

Rather than packaging the two graphs as a gif animation, I created them separately. The biggest change I made was in cleaning up the time axis a bit, and extending some reference lines from it. I also made it clear (in the title) that it is a plot of monthly data (which wasn't 100% clear in the original graph). Here are my two graphs:

us_trade_balance

us_trade_balance1

Map of US domestic cannabis eradication

In my quest for interesting data to graph, I found some Drug Enforcement Administration (DEA) data on US domestic cannabis eradication. Does the data say anything interesting? Read on to find out! ...

While doing some searches for other data, I happened across a table on the DEA website titled 2014 Domestic Cannabis Eradication/Suppression Statistical Report. Here's a screen-capture showing a bit of the report:

cannabis_eradication_table

It seemed like some very interesting data, but I found it very difficult to read the individual values of all the states, and compare them in my head. Therefore I imported the data into SAS, and started exploring it. I tried graphing the data several ways, and here is my favorite visualization (click the image below to see the full-size map, with hover-text):

50 million illegal aliens apprehended in the US

There's been quite a bit of controversy about the number of undocumented immigrants in the US lately - for example, Ann Coulter claims that number is 30 million, whereas others claim it's about 11 million (readers of my blog are data-savvy, and would dig into the details of such claims, of course). It's difficult to get a definitive count of something that is by definition 'undocumented,' so I focus on something that is more easily quantifiable - the number of illegal aliens who have actually been apprehended.

I found the data on the US Customs and Border Protection (CBP) website, in the form of a table, as shown in the partial screen-capture below:

illegal_alien_screen_capture

This is very interesting quantitative data on the topic, but I found it a bit difficult to digest in their tabular form. There are just too many numbers to try to keep track of in my head, as I look for trends and such. Therefore I imported the data into SAS and created a bar chart - this really helped me see how the numbers have changed over time. I also used SAS to calculate the grand total (nearly 50 million) and annotate it onto the graph using a large font.
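As a rough sketch of that idea (not the code behind the actual graph), the steps could look something like this. The dataset, the variable names, and the counts are hypothetical, and the INSET statement in PROC SGPLOT is just one way to place a total on a chart; the original may well have used annotation instead.

```sas
/* Hypothetical yearly counts (made-up numbers, not the CBP data) */
data work.apprehensions;
   input year apprehended;
   datalines;
2010 300
2011 250
2012 275
;
run;

/* Compute the grand total and store it, comma-formatted, in a macro variable */
proc sql noprint;
   select put(sum(apprehended), comma12.) into :grand_total trimmed
   from work.apprehensions;
quit;

/* Bar chart by year, with the grand total shown in a large font */
proc sgplot data=work.apprehensions;
   vbar year / response=apprehended;
   inset "Grand total: &grand_total" / position=topright
         textattrs=(size=16pt weight=bold);
   yaxis label="Apprehensions";
run;
```

Storing the total in a macro variable keeps the label in sync with the data, so the chart does not need a hand-typed number.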

Was the dress blue ... or was it teal, sky, turquoise, or spindrift?

I saw the dress photo as blue & black. If you're a female, even if we perceived the exact same color, you might not have said 'blue & black'. That's because women have a larger color vocabulary than men, and you might have elaborated on exactly which blue and which black.

blue_dress

This blog is about a fun/unscientific comparison of the color names men and women use. If you do a Google search for 'men women color names' and look at the images, you will get several matches showing various visualizations of a spectrum of colors, showing that women have a different name for each one, whereas men tend to lump them together into groups.

google_colors

How the Tour de France and SAS Factory Miner relate

July has been an exciting month for me. Not only because of the historic Tour de France this year... but even more because this month the new offering SAS Factory Miner was officially released!

With SAS Factory Miner you can run predictive models in an automated model tournament environment to quickly identify the best performer for each segment. Wait a second… tournament, best performer, segment… doesn’t that sound like the Tour de France? Maybe it’s not such a coincidence after all that the launch of SAS Factory Miner falls during the world’s most prestigious cycling race. Let’s investigate how they relate.

Tour_1

Not one, but multiple winners

As organizations begin to apply analytics to growing numbers of customer and business segments, predictive models often must be developed at increasingly granular levels. SAS Factory Miner provides an environment for building, comparing and retraining predictive models at scale across multiple segments. With just a few clicks you can uncover the champion model for each segment.

Saint Peter’s University introduces Master’s degree in data science and business analytics

In December, Saint Peter’s University grants Master’s degrees to its inaugural class of data scientists. Thirty-six students are enrolled in this program, and eight are set to graduate. As reported this year by Bloomberg, career opportunities for analytics talent are excellent.

Saint Peter’s is the latest to collaborate with SAS to offer such a program. We’ve helped launch more than 30 Master’s degrees and 60 certificate programs all over the world in analytics and related disciplines.

In the past year alone, we helped lay the foundation for new Master’s programs at Michigan State University, University of Maryland, University of Missouri, George Washington University, Shiv Nadar University, Indian Institute of Management and University of South Australia.

For us, it’s a no-brainer. We need analytics expertise as much as any company. And that expertise is scarce. Consider a recent report released by MIT Sloan Management Review. The upshot: technology is no longer the key inhibitor for organizations struggling to get value from analytics. It’s the lack of analytical talent.

What’s it like to build a data science program to bridge the gap? I asked Dr. Sylvain Jaume, Director of Saint Peter’s Data Science graduate program, what it entails. Key early steps, he said, were the school’s vision and strategic investment into the program as well as getting commitment from companies to provide practical experience for students.

Only brush the teeth you want to keep

When I was a kid, I remember a motivational poster on my dentist's wall that said "You don't have to brush all your teeth -- only the ones you want to keep."  That poster really made me think, and brush my teeth! And now that I'm a data-analyst adult, I think I've found an even scarier motivational poster ... graphs showing the percentage of senior citizens who have lost all their natural teeth!

Before we get to the scary data though, here's a picture of my friend Becky's daughter, who pulled her first tooth while performing on-stage in the Sword of Peace outdoor drama. Hopefully once all her permanent teeth come in, she'll keep them for a very long time!

lost_tooth

Explaining analytics with Jeff Zeanah

One of the most important skills for data scientists and business analytics professionals is communication. If decision makers and managers don't understand what the numbers mean, results won't turn into action.

Jeff Zeanah, President of Z Solutions, Inc., has been presenting on the topic of speaking “analytics” for many years. Now he’s developed a Business Knowledge Series course on the topic, Explaining Analytics to Decision Makers: Insights to Action.

I had the chance to interview Zeanah about his new course and the state of the analytics industry.

  1. What are some of the biggest advancements in data mining over the past 10 years?

Data and change of focus based on data.  There has been a subtle, meaningful almost unspoken change from modeling populations to modeling individuals or individual events.  So while there is a lot of discussion of “big data”, the reality is the data is being parsed to get to specific details (a subset of the data) that relate to the detailed investigation.  I like to call this reduction in data and dimensionality moving closer to an “Actionable Truth” – details around fact(s) that we can take action on.

As such, I believe even the term Analytics is now more descriptive than data mining as it implies greater use for decisions – and that in itself is an advancement.

Jedi SAS Tricks - Maximum Warp with Hadoop

I'm gearing up to teach the next "DS2 Programming Essentials with Hadoop" class, and thinking about Warp Speed DATA Steps with DS2, where I first demonstrated parallel processing using threads in base SAS. But how about DATA step processing at maximum warp? For that, we'll need a massively parallel processing (MPP) platform - like Hadoop.

Hadoop is an amazingly flexible platform for inexpensively storing and processing massive amounts of all types of data. Pair a well-provisioned Hadoop cluster with SAS, and you can achieve even more processing speed. I have access to a small Hadoop cluster with the SAS Embedded Process software components installed, and to SAS on Windows with licenses for the SAS/ACCESS Interface to Hadoop and the SAS In-Database Code Accelerator for Hadoop. With this arrangement, it's possible to run DS2 DATA step and thread code directly in Hadoop. If you are reading from and writing to Hadoop files, the DS2 code goes in and processes in Hadoop, and nothing comes out but the log! Reducing the need to move data to the compute platform should definitely improve processing speed.

I set out to compare processing data with DS2 threads in base SAS to processing the same data in-database in Hadoop. Here is the code I used for my experiment:

LIBNAME hdp HADOOP SERVER="" 
        DATABASE=JediData USER=SASJedi PASSWORD=WarpFactor9;
/* Create the data */
%let MaxObs=1000000;
data t;
   call streaminit(123456);
   do id=1 to &maxobs;
      ru=ceil(rand('UNIFORM')*10);
      rn=ceil(rand('NORMAL',1000,200));
      output;
   end;
run;
 
/* Load the data into Hadoop */
proc delete data=hdp.t;
run;
proc copy in=work out=hdp;
   select t;
run;
 
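/* Define a DS2 thread program and store it in the Hadoop library */
proc ds2;
thread hdp.T_thread/overwrite=yes;
   /* score0-score100 become output columns, addressable as score[0]..score[100] */
   vararray double score[0:100] score0-score100;
   method run();
      dcl int i;
      set hdp.t;
      /* Busy work: compute all 101 scores for every input row */
      do i=lbound(score) to hbound(score);
         score[i]= (sqrt(((ru * rn) / (rn + ru))*id))*(sqrt(((ru * rn) / (rn + ru))*rn));
      end;
   end;
endthread;
run;
quit;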

Next, I executed the thread in base SAS:

proc ds2;
/* Threaded alongside: run 4 instances of the thread in the SAS session */
data hdp.T_alongside/overwrite=yes;
   dcl thread hdp.T_thread t();
   method run();
      set from t threads=4;
   end;
enddata;
run;
quit;

This produced the following resource utilization stats in the SAS log:

NOTE: PROCEDURE DS2 used (Total process time):
      real time           1:59.04
      cpu time            1:07.43

Next, I ran the DS2 data program and thread in-database with the DS2ACCEL= option on the PROC DS2 statement:

proc ds2 ds2accel=yes;
/* Threaded in-database: DS2ACCEL=YES lets the thread and data programs run in Hadoop */
data hdp.T_indb/overwrite=yes;
   dcl thread hdp.T_thread t();
   method run();
      set from t;
   end;
enddata;
run;
quit;

This produced the following resource utilization stats in the SAS log:

NOTE: Running THREAD program in-database
NOTE: Running DATA program in-database
...
NOTE: PROCEDURE DS2 used (Total process time):
      real time           1:09.59
      cpu time            0.15 seconds

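I managed to cut the elapsed time from 1:59 to just under 1:10, a reduction of roughly 42%, even with my puny Hadoop test cluster! The CPU time reported on the SAS side also dropped from 1:07.43 to 0.15 seconds, because the work ran in Hadoop. It makes a real difference when you can take the code to the data, instead of having to bring the data to the code.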
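If you want to check the arithmetic on those two log excerpts, here is a quick DATA step sketch. The variable names are mine, and the only inputs are the two real times reported in the logs above.

```sas
data _null_;
   /* Real times copied from the two SAS logs, converted to seconds */
   base_sec = 1*60 + 59.04;   /* 1:59.04, DS2 threads in base SAS */
   indb_sec = 1*60 + 9.59;    /* 1:09.59, in-database in Hadoop   */

   /* Percent of elapsed time saved, and the speedup factor */
   saved_pct = 100 * (base_sec - indb_sec) / base_sec;
   speedup   = base_sec / indb_sec;

   /* Write the results to the log */
   put base_sec= indb_sec= saved_pct= 5.1 speedup= 5.2;
run;
```

That works out to roughly 42% less elapsed time, or a speedup of about 1.7x, for this one run on this one small cluster.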

I'm not going to post a ZIP file for this blog entry, because I can't give you my Hadoop environment to play with. But if you'd like to take DS2 and Hadoop for a test drive, you can see this and lots of other really amazing SAS & Hadoop technology by checking out the SAS Data Loader for Hadoop trial download. Better yet, join me in Boston for the next "DS2 Programming Essentials with Hadoop" class and we'll take a deep dive together. Or, if you would rather see a great introduction to Hadoop and an overview of all the ways it interacts with SAS, try our "Introduction to SAS and Hadoop" course, and I think you'll agree: SAS and Hadoop - it's a wonderful thing :-)

Until next time, may the SAS be with you!
Mark
