50 million illegal aliens apprehended in the US

There's been quite a bit of controversy lately about the number of undocumented immigrants in the US. For example, Ann Coulter claims the number is 30 million, whereas others put it at about 11 million (readers of my blog are data-savvy and would dig into the details of such claims, of course). It's difficult to get a definitive count of something that is, by definition, 'undocumented,' so I decided to focus on something more easily quantifiable: the number of illegal aliens who have actually been apprehended.

I found the data on the US Customs and Border Protection (CBP) website, in the form of a table, as shown in the partial screen capture below:

illegal_alien_screen_capture

This is very interesting quantitative data on the topic, but I found it a bit difficult to digest in tabular form. There are too many numbers to keep track of in my head while looking for trends. So I imported the data into SAS and created a bar chart, which really helped me see how the numbers have changed over time. I also used SAS to calculate the grand total (nearly 50 million) and annotate it onto the graph in a large font.
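The code for the grand-total step isn't shown here, so below is a minimal sketch of one way to do it. It assumes the CBP table has already been imported into a dataset named `apprehensions` with numeric variables `fiscal_year` and `apprehended` (all three names are hypothetical). The total is computed with PROC SQL and displayed with the PROC SGPLOT INSET statement, which is a swapped-in alternative to the SAS/GRAPH annotate approach mentioned above, not necessarily the technique used for the original chart.

```sas
/* HYPOTHETICAL input: dataset APPREHENSIONS with numeric variables
   FISCAL_YEAR and APPREHENDED (one row per fiscal year) */

/* Store the grand total, formatted with commas, in a macro variable;
   TRIMMED removes the leading blanks that the format produces */
proc sql noprint;
   select sum(apprehended) format=comma15.
      into :grand_total trimmed
   from apprehensions;
quit;

title "Illegal aliens apprehended, by fiscal year";

/* Bar chart of the yearly counts, with the grand total in a large font */
proc sgplot data=apprehensions;
   vbar fiscal_year / response=apprehended;
   inset "Total: &grand_total" / position=topleft
         textattrs=(size=20pt weight=bold);
   xaxis label="Fiscal year";
   yaxis label="Apprehensions";
run;

title;
```

Computing the total in SAS, rather than typing it into the title by hand, means the annotation stays correct whenever the table is refreshed with a new year of data.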

Was the dress blue ... or was it teal, sky, turquoise, or spindrift?

I saw the dress photo as blue & black. If you're a woman, even if we perceived the exact same color, you might not have said 'blue & black'. That's because women tend to have a larger color vocabulary than men, so you might have specified exactly which blue and which black.

blue_dress

This blog post is a fun, unscientific comparison of the color names men and women use. If you do a Google search for 'men women color names' and look at the images, you'll find several visualizations of a color spectrum showing that women have a different name for each shade, whereas men tend to lump the shades together into groups.

google_colors

How the Tour de France and SAS Factory Miner relate

July has been an exciting month for me, not only because of this year's historic Tour de France, but even more because SAS Factory Miner, a new offering, was officially released this month!

With SAS Factory Miner, you can run predictive models in an automated model tournament environment to quickly identify the best performer for each segment. Wait a second… tournament, best performer, segment… doesn’t that sound like the Tour de France? Maybe it’s no coincidence after all that SAS Factory Miner launched during the world’s most prestigious cycling race. Let’s investigate how they relate.

Tour_1

Not one, but multiple winners

As organizations begin to apply analytics to growing numbers of customer and business segments, predictive models often must be developed at increasingly granular levels. SAS Factory Miner provides an environment for building, comparing and retraining predictive models at scale across multiple segments. With just a few clicks, you can uncover the champion model for each segment.
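Factory Miner's own interface isn't shown here, but the underlying idea of a per-segment model tournament can be sketched in plain SAS/STAT. This minimal sketch is not how Factory Miner works internally. It uses the SASHELP.HEART sample table, treats `Sex` as the segment, fits two candidate logistic regression models in each segment with BY-group processing, and crowns whichever has the lower AIC. The segment variable, the predictors, and AIC as the tournament criterion are all illustrative assumptions.

```sas
/* Keep complete cases so both candidates are fit on the same rows
   (AIC is only comparable across models fit to identical data) */
proc sort data=sashelp.heart(where=(Cholesterol is not missing
                                    and Systolic is not missing
                                    and Smoking is not missing))
          out=heart;
   by Sex;                               /* Sex is the illustrative segment */
run;

/* Candidate A: a small model, fit separately in each segment */
ods output FitStatistics=fit_a;
proc logistic data=heart;
   by Sex;
   model Status(event='Dead') = AgeAtStart Cholesterol;
run;

/* Candidate B: a larger model, fit separately in each segment */
ods output FitStatistics=fit_b;
proc logistic data=heart;
   by Sex;
   model Status(event='Dead') = AgeAtStart Cholesterol Systolic Smoking;
run;

/* Tournament: in each segment, the model with the lower AIC is the champion */
data champions;
   merge fit_a(where=(Criterion='AIC') rename=(InterceptAndCovariates=aic_a))
         fit_b(where=(Criterion='AIC') rename=(InterceptAndCovariates=aic_b));
   by Sex;
   length champion $1;
   if aic_a <= aic_b then champion='A';
   else champion='B';
   keep Sex aic_a aic_b champion;
run;

proc print data=champions noobs;
run;
```

Each segment can crown a different champion, which is the point of the tournament: there is not one winner, but one per segment.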

Saint Peter’s University introduces Master’s degree in data science and business analytics

In December, Saint Peter’s University grants Master’s degrees to its inaugural class of data scientists. Thirty-six students are enrolled in the program, and eight are set to graduate. As Bloomberg reported this year, career opportunities for analytics talent are excellent.

Saint Peter’s is the latest to collaborate with SAS to offer such a program. We’ve helped launch more than 30 Master’s degree programs and 60 certificate programs all over the world in analytics and related disciplines.

In the past year alone, we helped lay the foundation for new Master’s programs at Michigan State University, University of Maryland, University of Missouri, George Washington University, Shiv Nadar University, Indian Institute of Management and University of South Australia.

For us, it’s a no-brainer. We need analytics expertise as much as any company, and that expertise is scarce. Consider a recent report released by MIT Sloan Management Review. The upshot: technology is no longer the key inhibitor for organizations struggling to get value from analytics; the lack of analytical talent is.

What does it take to build a data science program that bridges the gap? I asked Dr. Sylvain Jaume, Director of Saint Peter’s Data Science graduate program. Key early steps, he said, were the school’s vision and its strategic investment in the program, as well as getting companies to commit to providing practical experience for students.

Only brush the teeth you want to keep

When I was a kid, there was a motivational poster on my dentist's wall that said "You don't have to brush all your teeth -- only the ones you want to keep." That poster really made me think (and brush my teeth)! And now that I'm a data-analyst adult, I think I've found an even scarier motivational poster: graphs showing the percentage of senior citizens who have lost all their natural teeth!

Before we get to the scary data, though, here's a picture of my friend Becky's daughter, who pulled her first tooth while performing onstage in the Sword of Peace outdoor drama. Hopefully once all her permanent teeth come in, she'll keep them for a very long time!

lost_tooth

 

Explaining analytics with Jeff Zeanah

Zeanah_Jeff_02

One of the most important skills for data scientists and business analytics professionals is communication. If decision makers and managers don't understand what the numbers mean, results won't turn into action.

Jeff Zeanah, President of Z Solutions, Inc., has been presenting on the topic of speaking “analytics” for many years. Now he’s developed a Business Knowledge Series course on the topic, Explaining Analytics to Decision Makers: Insights to Action.

I had the chance to interview Zeanah about his new course and the state of the analytics industry.

  1. What are some of the biggest advancements in data mining over the past 10 years?

Data, and a change of focus based on data. There has been a subtle, meaningful, almost unspoken change from modeling populations to modeling individuals or individual events. So while there is a lot of discussion of “big data,” the reality is that the data is being parsed to get to specific details (a subset of the data) that relate to the detailed investigation. I like to call this reduction in data and dimensionality moving closer to an “Actionable Truth”: details around fact(s) that we can take action on.

As such, I believe even the term analytics is now more descriptive than data mining, as it implies greater use for decisions, and that in itself is an advancement.

Jedi SAS Tricks - Maximum Warp with Hadoop

I'm gearing up to teach the next "DS2 Programming Essentials with Hadoop" class, and I've been thinking back to Warp Speed DATA Steps with DS2, where I first demonstrated parallel processing using threads in base SAS. But how about DATA step processing at maximum warp? For that, we'll need a massively parallel processing (MPP) platform, like Hadoop.

Hadoop is an amazingly flexible platform for inexpensively storing and processing massive amounts of all types of data. With a well-provisioned Hadoop cluster and SAS, even more processing speed can be achieved. I have access to a small Hadoop cluster with the SAS Embedded Process software components installed, plus SAS on Windows licensed for SAS/ACCESS Interface to Hadoop and SAS In-Database Code Accelerator for Hadoop. With this arrangement, it's possible to run DS2 data and thread programs directly in Hadoop. If you are reading from and writing to Hadoop files, the DS2 code goes in and processes in Hadoop, and nothing comes out but the log! Reducing the need to move data to the compute platform should definitely improve processing speed.

I set out to compare processing data with DS2 threads in base SAS to processing the same data in-database in Hadoop. Here is the code I used for my experiment:

LIBNAME hdp HADOOP SERVER="" 
        DATABASE=JediData USER=SASJedi PASSWORD=WarpFactor9;
/* Create the data */
%let MaxObs=1000000;
data t;
   call streaminit(123456);
   do id=1 to &maxobs;
      ru=ceil(rand('UNIFORM')*10);
      rn=ceil(rand('NORMAL',1000,200));
      output;
   end;
run;
 
/* Load the data into Hadoop */
proc delete data=hdp.t;
run;
proc copy in=work out=hdp;
   select t;
run;
 
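/* Define a DS2 thread program and store it in the Hadoop library */
proc ds2;
thread hdp.T_thread/overwrite=yes;
   /* 101 computed columns: score0-score100 */
   vararray double score[0:100] score0-score100;
   method run();
      dcl int i;
      set hdp.t;
      /* Synthetic CPU-heavy workload: fill every score element for this row */
      do i=lbound(score) to hbound(score);
         score[i]=(sqrt(((ru * rn) / (rn + ru))*id))*(sqrt(((ru * rn) / (rn + ru))*rn));
      end;
   end;
endthread;
run;
quit;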

Next, I ran the thread in base SAS, using four threads:

proc ds2;
/*Threaded Alongside*/
data hdp.T_alongside/overwrite=yes;
   dcl thread hdp.T_thread t();
   method run();
   set from t threads=4;
   end;
enddata;
run;
quit;

This produced the following resource utilization stats in the SAS log:

NOTE: PROCEDURE DS2 used (Total process time):
      real time           1:59.04
      cpu time            1:07.43

Next, I ran the DS2 data program and thread in-database with the DS2ACCEL= option on the PROC DS2 statement:

proc ds2 ds2accel=yes;
/*Threaded In-Database*/
data hdp.T_indb/overwrite=yes;
   dcl thread hdp.T_thread t();
   method run();
   set from t;
   end;
enddata;
run;
quit;

This produced the following resource utilization stats in the SAS log:

NOTE: Running THREAD program in-database
NOTE: Running DATA program in-database
...
NOTE: PROCEDURE DS2 used (Total process time):
      real time           1:09.59
      cpu time            0.15 seconds

I cut the elapsed time by more than 40 percent (from 1:59 to 1:10), even with my puny Hadoop test cluster! It makes a real difference when you can take the code to the data, instead of having to bring the data to the code.

I'm not going to post a ZIP file for this blog entry, because I can't give you my Hadoop environment to play with. But if you'd like to take DS2 and Hadoop for a test drive, you can see this and lots of other really amazing SAS & Hadoop technology by checking out the SAS Data Loader for Hadoop trial download. Better yet, join me in Boston for the next "DS2 Programming Essentials with Hadoop" class, and we'll take a deep dive together. Or, if you would rather see a great introduction to Hadoop and an overview of all the ways it interacts with SAS, try our "Introduction to SAS and Hadoop" course. I think you'll agree: SAS and Hadoop, it's a wonderful thing :-)

Until next time, may the SAS be with you!
Mark

Analysis of serial killings in the US

I recently came across some very interesting data on serial killings ... but it was in tabular/text form. This seemed like an invitation for me to create some graphs that make it easier to understand the data.

It seems many people have a morbid curiosity about serial killers. For example, some of the most popular shows on TV (such as Dexter and Criminal Minds) focus on them. So when I found this data on serial killings, I thought it would be interesting to 'bring it to life' with a graphical analysis.

Let's start with something simple, the number of victims per year since 1900:

us_serial_killers_by_year

Bald eagles return to the United States!

Bald eagles, the national bird of the United States, came perilously close to becoming extinct here, but are now making a comeback! Let's look at the data with a SAS map!

When I was growing up here in North Carolina in the 1970s & 80s, I spent a lot of time outdoors but never once saw an eagle. That's because eagles had basically disappeared from NC during those years. What happened to the eagles? The main factor was the widespread use of the insecticide DDT after WWII. One of its side effects was eggshell thinning, so eagle eggs broke before they could hatch. DDT was banned in the US in 1972, and the eagles have been making a dramatic comeback since then.

And for some proof of this comeback, here's a picture of a bald eagle that my friend Joe took at Jordan Lake (about 20 miles from the SAS headquarters)...

eagle_joe

Where are different languages spoken?

As you travel around the world, do you know where English, French, Spanish, and Arabic are spoken? This blog will help you quickly answer that question, with some cool SAS maps!

But first, here's a picture of my friend Joy posing beside an interesting sign during one of her international trips. Do you know what languages are used on this sign? Can you translate the message/warning? And for bonus points, if you can understand both languages, do they both say the same thing? :)

sign_language

And now, on to the SAS maps!

I recently saw a map animation on dadaviz.com that cycled through 4 maps showing the countries where English, French, Spanish, and Arabic are official languages. I liked the way the maps looked, but the animation bothered me. I would much rather have been able to study each map separately, with hover-text over each country showing its name.

So I set about creating my own version of the maps using SAS software. First I tracked down the lists of countries where each language is official (see the Wikipedia pages for English, French, Spanish, and Arabic), and read the lists into a SAS dataset. I then wrote a short macro that takes a language name, a fill color, and an outline color, and produces a beautiful map similar to Jishai's dadaviz map.

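%macro do_map(language,color,outline);
 
/* Name the HTML anchor after the language, so each map can be linked to */
ods html anchor="&language";
 
/* Count the countries where this language is official; SEPARATED BY ' '
   also trims leading blanks from the macro variable &count */
proc sql noprint;
select count(*) into :count separated by ' '
from my_data where upcase(language)=upcase("&language");
quit;
 
goptions gunit=pct htitle=15 htext=3.6 ftitle="albany amt/bold" 
 ftext="albany amt/bold" ctext=gray77;
 
/* Solid fill, repeated so every country (each its own ID level) gets the same color */
pattern1 v=s c=&color repeat=500;
 
title1 ls=1.5 color=&color "&language";
title2 "is an official language in these &count countries...";
 
/* ALL draws every country in my_map; countries with no response data
   use the CDEFAULT (fill) and CEMPTY (outline) colors.
   &name is assumed to be a global macro variable defined outside this macro. */
proc gmap data=my_data (where=(upcase(language)=upcase("&language"))) 
 map=my_map all;
id idname;
choro idname / nolegend 
coutline=&outline cdefault=grayf5 
cempty=grayf5 html=my_html
des='' name="&name._&language";
run;
 
%mend do_map;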

And then it was a simple matter of calling the macro 4 times - once for each language:

%do_map(ENGLISH,cx1976d2,cx4e9ce6);
%do_map(FRENCH,cxe91e63,cxf672a0);
%do_map(SPANISH,cxff5722,cxf9beac);
%do_map(ARABIC,cx009688,cx39d8c7);

Below are static images of the SAS maps; click them to see the full-size interactive versions with HTML hover-text showing the country names. Note that each map also shows the number of countries where that language is official. I think this is a much easier way to make sense of the data than the animation.

official_language_by_country_english

official_language_by_country_french

official_language_by_country_spanish

official_language_by_country_arabic

 

And I'll leave you with an ironic picture that my friend Jennifer took, of an Arabic sign :)

sign_no_photo

 
