Where do you e-learn SAS?

We’re celebrating the student in all of us and you’re invited.

Tweet us your best shot using #SASworldclass!

Our planet-friendly training is available worldwide, wherever you may be. Show us your interpretations of “the world is your classroom.” Riding a camel? Show us. Surfing – don’t forget your waterproofing! The photos we receive will be displayed in an album on our SAS Users Facebook page.

Want to add a little authenticity to your photo shoot?  Take advantage of our FREE e-Courses and downloadable software while you’re at it.

#SASworldclass – Be sure to tell us where in the world you are.

Oh, we ask that no harm comes to any data participating in your photo shoots!

For some inspiration, check out these photos of some of our SAS employees getting involved in the #SASworldclass challenge.

The Stat Wars Guys  - Rivals at Chess…comrades when it comes to the STAT 1 Free e-Course.  For now…

IMG_1567

Our employees don't just exercise the body, but the mind too!

IMG_4993

LinwoodBalanceBall01

And that's our SAS blogger and social guru, @maggiemiller0. The pose is Warrior, but the free Programming 1 e-course is OHMMMMM.

FullSizeRender

 


More free SAS on Amazon!

Now that SAS' clever R&D developers have created SAS Studio (a DMS-like interface to SAS that runs in a web browser) "the sky's the limit" for deploying and accessing SAS software easily.

Last year, we made available a SAS image you could download and run in a virtual environment on your computer and then access through your web browser. And this year there's a new/simpler way where you don't even have to download and run SAS ... it runs on Amazon's servers, and all you need is your web browser!

To try out this new free version of SAS University Edition (free for students, teachers, learners, and academic researchers), follow the directions in this Quick Start guide (note that you will need a credit card to sign up for the Amazon Web Services account). Once you've completed the first 3 steps, you can always start at Step 4 when you want to run it in the future. One small difference from the locally-installed version is that you have to log in to SAS (using the 'sasdemo' account, and the secret password that Amazon will give you).

Here are a few screen-captures so you will recognize if you're on the right track!

aws1

aws4

aws2

After you get everything set up, the SAS Studio web interface will look just like it did with the locally-installed virtual image.

aws3

 

Once you've got your SAS Studio window, I invite you to try to copy/paste some of the SAS/STAT examples. For example, under the TTest Procedure section, the first example is "One Sample" and the code and output look like this in SAS Studio...

aws5

aws6
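If you would rather type something small than copy from the documentation, a one-sample t test can be sketched in a few lines. This is not the documentation's "One Sample" example: it assumes the SASHELP.CLASS sample table is available in your session, and the null-hypothesis value of 60 is an arbitrary choice for illustration.

```sas
/* Sketch only: one-sample t test on a built-in sample table.        */
/* Assumes SASHELP.CLASS is available; H0=60 is an arbitrary value.  */
proc ttest data=sashelp.class h0=60;
   var height;   /* test whether the mean of HEIGHT differs from 60 */
run;
```

Paste it into the SAS Studio code window and run it; the Results tab should show the summary statistics and the t test table.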

 


A closer look at the U.S. income inequality graphs

I've seen a lot of recent news articles about income inequality in the U.S. ("the rich get richer, and the poor get poorer") ... and I wondered if the graphs were a true/unbiased representation of the data.

For example, I recently saw a couple of graphs in an article on the NPR website, and decided to track down the data and create my own version of the graphs. In doing so, I hoped to gain more insight into whether or not NPR's graphs represented the data fairly.

Thankfully the article had a link to the data source, and I was able to select & download the same data into an Excel spreadsheet and import into SAS.

After a bit of experimentation, I came up with the following SAS imitation of the NPR graph. I made a few changes in the axes (used a 4-digit year instead of 2-digit, and showed the negative value at the bottom of the y-axis), and I added some footnotes to help explain the graph, but otherwise it is very similar to their graph.

us_income_growth

Jedi SAS Tricks: Warp Speed DATA Steps with DS2

I remember the first time I was faced with the challenge of parallelizing a DATA step process. It was 2001 and SAS V8.1 was shiny and new. We were processing very large data sets, and the computations performed on each record were quite complex. The processing was crawling along on impulse power and I felt the need - the need for warp speed!

From the SAS log we could see that elapsed time was almost exactly equal to CPU time, so we surmised that the process was CPU bound. So with SAS/CONNECT licensed on our well-provisioned UNIX SAS server and an amazing SUGI paper extolling the virtues of parallel processing with MPCONNECT in hand, we set out to chart a course in this brave, new world. The concept behind MPCONNECT is to write a SAS control program that breaks your data up into smaller pieces, spawns several identical DATA step jobs to process the pieces in parallel, monitors progress until they all finish, then reassembles the individual outputs to obtain the final results. Labor intensive, for sure, but it definitely accelerated processing of CPU-bound jobs.

But now I have SAS 9.4 with the new DS2 programming language. This was built from the ground up with threading in mind - and suddenly parallel processing with the DATA step just became a whole lot easier! For example, here is a (senseless, I’ll admit) CPU-intensive Base SAS DATA step program:

data t1;
   array score[0:100];
   set t END=LAST;
   do i=LBOUND(SCORE) to hbound(score);
      Score[i]= (SQRT(((id * ru * rn) / (id + rn + ru))*ID))*
                (SQRT(((id * ru * rn) / (id + rn + ru))*ID));
   end;
   count+1;
   if last then put 'Data step processed ' count 'observations.';
   drop i count;
run;
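The post does not show how the input table t was built. To run the step yourself, something like the following sketch would do. The row count of 1,000,000 matches the total reported by the threads in the log later in this post; the seed and the distributions chosen for ru and rn are guesses based only on the variable names.

```sas
/* Hypothetical test data: the original table T is not shown in the post. */
data t;
   call streaminit(2015);           /* fixed seed so reruns give the same values   */
   do id = 1 to 1000000;            /* 1,000,000 rows, matching the threaded log   */
      ru = rand('uniform');         /* assumption: "ru" is a random uniform value  */
      rn = rand('normal', 10, 1);   /* assumption: "rn" is a random normal value,  */
                                    /* centered at 10 so SQRT sees positive input  */
      output;
   end;
run;
```

Your timings will differ from the ones quoted here, since they depend on the hardware and on the data actually used.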

When executed, this process consumes about the same amount of CPU time as elapsed time:

NOTE: DATA statement used (Total process time):
      real time           5.20 seconds
      cpu time            5.11 seconds

I suspect the process is CPU bound and could benefit from threading. First, I’ll try this as a straight DS2 DATA step:

proc ds2;
data t2/overwrite=yes;
   dcl bigint count;
   drop count;
   vararray double score[0:100] score0-score100;
   method run();
      dcl int i;
      set t;
      do i=LBOUND(SCORE) to hbound(score);
         Score[i]= (SQRT(((id * ru * rn) / (id + rn + ru))*ID))*
                   (SQRT(((id * ru * rn) / (id + rn + ru))*ID));
      end;
      count+1;
   end;
   method term();
      put 'DS2 Data step processed' count 'observations.';
   end;
enddata;
run;
quit;

This process is still running single-threaded, and uses about the same resources and elapsed time as the original, with a little extra (as expected) for the PROC overhead:

NOTE: PROCEDURE DS2 used (Total process time):
      real time           5.98 seconds
      cpu time            5.86 seconds

Now, let’s convert the process to a thread. First we create the THREAD program, which will be stored in a SAS library. I’m going to store it in WORK in this case. To convert the DS2 DATA step to a THREAD step, I'll simply change the DATA statement to a THREAD statement and the ENDDATA statement to ENDTHREAD:

proc ds2;
thread th2/overwrite=yes;
   dcl bigint count;
   drop count;
   vararray double score[0:100] score0-score100;
   method run();
      dcl int i;
      set t;
      do i=LBOUND(SCORE) to hbound(score);
         Score[i]= (SQRT(((id * ru * rn) / (id + rn + ru))*ID))*
                   (SQRT(((id * ru * rn) / (id + rn + ru))*ID));
      end;
      count+1;
   end;
   method term();
      /*Make each thread report how many obs processed*/
      put 'Thread' _threadid_ ' processed' count 'observations.';
   end;
endthread;
run;
quit;

Executing that program creates the thread and stores it in the WORK library in a dataset named th2. Now to write a short DATA step program to execute 4 of the threads in parallel:

proc ds2;
/*Multi-threaded*/
data th4/overwrite=yes;
   dcl thread th2 t;
   method run();
   set from t threads=4;
   end;
enddata;
run;
quit;

And the clock time is significantly reduced, at the expense of extra CPU time. Note that the CPU time is longer than the elapsed time, indicating that operations were conducted in parallel. The routine in the thread’s TERM method reports how many observations each thread processed.

Thread 3  processed 281152 observations.
Thread 2  processed 219648 observations.
Thread 1  processed 294528 observations.
Thread 0  processed 204672 observations.
NOTE: PROCEDURE DS2 used (Total process time):
      real time           3.20 seconds
      cpu time            9.20 seconds

Our threaded process cut the elapsed time almost in half compared with the single-threaded DS2 run (3.20 seconds versus 5.98)!
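The timings in the logs above can be turned into a quick back-of-the-envelope check. This is only a sketch: the variable names are made up for illustration, and the four input numbers are copied from the logs in this post.

```sas
data _null_;
   base_real   = 5.20;   /* Base SAS DATA step, real time    */
   ds2_real    = 5.98;   /* single-threaded DS2, real time   */
   thread_real = 3.20;   /* DS2 with 4 threads, real time    */
   thread_cpu  = 9.20;   /* DS2 with 4 threads, CPU time     */

   speedup_vs_base = base_real / thread_real;      /* 1.625                */
   speedup_vs_ds2  = ds2_real  / thread_real;      /* about 1.87           */
   pct_saved_ds2   = 1 - thread_real / ds2_real;   /* about 0.46           */
   cpu_per_second  = thread_cpu / thread_real;     /* 2.875                */

   /* Write the results to the SAS log as name=value pairs */
   put speedup_vs_base= speedup_vs_ds2= pct_saved_ds2= cpu_per_second=;
run;
```

The last ratio says the job did almost three CPU seconds of work per elapsed second, somewhat short of the four threads requested, so the speedup is real but well below the ideal 4x.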

That’s all I have for this time. As usual, you can download a ZIP file containing a copy of this blog entry and the code used to create it from this link.

Now I’m off to participate in SAS Global Forum 2015 in Dallas. There are tons of presentations that talk about DS2, SAS in-database processing and using SAS with Hadoop. Look me up! I can be found at the #SASGF15 #TweetUp Saturday night, attending various presentations (especially about DS2 and Hadoop), or hanging out in the Quad on Tuesday afternoon from 2 to 2:30 pm to answer your questions about SAS Foundation programming or DS2. I'm also teaching the post-conference DS2 Programming Essentials class at the conference center. So, I hope to see you there.

Until next time, may the SAS be with you!
Mark


Technical experts on hand at SAS Global Forum

The SAS Training and Certification groups are excited to participate in SAS Global Forum 2015! We’ll have a booth in the Quad where you can stop by to ask questions, talk to your favorite instructor and register to win an iPad! We offer courses on almost every SAS product, so to make things easier on you, we've put together a schedule of when experts are available in each topic area.

SGFTraining

Do you have a question about certification? SAS Global Certification manager, Terry Barham, will be giving an overview of the SAS Certification program on Monday, April 27 at 12:30 p.m. He’ll also be in the certification booth in the Quad during the conference. Our certification program was recently recognized by Certification Magazine as being the “sweet spot” for certification in big data.

There is also an eLearning booth where you can sit down and experience SAS eCourses for yourself. Nine eCourses will be available on laptops and iPads. Topics include SAS Enterprise Guide, Programming, SAS Macro Language, Predictive Modeling, JMP Software, Credit Risk Modeling and SAS Certification practice exams.

If you’re not attending SAS Global Forum, we’re always available to answer your questions about SAS training and certification. Contact us at training@sas.com.


More reasons to stop smoking!

Smoking is an addictive habit that can kill you - if you don't believe me, check out the infographic in this blog post.

Recently a friend of mine was on the episode of the Dr. Phil show that focused on "quitting smoking." Here's a picture of Traci with Dr. Phil ...

traci_with_dr_phil

Being a non-smoker myself, and seeing very little smoking among my co-workers (smoking isn't allowed on the SAS campus), I hadn't really given much thought to the dangers of smoking. But when my friend mentioned that she was quitting, I did a few web searches on the topic and the statistics are indeed quite scary. I found an infographic on the Centers for Disease Control and Prevention's website, and decided to try to reproduce it with SAS software.

I used the same technique that I demonstrated in the art & analytics blog a few weeks ago, and created the custom donut pie chart using annotate functions, and then annotated colored polygons (using a slightly lighter shade of the pie slice colors) extending out to the side edges of the graph area. I then annotated the text & numbers on the graph. If you click the graph below, you can see the interactive version with hover-text and drilldowns on the donut pie slices.

smoking_deaths

Best of luck to Traci, and anyone else out there trying to stop smoking!


5 questions with analytics expert Bart Baesens

baesens

Bart Baesens

If anyone knows how to finesse insight out of data, it’s Bart Baesens, professor at KU Leuven (Belgium), and a lecturer at the University of Southampton (United Kingdom).

Not only has he written a book about it, Analytics in a Big Data World: The Essential Guide to Data Science and Its Applications, but he also teaches a number of Business Knowledge Series courses, including Advanced Analytics in a Big Data World.

And in his spare time (because I don’t think he requires sleep) he tutors, advises and provides consulting support to international firms with respect to their analytics and credit risk management strategy.

Despite his busy schedule, he’s always available to answer my questions about the latest in analytics. So here they are – I kept it to just five for him.

  1. What is your advice for organizations trying to implement the latest trends such as mass customization, personalization, Web 2.0, one-to-one marketing, risk management, and fraud detection?

In a nutshell, it would be: invest in data and analytics!  The applications you mention all require data, typically collected across a diversity of channels (e.g. on-line, off-line, mobile, web, email, etc.).  The data collected offers a unique and comprehensive perspective on a customer’s behavior and/or engagement.  By using analytics, organizations can get a clear picture of this, which will allow them to gain competitive leverage and explore new strategic opportunities.  Obviously, it is of key importance that the data is of good quality. That’s why firms are investing more and more in data governance initiatives.

  2. What’s the biggest mistake organizations are making when trying to implement big data strategies? And how can they fix it?

Well, actually, there are a few if you ask me.  First of all, big data and analytics should be embedded into a firm’s DNA.  In other words, it should be supported by all decision levels in the company, from operational to tactical and strategic.  That’s why it’s of crucial importance to set up the necessary corporate governance initiatives in terms of organizational impact, logistics and support (both hardware and software), and of course: education and training!  Furthermore, big data & analytics is not magic so make sure to appropriately level set your expectations at the outset of the project.  Finally, there are still business settings where data is only available in small quantities.  Just think about new or very specific products for example.  In those settings, it is important to optimally combine the (often tacit) business knowledge with the limited data available using specialized (e.g. Bayesian network) techniques.

  3. How do you see emerging data science techniques changing business processes in the future?

I think there will be multiple effects.  First of all, thanks to data science, the performance and efficiency of business processes will improve. This will result in cost savings and/or value creation.  A next effect will be regulatory compliance.  Given the impact of analytics, which is now bigger than ever before, we see more and more regulatory guidelines being introduced to develop analytical models.  Just think about the Basel and Solvency accords in risk management for example.  Another popular example concerns privacy regulation.  Data science techniques will help ensure that business processes comply with regulations.  Last but not least, data science can provide better transparency into business processes by providing new insights into customer behavior.  Think about fraud detection for example, where data science can uncover new fraud mechanisms which can then in turn be used to develop better fraud prevention business processes.

  4. You recently developed a course, Advanced Analytics in a Big Data World. What real-world skills can students pick up in the class?

In a first lesson, I start by zooming into the analytical process model and discuss the key characteristics of an analytical model: accuracy, interpretability, operational efficiency, economical cost, and regulatory compliance.  This is followed by a discussion of how state of the art analytical techniques can be used to develop analytical models satisfying these characteristics in settings such as credit risk modeling, fraud detection, churn prediction, customer segmentation, customer lifetime value modeling, etc.  Techniques discussed are: decision trees and ensemble methods (bagging, boosting and random forests), neural networks, support vector machines, Bayesian networks, survival analysis and social networks.  The course concludes by discussing how to monitor and backtest analytical models.  It includes lots of real-life examples and case studies across diverse settings.  I also extensively report on my recent research findings and industry consulting experience.

  5. Any concluding advice you have for aspiring data scientists?

Yes, sure!  The world is changing at a faster pace than ever before.  Just think about the Internet of Things, drones, self-driving cars, etc.  I believe we are only at the start of the data avalanche.  To spearhead the competition, it is of key importance to continuously educate yourself, understand new technologies and see how they can create added business value.  Knowledge is power, remember! I hope my new E-learning course can help shape the next generation of data scientists!

Here's a photo from another interview I conducted with Bart Baesens at the Analytics 2014 conference in Frankfurt.


Analyzing wait times at VA health care facilities

Data about the monthly wait times at VA facilities in the US are now available, but it's a bit overwhelming to try to analyze them in tabular form - plotting the data on a map made it a lot easier!...

Here in the US, when our soldiers finish their commitment in the military (retire, or are honorably discharged), they are allowed to utilize the VA health care facilities. But the VA facilities have been under a lot of scrutiny lately - in particular for long wait times.

A recent article in our local news mentioned that the worst VA wait times are in the South. The article mentioned several specific examples, but being a data person, I wanted to see the actual data. I looked around a bit and found the actual data for February 2015. Here's a screen-capture of a portion of the table:

va_table_cap

Unfortunately the data are in a table in a PDF file, which makes it quite cumbersome to work with. I ended up copying and pasting it one line at a time into a simple text file I could import into SAS. I got all major rows for each group of facilities (rather than trying to get each individual facility). I then used PROC GEOCODE to estimate a lat/long for each facility, and annotated them as markers on a map, color-coded based on the number of appointments completed in under 30 days. (Click the map below to see the interactive version, with HTML hover-text for each marker.)

va_hospital_wait_times_feb_2015

At this level of aggregation, it does appear that the South might be doing a bit worse than the Northeast, and my state (North Carolina) has some red, orange, and yellow markers (which will hopefully be improving). But rather than trying to compare all the facilities across the nation, I liked that the map allowed me to see where the facilities are located, and hover over them to see their data.

My next step would be to plot all the individual facilities (instead of the aggregate data) - and it would be *great* to find a more convenient version of the data (maybe a spreadsheet or csv file). If anybody knows of a better data source, let me know (hint, hint!)

And to close this blog post, here's a picture of my friend Trena's husband, proudly serving his country - hopefully by the time he's out of the military, we'll have all the facilities running like well-oiled machines, with short wait times and good service!

soldier


A custom map to help track the flu

Has this year's flu been better or worse than you thought it would be?

There are a lot of factors that help determine whether or not you're likely to get the flu. Is there a bad strain going around? Did the flu vaccine target the right strain? Did you get the flu shot? Has the weather been cold & wet? Has your health been poor in general? Have you had to care for family members who had the flu? Etc, etc, etc.

And I guess a lot of flu-factors get rolled into geography - if the flu is "going around" in your area, then you're probably more likely to get it. Which is why I was happy to find the CDC's flu map! It shows all the US states (and a few other areas) color-coded by the prevalence of the flu! Here's a screen-capture of their flu map:

cdc_influenza_orig

Of course, any time I see a nice map, I naturally want to try to create it in SAS. The CDC map only had 2 challenging aspects that I didn't know the exact code for, right off the top of my head. The first was the cross-hatch patterns - I knew SAS/GRAPH could do them, but I didn't know the exact syntax. After a quick visit to the PATTERN statement help page, I determined that the 2 special map patterns could be coded as m4x45 and m4n90. The second challenge was including the territories (such as Guam, US Virgin Islands, and Puerto Rico) in the US map. I decided to subset them out of the world map, re-size & re-scale the x/y coordinates, and then combine them with the US map. Here's a link if you'd like to see the exact SAS code that was used.
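To give a feel for how the two hatch patterns are used, here is a minimal sketch. It is not the code behind the map in this post: the flu_demo table, its three states, the level values, and the colors are all made up for illustration, and the sketch assumes SAS/GRAPH and the MAPS.US map data set are available.

```sas
/* Hypothetical response data: NOT real flu figures */
data flu_demo;
   length statecode $2;
   input statecode $ level;
   datalines;
NC 1
SC 2
VA 3
;
run;

/* One PATTERN statement per response level, assigned in sorted order */
pattern1 value=msolid color=cx99cc99;   /* solid fill                      */
pattern2 value=m4x45  color=gray55;     /* density 4, crosshatch, 45 deg   */
pattern3 value=m4n90  color=gray55;     /* density 4, parallel lines, 90 deg */

/* ALL draws every state, including those with no response value */
proc gmap data=flu_demo map=maps.us all;
   id statecode;
   choro level / discrete;
run;
quit;
```

In the pattern names, the digit is the line density, x or n selects crosshatched or parallel lines, and the trailing number is the angle in degrees.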

The results came out looking very close to the original (see below). And one extra bonus feature of my map is that I added HTML hover-text for each state - this can be helpful to anyone who is analyzing the data, but in particular allows vision-impaired people to explore the map using Voice-over technology (as they hover over each state, the state name and flu prevalence are read out loud). Click the map snapshot below to see the interactive version with hover-text.

Landing a SAS Certification

Lauren Guevara


After working as a flight attendant for more than 20 years, Lauren Guevara was ready for a new adventure.

The inspiration for her journey came from an article she read in CNN’s Money magazine that highlighted the earning potential of a SAS Certification. Having also earned a Master of Science in e-commerce years earlier, she naturally gravitated toward the computer industry.

“My mom was the one who encouraged me to read Money magazine,” said Guevara. “The article mentioned career advancements you can make by becoming a data miner and getting certified in SAS.”

After reading the article, Guevara started researching SAS online and also purchased the book Learning SAS by Example: A Programmer’s Guide. That book started traveling the world with her. She devoted her downtime during layovers and breaks to reading. What she learned led her to a unique decision: become a SAS programmer.

Her first step was signing up for online e-learning courses in SAS Programming 1 and Programming 2. “I worked through both e-lessons and tried to learn everything before setting foot in a classroom,” said Guevara.

Eventually she felt ready for the classroom and attended SAS Programming 1 in the Charlotte, NC training center. The classroom training reinforced what she was introduced to in the e-courses and gave her an opportunity to ask more detailed questions. “Coming into this with no experience, classroom and e-learning together was the best way for me to learn it,” said Guevara. “I did a lot of fine tuning in the classroom.”

Guevara wanted to earn the SAS Certified Base Programmer Credential as a way to boost her credibility to potential employers.

“I noticed in the classroom that everyone had computer jobs or worked in the industry,” said Guevara. “Since I didn’t have that same experience, I felt it was necessary to have the credentials to back up my skills. SAS certifications are respected in the industry.”

Guevara purchased the base programming certification package offered by SAS, which included a training course, prep exam and certification exam voucher at a discounted price to help her prepare.

Another study tip she shared was reading the SAS Certification Prep Guide: Base Programming for SAS 9.

Guevara was a bit embarrassed to share that she didn’t pass the exam on her first attempt. However, she realized that it might be inspiring for others to know that it’s possible to fail and still achieve your goals. “The first time I took the exam, I wasn’t ready,” said Guevara, “but I wasn’t giving up. I went back and really started to understand the language better. You really have to know this stuff. It’s hard, but it’s possible.”

With her relentless determination, Guevara passed the base programmer exam and is working to earn the SAS Certified Advanced Programmer Credential by the end of the year. In the meantime, she’s going to attend the annual PharmaSUG event in Orlando to network with other SAS programmers.

Guevara eventually sees herself doing part-time contract work as a programmer, while still flying part time for the airline.

Who knew some simple motherly advice would lead Guevara on this life-changing path? Mom, of course! And she couldn’t be prouder of her daughter. “She sang a song when I finally passed the exam. She’s so happy.”

Learn more and start your SAS Certification journey.
