SAS Author Spotlight: Goutam Chakraborty

SAS Books team at SAS Global Forum 2016

As the excitement of SAS Global Forum begins to die down and we dust off our sequins for another year, it’s time to get back behind the desk.

This year at SAS Global Forum we hosted a "Top Tips from Your Favorite SAS Press Authors" lunch, where we asked three or four authors to present a top tip or two and talk a little about publishing a book with SAS Press, and then opened the floor to questions from the audience. Thank you to all who attended; there was a great turnout! For those who were unable to attend, we have put together a website of the tips presented at the lunch, plus a few more from authors who were not able to present. Check them out, and maybe make your day a little easier!

We’ve also uploaded our next Author Spotlight video featuring Goutam Chakraborty, author of our best-selling book, Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS®. In it, we ask him about his experience writing the book, how it changed his life, and one thing from 9th grade that not many people know about him!

Jedi SAS Tricks - Make This a Button in Base SAS

A recent post, Jedi SAS Tricks: The DATA to DATA Step Macro, generated a lot of response on Twitter. One of the retweets included a call to action: make this a button in Base SAS!

Tweet says "Make this a button in base SAS"

What does PROC REPORT do?

At SAS Global Forum in Las Vegas, I was asked the question, “What does PROC REPORT do?” It is a simple question, but I hesitated to answer. I’m normally so deep inside the nitty-gritty details of PROC REPORT that I don’t often think about what it would be like to see and use PROC REPORT for the first time.

What level of understanding do you assume when trying to describe SAS programming to a novice user? Where does one start when trying to explain any procedure in SAS? The simplest answer I can give to the question, “What does PROC REPORT do?” does not seem to cover everything that it should. But you have to start somewhere, right? If I were given a second chance to answer that question, I would say: …

A graph fit for a Prince

I was sad to hear that one of my favorite singers, Prince, died last week. This blog post describes a graph I created to show his legacy.

Prince was at the height of his popularity when I was in high school and college, which were the formative years for much of my musical taste. And if my life had a soundtrack, I would probably have several of his songs on it. Especially the time period when I owned this little red 1975 Corvette ...

My friend Patricia is also a big Prince fan. She's a writer who has interviewed many musicians, and she has actually met Prince. Well ... at least in the spirit of Charlie Murphy's Prince skit on Chappelle's Show. Is this the real Prince? I'll let you be the judge!

While feeling nostalgic and searching out "all things Prince" on the Internet, I came upon the following interesting graph on the FiveThirtyEight website. It was an eye-catching graph, but it was a bit confusing, and it didn't quite do Prince justice, IMHO. At first, I thought the songs inside the Prince symbol had some special significance ... but after studying it a while, I decided they didn't. I also wondered which 40 songs were in the graph. (Were all my favorites there? Were there perhaps some songs I hadn't heard before?) The graph didn't really enlighten me...

So I found the Billboard article with the data, imported it into SAS, and set about making my own graph ... a graph fit for a Prince!

SAS training leads to certification success

Teaching SAS coding is fun. The best part of teaching is hearing success stories from candidates who earn a SAS certification after taking SAS training.

Suiru Jiang is an MBA candidate who successfully passed her SAS certification exam. She took my five-day SAS programming fast track course at the Goodman School of Business at Brock University.

I recently sat down with Suiru to ask about her experience, hoping to provide some perspective for other candidates preparing for the certification exam.

Question 1: Why did you want to get SAS certified? What do you think are the benefits to certification?

Suiru: SAS is a powerful statistical platform which can access a variety of data. SAS is derived from the English language, which makes understanding as well as applying this programming language easy.

Big data is the future trend, and data analysts have bright prospects. The biggest benefit of getting SAS certified is how it opens doors to employment. SAS certification demonstrates that you can learn your job more quickly.

Looking for cheaters in the Boston Marathon data

With more and more data available these days, and computers that can analyze that data, it's becoming feasible to look for fraud in events such as the Boston Marathon. So put on your detective hat, and follow along as I show you how to use SAS to be a data sleuth!

But before we get started, I wanted to share a picture of my old college buddy Jenny - she actually ran in the Boston Marathon yesterday. She's quite the runner, and I'm really proud of her (and a bit jealous that she's in so much better shape than I am!)

With the Boston Marathon in the news, I couldn't help but look around for examples of what people had done with the data. I found a *very* interesting article about detecting people who might have cheated when qualifying for the Boston Marathon. Derek Murphy is one of the data sleuths passionately analyzing the data, and one of his metrics identifies the runners who ran the marathon at least 20 minutes slower than their qualifying time. Here's a graph he created:

I think it's a pretty neat graph, and I really like it ... except that the bottom axis is a bit busy/confusing. So, of course, I decided to create my own version!

I did a bit of searching and found that Bill Mill (llimllib) had set up a GitHub page with some past Boston Marathon data he had scraped from the Boston Athletic Association website. His data collection didn't contain the latest data (it only went up to 2014), but I decided it would be close enough for my purposes. I downloaded the data, imported it into SAS, and created the following plot. Note that bib numbers in the Boston Marathon indicate runners’ qualifying times: lower numbers mean faster (lower) qualifying times, and faster runners.

I simplified the axes a bit, used transparent circular markers rather than solid dots, and included all the data, rather than limiting it to just the competitive runners (I think the last ~1/4 of the runners are more of the fundraisers, rather than competitive runners?):

Murphy's graph highlighted the runners who ran the marathon 20 minutes slower than their qualifying time, but the data I was using didn't include the qualifying times, so I had to find a different metric to compare the times against. After a bit of head-scratching, I decided to divide the runners into groups (or packs) of 200 and calculate the average speed of each group. I then plotted that average speed as a red line on the graph. (I probably could have gotten a smoother line by using a moving average, but I decided to keep it simple for now.)
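The post doesn't show the SAS code for the pack averages, so here is a minimal Python sketch of the idea. It is not the author's implementation. It assumes each runner is a hypothetical `(bib, finish_minutes)` pair. The sketch sorts runners by bib number, cuts them into consecutive packs of 200, and averages the finish times within each pack. Averaging times rather than speeds is a simplifying assumption.

```python
from statistics import mean

PACK_SIZE = 200  # pack size used in the post

def pack_averages(runners, pack_size=PACK_SIZE):
    """Map each bib to the mean finish time of its pack.

    runners: list of (bib, finish_minutes) pairs (hypothetical layout).
    Runners are sorted by bib number, then split into consecutive packs
    of `pack_size`; the last pack may be smaller.
    """
    ordered = sorted(runners, key=lambda r: r[0])
    averages = {}
    for start in range(0, len(ordered), pack_size):
        pack = ordered[start:start + pack_size]
        pack_mean = mean(minutes for _, minutes in pack)
        for bib, _ in pack:
            averages[bib] = pack_mean  # every runner in a pack shares one average
    return averages
```

Plotting `averages[bib]` against bib number gives a stepped line like the red one described above. A moving average would smooth the steps, for example the mean of the 200 runners nearest each bib.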

I then identified all the times that were 20% above (or below) the red average line, and put a red 'x' through those blue circular markers. I also added HTML hover text so you can see the name and time for those runners, and if you click on one, it will launch a Google search. (You have to click the image below first to see the interactive graph.)
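Here is a similar sketch for the flagging step. It assumes each runner record is a hypothetical dict with `name`, `minutes`, and `pack_avg` keys, so the pack average is already attached. The 20% band comes from the post. The Google search URL format (`https://www.google.com/search?q=`) is an assumption about how such a link could be built. The post's actual SAS HTML variable isn't shown.

```python
from urllib.parse import quote_plus

TOLERANCE = 0.20  # 20% band from the post

def is_outlier(finish_minutes, pack_average, tolerance=TOLERANCE):
    """True if the time is more than `tolerance` above or below the pack average."""
    return abs(finish_minutes - pack_average) > tolerance * pack_average

def google_search_url(name):
    """Build a search link for a runner (assumed URL format)."""
    return "https://www.google.com/search?q=" + quote_plus(name)

def flag_runners(runners, tolerance=TOLERANCE):
    """Return hover text and a search link for each flagged runner.

    runners: list of dicts with 'name', 'minutes', 'pack_avg' keys (hypothetical layout).
    """
    flagged = []
    for r in runners:
        if is_outlier(r["minutes"], r["pack_avg"], tolerance):
            flagged.append({
                "name": r["name"],
                "hover": f"{r['name']}: {r['minutes']:.0f} min",
                "url": google_search_url(r["name"]),
            })
    return flagged
```

A flag is only a prompt to look closer. As the post points out, illness, injury, or hard training can also explain a big gap.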

Note that just because the markers have a red 'x' doesn't mean these people necessarily cheated! If they ran the marathon 20% slower than their qualifying time, they might have been dealing with sickness, injury, or lack of sleep. Or if they ran it 20% faster than their qualifying time, perhaps they had improved that much through hard practice. But it does perhaps warrant a little extra scrutiny, just to make sure everything is copacetic.

Now it's your turn - what other kinds of fraud analytics would you like to run against marathon data?  Or what other kinds of data could these marathon analytics be applied to? Feel free to leave your ideas and suggestions in the comments section!

 

Maximize your conference experience by getting SAS certified

SAS users are always looking for ways to optimize, maximize, and prioritize just about everything. That includes the precious commodity of time away from the office, even at a premier event like SAS Global Forum. Sure, attendees get to learn and share with the best and brightest minds around, and to investigate new techniques and tools that can directly improve how they work and how their company can help customers. To make even better use of their time, dozens of attendees also took advantage of the opportunity to take a SAS certification exam right at the conference site.

At major SAS events such as SAS Global Forum and the Analytics Experience series, the SAS Global Certification program offers multiple SAS exam sessions for attendees, usually at a 50% discount. Here in Las Vegas, two exam sessions were offered on the day before the forum on Monday, April 18. More than 80 attendees took a SAS exam while they were here. As you can imagine, SAS users attending the forum are highly motivated individuals, and a significant number of them earned a SAS credential. What a great way to get a jump start on the SAS Global Forum experience!

Kriss Harris shares his SAS certification story on camera at SAS Global Forum

Next up? The Analytics Experience 2016, to be held at the Bellagio hotel in Las Vegas on September 12-14. This time the SAS Global Certification program will be offering three exam sessions: two on Sunday, September 11, and one on Monday morning, September 12. If you are planning on attending and you have been wanting to take a certain exam, why not maximize your time away from the office and do both? Maybe you will leave the Analytics Experience 2016 not just smarter, but SAS certified.

Here's a short video with more information about the certification program, including interviews with test takers at SAS Global Forum 2016.

 

How much will that taxi cost me?

Are you afraid that if you take a ride in a taxi, you might get "taken for a ride"? If trying to figure out the reasonable price of a taxi is a bit voodoo/black-box to you, here is a SAS data analysis of over 12 million NYC Yellow Cab rides that will hopefully get you in the right ballpark!

Before we get started, here's a picture of a taxi I saw on my trip to Cuba last fall - I've never been to NYC, but I imagine taxi rides are a little different there! :)

I recently came across an interesting graph posted by Reddit user 'badgraphs' that analyzed ~1,000,000 NYC Yellow Cab rides from January 2015. His goal was to estimate the "effective rate for an average yellow cab trip in NYC" ($/mile). Below is a copy of his graph:

I found his graph interesting (mainly because I had no idea that this detailed data from all the NYC Yellow Cabs was available!), and the combination of his graph and his write-up answered many questions about the data. But I wondered if I could create a better graph that was a little more self-explanatory and didn't need an accompanying article to help users know what was going on.

I located the data on the nyc.gov website, downloaded the csv file, and imported it into SAS. There were actually over 12 million rides in the data for January 2015 (whereas the graph above only plots ~1 million rides), and of course I included all 12 million in my graph, since SAS can handle that. I decided to let the data speak for itself rather than using regression lines and such, and I found it useful to color the data by the RateCodeID. The coloring helps explain several of the visual features in the graph.

Showing the cab fare vs. distance was interesting, but I had a more direct question ... how much do people generally pay for a cab ride? So I rounded all cab fares to the nearest dollar and created a histogram. Looks like the typical ride in a NYC Yellow Cab is around $9 (good to know, eh!?!)
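The histogram in the post was built in SAS. Here is a minimal Python sketch of the rounding-and-counting step behind it. It assumes the fares are already available as a list of floats, which is a hypothetical input; the real data is the nyc.gov CSV file. The sketch rounds halves up, so 8.50 becomes 9. Python's built-in `round()` would instead round halves to the nearest even number.

```python
import math
from collections import Counter

def round_to_dollar(fare):
    """Round a fare to the nearest dollar, with halves rounding up (8.50 -> 9)."""
    return math.floor(fare + 0.5)

def fare_histogram(fares):
    """Count rides per whole-dollar fare."""
    return Counter(round_to_dollar(f) for f in fares)

def most_common_fare(fares):
    """Return the whole-dollar fare that occurs most often (the histogram's peak)."""
    return fare_histogram(fares).most_common(1)[0][0]
```

With 12 million rides, the counts per dollar bin are what the histogram bars show. The peak bin is the "typical" fare.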

Getting down and dirty with lake water quality data

Just like my hero Mike Rowe on the Dirty Jobs TV show, I'm finally diving into an area involving water quality ... and poop! Let's take a graphical look at just how clean (or dirty) the water is, at the lake where my Raleigh Dragon Boat Club practices...

Before we get into the data analysis, here's a picture of our dragon boat team. The boat is about 43 feet long and there are 20 paddlers, a drummer, and a steersperson. Several boats line up at the starting line, and it's a straight-line race to the finish line (generally 300 to 500 meters). The paddlers usually get a little wet from water splashing off of the paddles, and there's always the chance of a boat getting swamped if the water is choppy, so you kinda want that water to be clean. See how I sneaked my way back into the data analysis topic!?! ;)

Our team's boat is housed at Lake Wheeler - a small lake on the south side of Raleigh, with not too much motorboat traffic. Several years ago, the lake got a somewhat bad reputation for being dirty because it was frequently closed due to high levels of bacteria found in poop. Although it had this reputation, I had never personally witnessed it being closed while attending dragon boat practice there, once or twice a week for the past ~3 years. I wondered if maybe the lake's water quality had improved ... and of course a good way to find out would be through some data visualization!

After a bit of web searching, I found a page on the Wake County website that had data tables for 2009-2015. The tables were color-coded to help identify the bad days, but I felt like I could get a much better grasp of the information using a graph. So I copy-n-pasted the data from the PDF documents into text files that SAS could read, and imported the values into a SAS dataset. I plotted the enterococci level for the most recent year, and the levels didn't look too bad ... the readings were above the EPA limit on only one day (July 14).
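Here is a minimal Python sketch of the "above the limit" check. The post doesn't show its text files, so it assumes each pasted line looks like `MM/DD/YYYY level`, which is a hypothetical layout. It takes the EPA limit as a parameter rather than hard-coding a value.

```python
from datetime import datetime

def parse_line(line):
    """Parse a hypothetical 'MM/DD/YYYY level' line into a (date, level) pair."""
    day_text, level_text = line.split()
    return datetime.strptime(day_text, "%m/%d/%Y").date(), float(level_text)

def readings_for_year(readings, year):
    """Keep only the (date, level) readings from the given year."""
    return [(day, level) for day, level in readings if day.year == year]

def days_above_limit(readings, limit):
    """Return the dates whose enterococci level exceeds `limit`.

    `limit` is supplied by the caller (the EPA threshold you are checking);
    no specific value is assumed here.
    """
    return [day for day, level in readings if level > limit]

# Usage sketch (file name and EPA_LIMIT are hypothetical placeholders):
# with open("lake_wheeler.txt") as f:
#     readings = [parse_line(line) for line in f if line.strip()]
# print(days_above_limit(readings_for_year(readings, 2015), limit=EPA_LIMIT))
```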

Broadcast your SAS credentials with digital badging

If you are a SAS credential holder, then like most candidates, you have invested a lot of time and effort in earning your SAS certification. Wouldn’t it be great if you could broadcast and manage verifiable proof of your achievement where and when you want? That’s where digital badging comes in.

The SAS Global Certification program has partnered with Acclaim, a part of Pearson, to provide SAS users who pass SAS certification exams with a digital version of their SAS credentials. This digital badge can be used in email signatures, digital resumes, and on social media sites such as LinkedIn, Facebook, and Twitter. This new functionality is available at no cost to all SAS users who earn certifications.

The Acclaim digital badging platform provides:

  • A web-enabled version of your credential that can be shared online
  • Labor market insights that relate your skills to jobs
  • A trusted method for real-time credential verification
  • Complete user control of if/when/where your digital badge is displayed
