How trendy is your hipster beard?

The #1 rule of any self-respecting hipster is to not claim to be a hipster. Therefore, can there even be such a thing as a hipster beard, or hipster beard data? I contemplated this perplexing question, as I stroked my pirate beard. Since fashion trends tend to be cyclical, perhaps we can just look at beard trends in the past...

Luckily, I was able to find a wonderful study of facial hair trends from 1842-1972! Dwight Robinson went through all the old issues of The Illustrated London News, and counted how many times men appeared in the photos with various types of facial hair (beards, mustaches, sideburns, sideburns & mustaches, and clean shaven). He also created some great plots of the trends over time for each category of facial hair. But I wondered if there was a good way to visualize all the data in a single plot, and did a few Web searches to see if anyone had already worked on this.

Here is a screen capture of an example shown on the Plotly blog. The way the colored areas are stacked and arranged, it shows how the various facial hair trends phased in and out over time. The graph is a bit small, but you can click the edit chart link and see a full size version (although it takes a while to come up).


I like the layout of their graph, but that doesn't outweigh the other problems (mainly with the hover-text)... The time-axis hover-text is a bit deceptive -- it shows year and month values, but the source data makes no mention of month (therefore I think the month values they show cannot be correct). Also, when I scrutinize the hover-text values for the facial hair categories, they seem to be mislabeled (or outright wrong?). For example, look at the 'Clean Shaven' value of 54.98783% for 1898 -- it should actually be 16.5%. Their hover-text may be showing the sum of all the values stacked under it, and perhaps also mislabeling the facial hair categories.
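To make the suspected hover-text problem concrete: in a stacked area chart, the top edge of each band is the running total of that band plus every band beneath it, so a tool that reports the edge instead of the band's own value will overstate it. Here is a minimal Python sketch. Only the 16.5% 'Clean Shaven' figure comes from the data discussed above; the other percentages and the stacking order are invented for illustration.

```python
from itertools import accumulate

# Hypothetical shares for one year, in stacking order (bottom to top).
# Only the 16.5 for 'Clean Shaven' comes from the post; the rest are invented.
shares = [("Beards", 30.0), ("Mustaches", 22.0), ("Clean Shaven", 16.5)]

labels = [name for name, _ in shares]
values = [value for _, value in shares]

# Running totals: the height at which each band's top edge is drawn.
stack_tops = list(accumulate(values))

for name, value, top in zip(labels, values, stack_tops):
    print(f"{name}: own value {value}%, top of stacked band {top}%")
```

If the hover-text reported the last column instead of the middle one, the band on top would appear much larger than it really is, which is the kind of discrepancy described above.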

How my SAS Press book was born

This guest blog post comes from Dr. David Dickey, one of our original SAS Press authors. Hope you enjoy!

In the late 1970s, shortly after SAS was founded, I was approached by Herbert Kirk and John  Brocklebank from SAS to put together a course on time series.  This was reasonably successful and shortly thereafter I was asked by Dr. Kirk to coauthor with Dr. Brocklebank a book along the lines of the course, but with some textbook-like explanation.  We did so and thus began our involvement with SAS Publishing. This predated the Books by Users program and we joined just a handful of SAS books like Littell and Freund's SAS® for Linear Models book, which I'd admired as a very readable and informative source.

We got a lot of help from SAS in the writing of the book.  Deborah Blank, in particular, did a great editing job for us. Some of my colleagues and I went on to use the book as either a main text or an additional reference for the NCSU Applied Time Series course, which has been quite popular over the years.

Noah's Ark found ... in South Carolina!

I remember being intrigued by the movie In Search of Noah's Ark, when I was a kid back in the 1970s. They claimed to have definitively found the ark ... but of course, since then several other people have claimed to have found it in other locations. Therefore I don't feel too bad about my unsubstantiated claim that it was recently sighted in South Carolina - they certainly had rains of 'biblical proportions' this past week, and could have definitely put an ark to use! This blog focuses on visualizing those rains...

First, let me try to describe what happened in South Carolina. Surprisingly, this was not Hurricane Joaquin (Joaquin didn't actually make landfall, but went well out into the Atlantic). It was a combination of weather systems that came together 'just right' to funnel heavy rains over South Carolina ... for several days. It was described as a '1000-year rainfall event' (see the PDS-based depth-duration-frequency curve).

In simpler terms, almost all of South Carolina received over 10 inches of rain in just a few days. Here's a SAS map I created, using data from the National Weather Service Advanced Hydrologic Prediction Services website:



And as if 10 inches wasn't enough, many areas of the state actually received over 20 inches of rain. Think about that for a minute ... that's not just a river rising 20 inches - that's all the land in a whole area of the state getting 20+ inches of water dumped down on it!

What Hurricane Joaquin can teach us about analytics

While holed up inside, like many others on the East Coast of the United States, suffering from record-breaking rainfall and watching the path of Hurricane Joaquin, I found a perfect metaphor for handling a problem in explaining analytics.

Many executives bemoan the fact that it seems to take forever for the analytical staff to deliver results.  They ask, “Why can’t I get my answer by Friday?” I respond, “If you want an answer by Friday, then you needed to have made an investment six months ago in what I refer to as foundation systems.”

What are foundation systems?

Foundation systems are everything from, of course, analytical tools and databases, the big budget items, to a great deal of small important details including: query code that correctly links tables so those links don’t have to be researched in a time crunch; code to do the inevitable preprocessing to correctly calculate meaningful indicators like the marketing concept RFM (recency, frequency, monetary); checks for missing and bad data, etc.
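As a rough illustration of what such reusable preprocessing code might look like, here is a minimal Python sketch that computes RFM per customer and sets aside records with missing or bad data. The transaction layout, the sample rows, the validity rules, and the name `compute_rfm` are all assumptions made for this example, not part of any particular system.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount)
transactions = [
    ("A", date(2015, 9, 1), 50.0),
    ("A", date(2015, 9, 20), 25.0),
    ("B", date(2015, 6, 15), 200.0),
    ("C", date(2015, 9, 28), 10.0),
    ("C", None, 40.0),               # missing date
    ("B", date(2015, 8, 1), -5.0),   # negative amount
]

def compute_rfm(rows, as_of):
    """Return ({customer: (recency_days, frequency, monetary)}, rejected_rows)."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    rejected = []
    for customer, when, amount in rows:
        # Assumed data checks: a date is present and not in the future,
        # and the amount is present and non-negative.
        if when is None or amount is None or amount < 0 or when > as_of:
            rejected.append((customer, when, amount))
            continue
        if customer not in last_purchase or when > last_purchase[customer]:
            last_purchase[customer] = when
        frequency[customer] += 1
        monetary[customer] += amount
    rfm = {
        c: ((as_of - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }
    return rfm, rejected

rfm, rejected = compute_rfm(transactions, as_of=date(2015, 10, 1))
for customer, (recency, freq, total) in sorted(rfm.items()):
    print(customer, recency, freq, total)
print("rejected rows:", len(rejected))
```

Once a routine like this has been written and checked, it can be rerun as-is on the next request, which is exactly the time saving a foundation system is meant to provide.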

The experienced analyst can name several other examples of potentially time-consuming efforts that must be handled before the answer can be given on Friday.  The analyst knows that if previous work has been accomplished, reusable code developed, and data cleansing accomplished, then the time to complete the analysis is shorter.  This, however, requires the investment of time -- that is, free time for the analyst to investigate the data, understand the data and develop useful concepts.

Needless to say, this response of “foundation systems” is a long answer, and while it may be understood, it is not exactly what I would call a “sticky” concept, one that is quickly understood and remembered.

Building a better income inequality graph

I hear a lot of talk about income inequality in the US ("the rich get richer..." and such) - especially as elections approach. I also see a lot of graphs, and they all seem to define their numbers slightly differently. I'm not in a position to improve the way income is defined (that's up to the people working with the raw data), but I think I can make some improvements in how the data is graphed!

For example, I recently saw this graph in a Forbes article.


At first I thought it was interesting and fairly well laid out ... but after studying it, there were several things I realized that I didn't like. The biggest problem was that each color did not represent a quintile. The lower 4 each represented quintiles, but the top quintile had the "Top 1%" split out separately. The article also points out that the top quintile's share of the income had recently risen to over 50% ... yet there wasn't an axis label at 50% to make this easy to see. It was also difficult to see whether the values were changing over time, with the gradual yearly changes. And the title of the graph didn't make it clear whether this data represented households or families.

So I found the Census page for historical income data, and created my own SAS graph. I chose the H-2 table "Share of Aggregate Income Received by Each Fifth of Households" (as opposed to the F-2 table, which shows similar data for Families). I imported the Excel spreadsheet into SAS, and after a little experimentation came up with the following graph:

Got Data? Teaching SAS Programming for the Real World

For students to become capable data analysts, they need experience that they can take with them into the real world after graduation.  By far the most critical skill for their toolkit is learning to work with real-life data. Therefore, it is important from a teaching standpoint that instructors provide students with programming assignments that will challenge them and allow them to explore all the nuances of realistic data.  A SAS programming course that combines a focus on manipulating data, a solid foundation using visual and analytic tools, and experience working with realistic data sets will give students the opportunity to learn from situations similar to what they will encounter in the workplace. Of these areas, the most time-consuming task for an instructor is identifying meaningful data sets to use for classroom examples, exam questions, and programming assignments.

The book, Exercises and Projects for The Little SAS® Book, Fifth Edition, uses more than 70 data sets that are any combination of: 1) current and interesting; 2) messy; and 3) extensive.  This new exercise book, which I coauthored with Lora Delwiche and Susan Slaughter, contains multiple-choice and short answer questions, along with programming exercises using the aforementioned data sets.  The chapters in this book are linked to the same chapters in The Little SAS® Book, Fifth Edition.  We made a special effort to include extra variables in many of the data sets for Chapter 8, “Visualizing Your Data” and Chapter 9, “Using Basic Statistical Procedures” so that instructors could append additional questions of their own depending on the content covered in their course.  The following are brief descriptions of a few data sets used in the book.

Visualizing the 500 biggest corporations in the US

Every year, Fortune magazine compiles a list of the 500 largest US corporations - called the Fortune 500. Their list was a bit difficult to digest in text-form, so I thought I'd try using some maps & graphs on the data ...

For a map analysis, I thought it would be interesting to see how many companies were located in each state. The only tricky part here is that I didn't like any of the built-in legend binning algorithms for this particular data, so I created my own bins, and then used a user-defined format to have the legend show the values that corresponded to my custom bins. You can click the snapshot below to see the full size interactive map, with html hover-text showing all the company names in each state (note that some browsers might have a shorter length limit on the number of names they can show in the hover-text ... I find Google Chrome works well).
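The custom-binning idea can be sketched outside of SAS as well. The Python snippet below is only an illustration of the technique (assign each count to a hand-picked bin, then show a readable label for that bin in the legend). The bin edges, labels, state names, and counts are all hypothetical, since the actual bins used for the map are not listed here.

```python
from bisect import bisect_right

# Hypothetical bins: each edge is the lower bound of the bin with the same index.
BIN_EDGES = [1, 5, 10, 25, 50]
BIN_LABELS = ["1-4", "5-9", "10-24", "25-49", "50+"]

def legend_label(count):
    """Map a company count to the legend label of its custom bin."""
    if count < BIN_EDGES[0]:
        return "none"
    return BIN_LABELS[bisect_right(BIN_EDGES, count) - 1]

# Made-up counts for placeholder states, just to exercise the bins.
companies_per_state = {"State A": 0, "State B": 4, "State C": 5, "State D": 37, "State E": 50}
legend = {state: legend_label(n) for state, n in companies_per_state.items()}

for state, label in legend.items():
    print(f"{state}: {label}")
```

Keeping the bin edges and the labels in two parallel lists plays a role similar to a user-defined format: the data keep their raw counts, and only the displayed legend text changes.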


The map was interesting, but kind of predictable - the states with larger populations probably have more corporations, and therefore more corporations in the Fortune 500 list. Therefore I put on my thinking cap to see if I could come up with a way to visualize the data so that I could see if there were any states with a high (or low) number of Fortune 500 companies, for their population.

Don’t ignore the next great analytic competitive advantage

This guest post was written by Andy Pulkstenis, Director of Advanced Analytics for State Farm Insurance. He leads a team of advanced analytics professionals providing statistical analysis and predictive modeling support for the enterprise across a variety of business units. His background includes more than a decade of experience improving business strategies with designed multivariate experiments. Pulkstenis will be a presenter at the Analytics 2015 conference in Las Vegas, Oct. 26-27. We hope to see you there.

In my 20-year applied analytics career, I’ve been fortunate to witness the evolving landscape of business analytics. One notable shift was when companies finally discovered the power of predictive modeling. Initially a tough sell in a world then dominated by tradition, experience, & classic MBA methodology, it’s now difficult to imagine any company being a serious contender if they don’t include predictive modeling in their analytic arsenal. Today when you examine most market leaders, statistical modeling is as firmly entrenched in the corporate culture as Microsoft Windows, SAS, khakis, and snarky Dilbert cartoons. Predictive analytics finally made it, but its cousin experimental design (i.e. statistical testing, or MVT, or DOE, or A/B testing, or test-and-learn, etc.) remains largely on the outside looking in.

Despite the potential to radically transform currently-held anecdotal beliefs about a business, unlock new or deeper insights into drivers of customer behavior, and truly optimize strategy delivery, the applied analytics community has been very slow to embrace statistical testing in the business world, even in the midst of a growing number of success stories. I can say with confidence that my conference presentations on business experimentation are consistently the best talks on testing at a given event – unfortunately because I’m typically the only speaker there talking about the topic!

25 Reasons to Write a Book with SAS Press

As we continue our celebration of 25 years of SAS Press, I thought I’d share 25 reasons why you should write a book with us and become a SAS Press author. It’s not all work; we also have fun through this enriching journey from idea to print! Here’s our top 25 list -- we could think of many more.

By publishing with SAS Press, you can:

  1. Inspire SAS users to learn and improve their careers with your book
  2. Get support from SAS from start to finish, from your proposal through print and promotions
  3. Gain access to free SAS software while you write
  4. Have unparalleled access to SAS users for promotion and sales opportunities
  5. Highlight techniques and new SAS features that you have found especially useful
  6. Take advantage of our professional book development services and software. You focus on the writing and leave the publishing to us.
  7. Receive thorough copyediting by SAS technical editors who know their way around a semi-colon and who have helped many SAS Press books become award-winners
  8. Enjoy our annual Author’s Dinner (it’s always a highlight of the year!)
  9. Have an even more impressive business card—your book!
  10. Unlock opportunities to write and present courses for SAS
  11. Take the next step toward authority, credibility, and expertise in your field
  12. Brag to everyone that “you’ve been published”
  13. Enjoy print and e-book delivery of your book
  14. Receive feedback from experts at SAS and from the SAS user community who will help you shape and polish your content
  15. Work with your dedicated project team—you will have your own developmental editor, production specialist, graphics designer, marketing specialist, and more
  16. Get free SAS Author buttons (as many as you can wear!)
  17. Have a personalized marketing plan from our marketing specialists. We have a direct connection with the largest audience for your book—SAS users!
  18. Establish yourself as a “SAS expert” and thought leader
  19. Get your very own social media strategy (to help your book go viral)
  20. Be a rock star at conferences and events through book signings and meet and greets. We provide the opportunities. You just need to be ready for the paparazzi and your groupies.
  21. Enjoy world-wide sales and distribution of your book. Our comprehensive distribution plan ensures that your readers can get your book when they want it and in the format that they prefer, whether it’s print or eBook.
  22. Book speaking opportunities to present at conferences around the world through the SAS Press Speakers Bureau.
  23. Be part of an exclusive SAS Author community. You can truly be one in a million!
  24. Become a SAS mentor. Your book can inspire and encourage SAS users everywhere.
  25. Receive gratitude and appreciation from SAS, your project team, and your readers. Publishing a book with you is a wonderful opportunity to get to know you, to learn more about your experience, and to help your book be the best it can be.

I'm a data guy, not a basketball fan!

I get several requests and recommendations for analyzing sports data. I'm not a big sports fan ... but when did I ever let that stop me! When I find interesting data, I like to graph it!

Before we get into the nitty-gritty data analysis, here is a picture of my friend Jennifer's daughter playing basketball. Perhaps a future NBA star?!? Aww!!!...


I recently found an interesting article that analyzed all the NBA (basketball) games going back for several decades. They calculated each team's Elo rating after each game, and then plotted the scores as a time series. Below is an example of their graph for the Warriors:
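For readers who haven't met Elo ratings before, here is a minimal Python sketch of the textbook Elo update, applied game by game to build a rating time series. This is the standard formula only, not necessarily the exact variant used in the article. The K-factor of 20, the starting rating of 1500, and the sequence of results are assumptions for illustration, and the opponent is treated as one fixed team to keep the example short.

```python
def expected_score(rating_a, rating_b):
    """Probability that team A beats team B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a, rating_b, score_a, k=20.0):
    """Return both new ratings; score_a is 1 for an A win, 0 for an A loss."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Hypothetical starting ratings and results (1 = win, 0 = loss) for one team.
team, opponent = 1500.0, 1500.0
history = [team]
for result in [1, 1, 0]:
    team, opponent = update(team, opponent, result)
    history.append(team)  # the rating after each game is one point in the series

print(history)
```

Plotting `history` against game dates gives the kind of time series described above: wins push the line up, losses pull it down, and upsets move it more than expected results.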
