Where are your Girl Scout cookies made?

I just found out that Girl Scout cookies haven't changed from when I was a kid. I just moved to a different area, serviced by a different cookie maker!

That's what I found out from the cool map in an article on latimes.com! The article explains that there are two different makers of Girl Scout cookies (ABC Bakers, and Little Brownie Bakers), and that each of them makes slightly different versions of the cookies. Check out their article to see side-by-side pictures and descriptions. With the information from that article, can you tell which area my friend Gene-Paul lives in, just by looking at his stash of Girl Scout cookies?

cookies_gene_paul

Being a map-guy, I was especially interested in their map. It showed state outlines, but the shading/coloring was based on the geography of the Girl Scout councils. Their geographical groupings might make sense for people who are associated with the Girl Scouts, but probably not-so-much for the rest of us. So I decided to re-create their map with SAS, using the more universally familiar state and county outlines, and using the colors of Girl Scout and Brownie uniforms. (It was a little difficult because there are a few areas where a county is split between councils, but I did the best I could, and it came out pretty close.) Click my map below to see the interactive version, with html hover-text to see the county names.

girl_scout_cookie_map

Technical details:

I used SAS/Graph Proc GMap to plot the data on maps.uscounty, and used annotate to draw the state outlines in white around each state. ODS Html created the html overlay, which contains the data-driven hover-text with the county names. It took about 4 hours to create the cookie data, and then 30 minutes to create the map. Here's the SAS code if you'd like to see exactly how it was done.

And now I'd like to finish this blog post with a picture of my friend Jennifer's daughter in her cookie-eating outfit -- Aww!!!  :-)

cookies_jennifer

What's your favorite Girl Scout cookie? And which cookie maker do you prefer?


6 questions with social network analysis expert, Carlos Andre Reis Pinheiro

pinheiro_car

Carlos Andre Reis Pinheiro

Beyond traditional clustering and predictive models lies social network analysis. It can help describe customers’ behaviors in new ways, but what exactly is it and how can businesses use it?

To find out more, I interviewed Carlos Andre Reis Pinheiro. He’s been working in social network analysis around the world for many years, looking at distinct types of problems in a variety of industries. He recently developed a course, Social Network Analysis for Business Applications, to share his knowledge and help businesses improve performance.

What is social network analysis?

Social network analysis is a set of tools and algorithms based on graph theory to reveal relationships between entities. Everything is linked somehow, and the network analysis approach helps us understand the correlations behind different problems and scenarios. The term "social" is used when the network analysis is performed on datasets comprising people. How do subscribers communicate with each other? How are social media users connected to each other? How are authors and co-authors related to each other? However, we can also apply network analysis to understand how bank accounts are linked, how countries and governments are connected, how claims are associated, or how taxpayers are related over time.

What types of businesses can benefit from social network analysis?

Theoretically, any type of business and industry. We can apply social network analysis to reduce churn or to increase sales in telecommunications. The viral effects when customers decide to leave or to buy are identified and used to diminish the former and boost the latter. We can also deploy network analysis to detect abuse in insurance and utilities, reveal fraud in banking, or uncover suspicious activity among taxpayers. We can even use this approach in sociology, biology, and medicine.

How do you collect information to perform social network analysis?

Very often, transactional data is the main source for social network analysis. In telecommunications, calls, text messages and multimedia messages are an important source for building networks. For optimization purposes, however, switches, usage and human motion may be used to create a graph of the physical network. In banking, transactions between bank accounts (or between banks and countries) can be used to build the network and then to compute the metrics that explain the relationships within it. In insurance, all claims are considered in connecting the business events to all types of entities (suppliers, policy holders, drivers, witnesses, repairs, etc.). In utilities, consumption transactions can be used to build the network that connects supply and demand. Emails connecting people, likes, tweets, messages: any kind of relationship between two distinct entities may be used to build the network and analyze it afterwards. Basically, there is no limit to the business applications of social network analysis.
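As a rough illustration of how transactional records turn into a network, here is a minimal Python sketch. It is not taken from the interview or the course: the call records, the node IDs, and the function names `build_network` and `degree` are all made up for this example, and degree (number of distinct contacts) is just one simple metric among many.

```python
from collections import defaultdict

# Hypothetical call records as (caller, callee) pairs; the IDs are invented.
calls = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("C", "D"), ("A", "B"),  # the repeated pair is a second call between A and B
]

def build_network(records):
    """Build an undirected, weighted adjacency map:
    node -> {neighbor: number of transactions between the two}."""
    graph = defaultdict(lambda: defaultdict(int))
    for source, target in records:
        graph[source][target] += 1
        graph[target][source] += 1
    return graph

def degree(graph):
    """Count the distinct neighbors of each node (a basic centrality metric)."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

network = build_network(calls)
degrees = degree(network)
print(sorted(degrees.items()))
```

The same structure would apply to bank transfers or emails: only the meaning of the two columns in each record changes.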

What kind of insights can you gather by performing social network analysis?

When thinking about complex problems, I believe there is always some network behind them. All kinds of problems can in some way be explained through the concepts of network science. Most business problems are handled by looking at transactions and individual attributes. Social network analysis can distinguish between relationships.

In telecommunications, for instance, most analytical models take into account subscribers' attributes to better understand the market and to predict business events such as churn, bad debt or purchase. Network analysis can understand the subscribers in the most important way, based on their relationships. Telecommunications companies provide a way for people to get connected. Social network analysis is therefore the proper approach to understand how they get connected over time and then use this knowledge to improve business performance. The same happens in banking, insurance, utilities and retail. In insurance, for example, a person may appear in one claim as a policy holder and in another claim as a witness. Or a particular physician appears in many suspicious claims. The same repair takes place in many exaggerated claims. Everything is linked. We have to understand how these connections affect our business.
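The insurance example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the claim records, the names, and the function `entities_across_claims` are hypothetical, and appearing in two or more claims is an arbitrary threshold chosen for the example, not a fraud rule from the interview.

```python
from collections import defaultdict

# Hypothetical claim participants as (claim_id, entity, role); all values are invented.
participants = [
    ("claim1", "Pat", "policy holder"),
    ("claim2", "Pat", "witness"),
    ("claim1", "Dr. Lee", "physician"),
    ("claim2", "Dr. Lee", "physician"),
    ("claim3", "Dr. Lee", "physician"),
    ("claim3", "Sam", "policy holder"),
]

def entities_across_claims(rows, min_claims=2):
    """Return entities that appear in at least `min_claims` claims,
    with the claims they appear in and the roles they played."""
    claims_by_entity = defaultdict(set)
    roles_by_entity = defaultdict(set)
    for claim_id, entity, role in rows:
        claims_by_entity[entity].add(claim_id)
        roles_by_entity[entity].add(role)
    return {
        entity: (sorted(claims), sorted(roles_by_entity[entity]))
        for entity, claims in claims_by_entity.items()
        if len(claims) >= min_claims
    }

linked = entities_across_claims(participants)
for entity, (claims, roles) in linked.items():
    print(entity, claims, roles)
```

Here "Pat" surfaces because the same person plays two different roles in two claims, and "Dr. Lee" because one physician is attached to three claims; "Sam" appears only once and is left out.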

You created a new Business Knowledge Series course, Social Network Analysis for Business Applications. Why did you create the course and who can benefit from taking it?

I truly believe that everything is linked somehow. Understanding all these connections and visualizing the network behind complex problems is a good way to better understand business scenarios and then provide robust solutions. I have been working in social network analysis for many years, in many countries, looking at distinct types of problems, and in different industries. A course like this is a great opportunity to share the knowledge. Not just the instructor’s knowledge, but mostly the students’ knowledge. Each person comes into the course with a particular experience, in different businesses, industries, scenarios and purposes. This exchange during the course is unbelievable. We can learn a lot from each other, like a network. We can realize that problems in different industries are sometimes quite similar, and that solutions may simply be adjusted for distinct scenarios. We evolve as a network by exchanging our knowledge during and after the course.

What’s your take on the future of social network analysis?

Social network analysis is part of a discipline that is growing solidly and fast. Network science comprises methods from graph theory, mathematics, statistics, physics, data mining and information visualization. It can be used on many business problems, no matter the industry. Based on network science we can better understand social phenomena, political and economic partnerships, diplomacy, business problems, human mobility, international trade markets, biological systems, disease spread and pandemics, the internet, and communication and collaborative networks, among many others. The use of network analysis to solve business problems is straightforward and may reveal more than we expect. The intrinsic knowledge behind social relations might disclose the proper information to better understand a very specific problem and then solve it.


Making a 'fun' zip code map more useful

When I saw Robert Kosara's cool ZIPScribble map, I knew I had to create a SAS version - and of course I had to add a few enhancements along the way....

I was perusing some of the examples on dadaviz.com, and Kosara's ZIPScribble map caught my attention. It wasn't a particularly useful map, but it drew me in. Here's a screen-capture of the map:

zip_scribble

I knew that I could easily create a similar map with SAS, using the zip code centroids from sashelp.zipcode, and plotting the latitude/longitude centroid values with Proc GPlot. But, even for a silly/fun example, I like to see if I can add a bit of analytic flair.

Therefore, rather than plotting the raw latitude/longitude centroids, I used Proc GProject to convert the coordinates into a more aesthetic map shape that we're accustomed to seeing (rather than all-flat along the northern border). And rather than making all the lines black, I let SAS randomly assign a color (based on the ODS style) by state, so you can more easily see the state groupings. And if you click the map below to look at the interactive version, you can see the html hover-text I added for each 'clump' of lines based on 3-digit zip codes (well, there's a bit of overlap, so this isn't perfect ... but it seems useful).

zip_connect
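For readers who don't use SAS, the core idea (connect the ZIP code centroids in ascending ZIP order, grouped by state, and tag each piece with its 3-digit prefix) can be sketched in Python. This is not the SAS code behind the map: the sample rows and their coordinates are rough illustrative values, the function name `scribble_segments` is made up, breaking the line at each state change is my assumption about how the per-state coloring works, and the map projection step is left out entirely.

```python
# Hypothetical sample rows: (zip code, state, latitude, longitude).
# The coordinates are rough illustrative values, not sashelp.zipcode data.
zip_rows = [
    ("27513", "NC", 35.80, -78.80),
    ("27511", "NC", 35.76, -78.78),
    ("27601", "NC", 35.77, -78.63),
    ("29201", "SC", 34.00, -81.03),
    ("29203", "SC", 34.06, -81.03),
]

def scribble_segments(rows):
    """Connect consecutive ZIP centroids (ascending ZIP order) within a state.

    Each segment records the state (for coloring) and the 3-digit prefix
    of the ZIP it ends at (for hover-text grouping)."""
    ordered = sorted(rows, key=lambda row: row[0])
    segments = []
    for prev, curr in zip(ordered, ordered[1:]):
        if prev[1] != curr[1]:
            continue  # assumption: no line is drawn across a state change
        segments.append({
            "state": curr[1],
            "prefix": curr[0][:3],
            "start": (prev[2], prev[3]),
            "end": (curr[2], curr[3]),
        })
    return segments

segments = scribble_segments(zip_rows)
for segment in segments:
    print(segment)
```

Because a segment belongs to the prefix of the ZIP it ends at, a line that crosses from one 3-digit group into the next is assigned to only one of them, which is one source of the imperfect overlap mentioned above.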


Is North Carolina the new Hollywood!?!

You might be surprised at how many movies and TV shows are made in North Carolina - especially within the last few years. This blog post provides a SAS graph that will make the list of films even easier to read!

A recent story by the Tar Heel Traveler, and an exhibit at the NC Museum of History, piqued my interest about movies made in North Carolina. Some of the famous movies filmed in NC include: Iron Man 3 (part of which was filmed at the SAS headquarters!), The Hunger Games, Sleepy Hollow, Forrest Gump, Bull Durham, Dirty Dancing, and Firestarter (just to name a few).

Here's a photo my friend Jennifer took of Iron Man 3 being filmed:

film_ironman

With my curiosity in full gear, I set about finding a list of the NC movies, so I could graph the data. Luckily, I found the ncfilm.com website, which provided a good basis for my data. I saved their html code, deleted the unwanted lines, did a bit of hand-editing, and then wrote some SAS code to parse the name, year, and IMDb links out of their list. I used SAS/Graph Proc Gchart to create the bar chart, and got a little tricky with my SAS code so I could color certain movies red so they'd stand out. Click my chart below to see the interactive version, where each box has hover-text with the movie name, and drilldown to the IMDb page.

ncfilm
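The parsing step could look something like the following Python sketch. To be clear, this is not the SAS code used for the chart, and the html snippet is hypothetical: I don't know the actual markup of the ncfilm.com list, so the `<li><a href=...>Name</a> (year)</li>` layout, the placeholder links, and the names `PATTERN` and `parse_films` are all assumptions made for illustration.

```python
import re

# Hypothetical markup; the real page layout may differ. The links are placeholders.
html = """
<li><a href="https://www.imdb.com/title/tt_placeholder_1/">Iron Man 3</a> (2013)</li>
<li><a href="https://www.imdb.com/title/tt_placeholder_2/">Bull Durham</a> (1988)</li>
<li>Not a movie line</li>
"""

# One match per list entry: the link, the movie name, and a 4-digit year in parentheses.
PATTERN = re.compile(
    r'<a href="(?P<link>[^"]+)">(?P<name>[^<]+)</a>\s*\((?P<year>\d{4})\)'
)

def parse_films(text):
    """Extract name, year, and link from each entry that fits the assumed layout."""
    return [
        {"name": m["name"].strip(), "year": int(m["year"]), "link": m["link"]}
        for m in PATTERN.finditer(text)
    ]

films = parse_films(html)
for film in films:
    print(film["year"], film["name"], film["link"])
```

Lines that don't fit the pattern are silently skipped, which is why a bit of hand-editing and a quick count of the results are still worthwhile.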


The Photo Album Section:

If you live in NC, it's possible you will be in a movie, and probable you will know somebody (or know somebody that knows somebody) that's been in a movie!  Below are some examples from my friends. Some of these are a bit gory, but rest assured that no friends were harmed in the making of these pictures! :-)

Tracking the increase in marijuana's THC content

Since Colorado legalized recreational marijuana use in 2012, marijuana has been a much more frequent news topic than before - even from a data analysis perspective...

I was recently looking for 'interesting' data to analyze with SAS, and I noticed some articles about the increasing potency of marijuana in recent years. I followed the data 'upstream' and found an interesting report from the Drug Enforcement Administration (DEA). And on p. 27 they showed the following graph:

marijuana_potency_original

Their graph tells an interesting story about how the amount of THC in marijuana has more than doubled in recent years. But the graph is somewhat painful to look at, and difficult to read. Here are a few of the problems that jump out at me:

  • It's difficult to know exactly what point along the line the pointlabels refer to.
  • There are 27 %-signs in the graph, which seems an excessive use of ink & space.
  • The y-axis needlessly shows 2 decimal places.
  • The x-axis has staggered year labels.
  • The year labels are staggered in the opposite up/down pattern from the line's pointlabels.
  • The graph doesn't mention marijuana (you have to read the article to intuit that).

Well, of course it might be considered rude to point out flaws in a graph, without going to the effort to produce an improved version ... So here's my SAS version! I think it's a lot cleaner, and easier to read.

marijuana_potency

Which of my changes do you like, and which do you not like? What other changes would you recommend?


Visualizing the eradication of smallpox

Smallpox was declared eradicated in 1979, after an extensive vaccination campaign in the 19th and 20th centuries. This blog post contains a visual analysis of the final years of this disease in the US ...

In my previous blog post, I imitated and improved infectious disease graphs from a recent Wall Street Journal article. I focused mainly on measles in that post - I now focus on smallpox. Here's my calendar chart of the smallpox data from the Tycho website.

smallpox_incidence

6 questions with forecasting expert Charlie Chase

Charlie Chase


Charlie Chase is considered an expert in sales forecasting, market response modeling, econometrics and supply chain management. Now he's sharing some of his expertise in his Business Knowledge Series (BKS) course, Best Practices in Demand-Driven Forecasting. I had the chance to ask him some questions about his course and the state of the forecasting industry.

  1. What do you think has been the biggest advancement in forecasting over the last 10 years?

[CC]: Data collection, storage and processing capabilities, along with large-scale automatic forecasting technology providing the capability to automatically forecast up/down a business hierarchy for hundreds of thousands of products.

  2. If you could forecast (so to speak) how you think forecasting will evolve over the next 10 years, what do you predict will change?

[CC]: Predictive analytics will take center stage supporting demand sensing and shaping utilizing both structured and unstructured data. A new position entitled “demand analyst” will supplement demand planners with analytics and will become standard practice across all industries. Companies will create analytics centers of excellence supporting not only demand forecasting and planning, but all facets of the company’s analytical needs.  Multi-Tiered Causal Analysis (MTCA) will be common practice for those companies who have access to POS/Syndicated Scanner data to improve forecast accuracy.

  3. What’s the biggest mistake forecasters make and how can they fix it and learn from it?

[CC]: Business knowledge alone is not enough to become a good forecaster. Forecasting requires two key things, 1) analytics, and 2) domain knowledge, not “gut feeling” judgment.  Forecasters need to supplement their business knowledge with analytics by taking classes at local universities, attending business forecasting workshops (SAS BKS Workshops), attending business forecasting conferences, and getting certified as a “Certified Professional Forecaster” through the Institute of Business Forecasting.

  4. What’s the best advice you can give forecasters?

[CC]: Continue to develop your skills, knowledge, and domain experience. This also includes developing your communication skills and span of knowledge across the supply chain, which includes the commercial side (sales and marketing) of the business.  The future of demand management will be the ability to support sales and marketing with analytics to supplement and enhance the demand-driven forecasting and planning process.

  5. You created a new Business Knowledge Series course, Best Practices in Demand-Driven Forecasting. Why did you create the course and who can benefit from taking it?

[CC]: I created this BKS course to share my knowledge of demand-driven forecasting best practices based on my past experiences, and provide practitioners with a framework to implement a demand-driven forecasting process. Most forecasting courses only focus on algorithms and proofs with little attention to applying analytics and domain knowledge.  This BKS course focuses on applying, interpreting, and implementing statistical methods using domain knowledge.

It's designed for demand forecasting analysts/planners, demand forecasting and planning directors/managers, marketing analysts/planners/managers/directors, and supply chain analysts/planners/managers/directors, as well as financial planners/managers.

  6. Can you share any tips from the course?

[CC]: The course focuses on the demand-driven process, analytics, and enabling technology with emphasis on applying different statistical methods, interpreting the results, and applying the appropriate methods that will give the best results. There will be no programming with code.

Learn more and sign up for the course - Best Practices in Demand-Driven Forecasting


How to make infectious diseases look better

The Wall Street Journal recently published some graphs about seven infectious diseases, and I tried using SAS to improve the graphs ... it's a veritable infectious disease (graph) bake-off!

Let's start with Measles ... here's a screen-capture of WSJ's measles graph:

measles_wsj

Their graph is eye-catching, and I learned a lot about the data by looking at it. But upon studying the graph a little deeper, I noticed several problems:

  • It is difficult to distinguish zero values (very light blue) from missing-data (light gray).
  • There was not enough room for all the state values along the left edge, so they just left out about 1/2 of the state labels.
  • When you hover your mouse over the colored blocks to see the hover-text, the box turns light blue - which could mislead you to think that is the data-color of the box.
  • Although it is explained in the introductory paragraph, the graphs themselves don't mention that the unit of measurement is cases per 100k people per year.
  • They don't keep their graphical unit polygons square, but rather stretch them out to fill the entire page - this makes it difficult to know how many years the graph spans.
  • It is sometimes difficult to quickly determine which color represents more/fewer disease cases, using the semi-rainbow color scale (especially in the yellow/green/blue end of the scale).

Therefore I set about creating my own SAS version of the graphic, to see if I could do a better job. I located the data on the Tycho website, downloaded the csv files, and imported them into SAS datasets. I then created a rectangular polygon for each block in the graphic, so I could plot them using Proc Gmap. I assigned custom color bins, so I could control exactly which ranges of values were mapped to which gradient shade of my color (I used shades of a single color, rather than multiple/rainbow colors), and used a hash-pattern for the missing values. Here's my measles graph:

measles_incidence
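To make the two building blocks described above more concrete (one square polygon per block, and custom bins that map value ranges to shades of a single color), here is a small Python sketch. It is not the SAS code behind the graph: the bin boundaries, the hex colors, the `"hatched"` marker for missing data, and the function names are all hypothetical choices for illustration.

```python
from bisect import bisect_left

# Hypothetical bin upper bounds (cases per 100k); values above the last bound
# fall into a final "highest" bin. These are not the bins used in the graph.
BIN_UPPER_BOUNDS = [0.0, 1.0, 10.0, 100.0]

# One shade per bin, light to dark, all from a single hue (hypothetical hex codes).
SHADES = ["#ffffff", "#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]

# Marker for missing data, so it can be drawn as a hash pattern rather than a color.
MISSING_PATTERN = "hatched"

def shade_for(value):
    """Map a value to its bin's shade; None means missing, which is not zero."""
    if value is None:
        return MISSING_PATTERN
    # bisect_left puts a value equal to a bound into the bin that ends at that bound.
    return SHADES[bisect_left(BIN_UPPER_BOUNDS, value)]

def block_polygon(column, row, size=1.0):
    """Return the four corners of a square block at the given grid position
    (for example, column = year index, row = state index)."""
    x0, y0 = column * size, row * size
    return [(x0, y0), (x0 + size, y0), (x0 + size, y0 + size), (x0, y0 + size)]

print(shade_for(None), shade_for(0), shade_for(0.5), shade_for(250))
print(block_polygon(2, 3))
```

Giving zero its own bin and keeping missing data out of the color scale altogether is what separates "no cases" from "no report", and using the same `size` for width and height keeps every block square.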

In addition to the visual aspects of the graph, I also made a slight change to the way the data is summarized. The Tycho data was provided as weekly number of cases per 100k people, and (it appears) WSJ summed those weekly numbers to get the annual number they plotted. But the data contains a lot of 'missing' values, and the Tycho faq page specifically mentions that 'missing' values are different from a value of zero ...

"The '-' value indicates that there is no data for that particular week, disease, and location. The '0' value indicates a report of zero cases or deaths for that particular week, disease, and location."

If you have 11 weeks with 'missing' data (such as Alabama in 1932), and you simply sum the other 41 weeks that do have data, and call that the annual rate ... I'm thinking that probably under-reports the true annual value somewhat(?) Therefore, in hopes of getting a more valid value to plot, I calculate the weekly average (rather than the yearly sum).
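The arithmetic behind that choice can be shown with a short Python sketch. The numbers are made up (a constant rate of 2.0 cases per 100k in every reported week), and the function names are mine; the point is only to show how the sum and the average react to missing weeks.

```python
def annual_sum(weekly):
    """Sum the reported weeks, ignoring missing (None) weeks."""
    reported = [value for value in weekly if value is not None]
    return sum(reported) if reported else None

def weekly_average(weekly):
    """Average over the reported weeks only, ignoring missing (None) weeks."""
    reported = [value for value in weekly if value is not None]
    return sum(reported) / len(reported) if reported else None

# Hypothetical year: 41 reported weeks at 2.0 cases per 100k, and 11 missing weeks.
partial_year = [2.0] * 41 + [None] * 11
# The same rate with all 52 weeks reported.
full_year = [2.0] * 52

print(annual_sum(partial_year), annual_sum(full_year))
print(weekly_average(partial_year), weekly_average(full_year))
```

With the same underlying rate, the partial year sums to 82.0 while the full year sums to 104.0, but the weekly average is 2.0 in both cases. The average is only a fair estimate if the missing weeks resemble the reported ones, which is an assumption.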

Here are links to my plots for all 7 diseases:  Measles, Hepatitis A, Mumps, Pertussis, Polio, Rubella, and Smallpox.

So, now for the big question ... have you ever had any of these diseases?


Have a traditional SAS/Graph Valentine's Day!

Nobody puts an arrow through a heart any better than Sam Cooke & Cupid ... but SAS/Graph comes close!

If you've been following my blog, you know that my favorite of all the SAS Procedures are the traditional SAS/Graph Procs, such as GPlot and GMap. They're rock-solid reliable, and flexible enough that you can create just about any graphic visualization that you can imagine.

Therefore I've created a special Valentine's example, of a traditional heart, using traditional SAS/Graph procs - hopefully SAS/Graph has not only put an arrow through these hearts, but one through yours as well!

valentines_graph

 

valentines_graph1

 

And for a special Valentine's treat, click either heart above to go to the interactive version, and then click the red heart to see other SAS Valentine's blogs.  Also, here's the SAS code if you'd like to see how this example was created.


Everything’s bigger in Texas, so … SAS Education offers HUGE training savings!

SAS Global Forum brings together the most die-hard SAS users, both veteran and novice, once a year. It’s one of those can’t-miss events, and each year it just gets better.

2015 will bring us all together in Dallas, Texas for several days of active learning and excitement from SAS users and subject matter experts. But the learning doesn’t end on April 29 (or start the 26th, for that matter).

SAS Education is once again bringing training to the table. For the conference event, we’re offering training in our Dallas training center, an easy trip from downtown, at a 40% discount for SAS Global Forum attendees. Choose from SAS Programming 3 before the conference and SAS Macro Language 2 after.

Post-conference training will be held at the Kay Bailey Hutchison Convention Center in the heart of downtown Dallas. Texas does big – but these savings are HUGE! Check out the courses being offered at $399 a day … and then register. These seats will fill fast.

April 30 – May 1

  • Developing Custom Tasks for SAS® Enterprise Guide®
  • DS2 Programming: Essentials
  • Introduction to SAS® and Hadoop
  • SAS® Visual Statistics: Interactive Model Building

April 30

  • SAS® Visual Analytics: Getting Started
  • Working with Process Jobs

Ready to get certified? This is a great time to really bring it home – we’re offering a Certification event on both Saturday and Sunday prior to the SAS Global Forum festivities. All exams are being offered at 1 p.m. Saturday, and most are also offered at 9 a.m. Sunday morning.

So go BIG before you go home, and take some large SAS knowledge back with you!
