Tracking Ebola: Using SAS bubble maps

In a previous blog post, I showed how to layer colored areas on a SAS map to show both the countries and the areas within those countries that had cases of Ebola. But as the Ebola epidemic has spread, more data has become available, and in this blog post I show how to represent that additional data by annotating bubble markers on the map.

While perusing the Internet for the latest Ebola information, I came across an interactive map that the World Health Organization maintains. It lets you pan and zoom, and as you zoom it shows progressively more detail. It's a pretty cool map. After zooming in, I did a screen capture to include below:

ebola_bubblemap_2014_original

But as I studied the map to try to determine the current status of Ebola, I noticed a few problems. The shades of color for the land were very similar to the colors of the bubbles, even though they represented slightly different things. Also, the colors were transparent, so wherever they overlapped (bubbles overlapping other bubbles, or bubbles overlapping land) the colors blended and produced darker shades -- which made it very difficult to tell which legend colors they matched.

Therefore I decided to create my own SAS map, using solid (non-transparent) colors so there is no color blending, and using very different/distinct colors for the land and the bubbles so it is easier to match them up to the legend.

For the main bubbles, I used the SAS %centroid() macro to determine the location of the center of each region in the map, and I then used the annotate pie function to draw the bubbles.

I used specific lat/long coordinates for the locations of the Ebola Treatment Centers, and overlaid several pieces of geometry (all using variations of the annotate pie function) to create a white bubble with a red cross.
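If you'd like to experiment with the general technique, here is a minimal sketch of the centroid-plus-bubbles approach (this is not the exact code behind the map below; the work.ebola_cases data set, its ID and CASES variables, the color, and the bubble-size scaling are all just illustrative assumptions):

%annomac;                                   /* compile the SAS/GRAPH annotate macros  */
%centroid(maps.africa, work.centers, id);   /* x/y center of each map area, by id     */

proc sql;
   create table work.bubbles as
   select c.x, c.y, e.cases
   from work.centers as c, work.ebola_cases as e
   where c.id = e.id;
quit;

data work.anno_bubbles;
   set work.bubbles;
   length function style color $8;
   retain xsys '2' ysys '2' hsys '3' when 'a';
   function = 'pie';                  /* a filled pie with rotate=360 is a circle */
   rotate   = 360;
   style    = 'psolid';
   color    = 'cx4a90d9';
   size     = 0.5 + sqrt(cases)/20;   /* scale the bubble radius by case count    */
run;

proc gmap data=work.ebola_cases map=maps.africa;
   id id;
   choro cases / annotate=work.anno_bubbles coutline=gray;
run;
quit;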

Below is a snapshot of my SAS version of the map. Click here to see the interactive version, which has html hover-text over all the land areas and bubbles, and drill-down for the Ebola Treatment Centers.

ebola_bubblemap_2014

Here is a link to the SAS code, if you'd like to experiment with the map.


What can universities do to fill the analytics skills gap?

Who do you want to be when you grow up? And can I offer you a suggestion?

Considering the huge shortfall in analytical talent we’re facing, we should be asking those two questions more often. Many of the people who are considering a change in careers or searching for their first career don’t realize there are rewarding – even lucrative – career opportunities in data analytics. What can we do to help them see themselves as data pros?

Dr. Michael Rappa is the founding director of the Institute for Advanced Analytics and a professor in the Department of Computer Science at North Carolina State University. Jennifer Priestly is a Professor of Applied Statistics and Data Science and the Director of the Center for Statistics and Analytical Services at Kennesaw State University. Both directors have strong opinions on how to provide the market with the type of graduates who can tackle the big data problems that change the world. (Sound hokey? Then check out Project Data Sphere and UN Global Pulse.)

Filling the talent gap isn’t going to happen overnight, but with a few purposeful steps we can make it happen. Here are a few of Priestly and Rappa’s recommendations:

  1. Teach differently. Priestly says universities need to innovate just as the private sector is. “We can’t teach the way we have always taught,” she says. “Universities need to use the new resources and tools and change their thinking to fit this generation of problems and this generation of learners.”
  2. Make the path clear. Universities can no longer assume that students will self-select courses that will help them match up to employer expectations. “We have to purposefully direct that path by changing course offerings, educate students on the opportunities and incentivize them to pursue that career.” (North Carolina State University (NCSU) was the first to create an M.S. in Analytics.) Rappa says NCSU stopped thinking in terms of individual courses and developed a 10-month, full-time intensive curriculum. The cohort moves through the program together and learns as much from one another as from the course work.
  3. Look at the employer as the customer. In one of my undergraduate marketing courses, we were asked to research what universities could do to increase enrollment and improve student success. Our survey results were limited to a small segment of the student body, but it seemed conclusive that universities should treat students and prospective students as the customer. Rappa disagrees (and so do I, now that I’m in the workforce). He says universities will set students up for success by asking employers what kind of graduate will help move their organization forward.
  4. Create team players. According to Rappa, the traditional response when an employer says that graduates are missing a skill set is to develop a course that teaches the skill. But teamwork isn’t really a skill you can learn without living it. “If you want team players, students need to work in teams,” says Rappa. He says his students succeed because they work together to solve real problems using sponsoring organizations’ real data.
  5. Develop lifetime learners. Employers expect applicants to have knowledge of the current analytics technology, but they don’t expect or want them to show up on the first day knowing everything there is to know about analytics. What they want is someone who is curious and hungry for more problems to solve. Universities need to produce someone who can be productive from the start. You can do that by giving them interesting and challenging projects to work on – give a purpose to their learning.
  6. Partner with corporations to gain real – even messy – data. The answers to real-world problems can’t be found in the back of the textbook. Priestly says that universities should call on organizations to sponsor contests or give students the kind of data sets they can expect to see in their career.
  7. On-the-job training. Provide internship opportunities that introduce the students to real-world uses of analytics. Assign them to projects that capitalize on their creativity. Priestly says to encourage organizations to bring students a problem to solve. “It’s a heck of a lot cheaper than hiring a consultant firm and while you’re at it you will help create a pipeline of experienced talent,” she says.

Rappa’s message to universities: “Be bold. Break free of your comfort zones to educate students in powerful ways that justify the kinds of loans students take out …. The message to industry is: push back on universities in your area. Incentivize them to break free and offer new programs.”


Is trusting your gut the right way to go?

Jay Liebowitz speaks at A2014

When I was young, the use of analytics wasn’t widespread – even in very large companies. Organizations relied on their leaders’ experience built on years in the industry. The more experience and knowledge a leader had, the better the decisions they made and the more successful the business was. The introduction of business intelligence and predictive analytics technologies has triggered a shift toward data-driven decision making. That’s a good thing and a bad thing.

Often, basing your decisions on what the data says can be the safest route. But, Jay Liebowitz says, we still need to include our intuition as part of the decision-making process.

Liebowitz is the Orkand Endowed Chair in Management and Technology in the Graduate School at the University of Maryland University College (UMUC). He’s written several books about big data, analytics and decision making. Most recently, Liebowitz published “Bursting the Big Data Bubble: The Case for Intuition-Based Decision Making.”

Liebowitz recommends a balance of trusting your gut reactions without being overconfident. Use analytical techniques to validate or disprove your gut reaction, he says, and then learn from the exercise. He spoke Monday at the Analytics 2014 conference sponsored by SAS.

Trust your gut?

In the journal article “When Should I Trust My Gut?” Erik Dane and his associates found that intuition is often as good as analytics if you are very experienced in the domain where you are making the decisions. Liebowitz agrees and warns that the current trend of constraining employee hiring costs by cross-training employees can mean that key employees don’t develop the expertise they’ll need to make sound judgments.

Liebowitz went on to quote an MIT Sloan Management Review article describing the value of intuition over statistical analysis: “For many complex decisions, all the data in the world can’t trump the lifetime’s worth of expertise that informs one’s gut feeling, instinct, or intuition.” Liebowitz says gut instinct can be taught, but that it requires time. An example he uses to illustrate the value of intuition in decision making is the career of Wayne Gretzky. Gretzky has been called the smartest hockey player ever. He defined the game for generations to come because of his uncanny sense of where the puck would be and where his teammates were on the rink.

In his autobiography he writes, “Some say I have a 'sixth sense' . . . Baloney. I've just learned to guess what's going to happen next. It's anticipation. It's not God-given, it's Wally-given. He used to stand on the blue line and say to me, 'Watch, this is how everybody else does it.' Then he'd shoot a puck along the boards and into the corner and then go chasing after it. Then he'd come back and say, 'Now, this is how the smart player does it.' He'd shoot it into the corner again, only this time he cut across to the other side and picked it up over there. Who says anticipation can't be taught?”

Of course, Liebowitz doesn’t discount the value of analytics. To the contrary – he believes decision makers should rely on their expertise, but then prove or disprove it based on the data.

Downside to gut reaction?

Another article on cfo.com about big data says, “We generally have good intuition about things that are similar to what we encounter every day … but we have poor intuition about things that are outside of the everyday.” That makes sense – but what about the times you have to make a decision quickly and you don’t have the benefit of analytics? One approach is to think through the possibilities: Gary Klein, for example, uses a ‘pre-mortem’ technique where he critically evaluates the worst possible outcomes of a decision based on all of the information available at the time.

What can you do to improve your decision making from a business intuition perspective?

  • Respect your intuition without rejecting it outright or following it blindly.
  • Ask yourself what prompted your gut reaction.
  • Review the evidence.
  • Elicit good feedback from other experts.
  • Prove or disprove your hunch (this is a good place for analytics).

For more about Liebowitz’ theories about intuition versus analytics, read, “Educating informed 'intuitants.'” In this SAS Insights article, Liebowitz discusses the new UMUC online M.S. in Data Analytics degree where up-and-coming leaders are taught “basic and advanced skills to support strategic and tactical decision making in the new big data world.”

Hear more from Liebowitz on these topics in this Inside Analytics video:


Big Data @ Work: a conversation on Twitter

Last month Tom Davenport, renowned international speaker and author of 'Big Data @ Work', came to Dublin as a guest of SAS Ireland. During the event, there was a lively conversation on Twitter, with many great questions answered by John Farrelly and Alan Gormley from SAS Ireland. Here are some of the highlights.

Blog post compiled by Phil Male from SAS UK and Lauren Brennan from SAS Ireland.


How cheap will gasoline prices go?

Have you noticed lower gasoline prices lately? How low will they go, and how long will they stay down? Let's use SAS to analyze some of the data!...

gas_price

First, let's look at just the price of gasoline over time. Here's a plot of the US average gasoline price for each week since the year 2000. I use a very tiny bar (needle) to represent each week's price, and change to a darker color at each 50-cent price increase. Notice that (in general) the price drops every year in the fall, and stays lower through the winter. Perhaps this is because people travel less then, so there is less demand? What other factors do you think cause the price drop each fall?

gasoline_prices_plot
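If you want to try a similar plot yourself, here's a minimal sketch using PROC SGPLOT (the work.gasprices data set, with DATE and PRICE variables, is an assumption for illustration, not the data behind the chart above):

data work.gasprices_grp;
   set work.gasprices;
   price_band = floor(price / 0.50) * 0.50;   /* color group changes every 50 cents */
run;

proc sgplot data=work.gasprices_grp;
   needle x=date y=price / group=price_band;
   xaxis type=time label='Week';
   yaxis label='US average gasoline price ($/gallon)';
run;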

Gasoline is made from oil, and of course the price of gasoline is closely tied to the price of oil. Saudi Arabia recently hinted that the price of oil might be going down to $80/barrel. I created two graphs plotting the price of oil and the price of gasoline, so you can visually compare them side by side.

oil_prices1

oil_prices

 

The above two graphs definitely seem to indicate there's a correlation, but I wanted a way to visualize this correlation a bit more directly. Therefore I created a scatter plot with the price of oil on one axis and the price of gasoline on the other, and let SAS calculate a regression line through the data. The data points follow the line fairly closely.

oil_gas_correlation
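Here's a minimal sketch of that kind of scatter plot with a regression line, assuming a data set work.prices with one row per week and variables OIL_PRICE ($/barrel) and GAS_PRICE ($/gallon); PROC CORR is added to put a number on the visual impression:

proc sgplot data=work.prices;
   reg x=oil_price y=gas_price / markerattrs=(symbol=circlefilled);
   xaxis label='Crude oil price ($/barrel)';
   yaxis label='US average gasoline price ($/gallon)';
run;

/* quantify how strong the linear relationship is */
proc corr data=work.prices;
   var oil_price gas_price;
run;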

 

Enough about graphs & analytics ... what's the lowest price you've paid for gasoline this fall? How low do you think the price will go, and how long will the price stay down? (Leave your reply in a comment!)


SAS Certification at Analytics 2014

Attending an industry conference requires an investment in time away from the office, and maximizing that investment makes a lot of sense. In addition to gaining insight into challenges facing the analytics industry today, discovering and evaluating new products and services, and networking with the largest gathering of analytics professionals in the world, attendees at Analytics 2014 in Las Vegas are also taking advantage of workshops, training classes, and certification exam sessions to accelerate their personal development.

On Sunday, October 19, all public SAS exams were offered at the Bellagio, the host hotel for the conference, and forty-four candidates participated in the testing session. As you would expect at an analytics conference, Predictive Modeling was the most popular exam, but candidates also took other exams ranging from Base Programming to Statistical Business Analyst Using SAS. We don’t disclose specifics about pass rates for exams, but I must say that this group of candidates was among the most motivated and well-prepared we have seen. There are quite a few brand new SAS Certified Professionals in Las Vegas today.

If you missed out on becoming SAS Certified at Analytics 2014 in Las Vegas, you may be able to take advantage of exam sessions at other major SAS conferences.  If you plan on attending the 2015 SAS Global Forum, April 26-29 in Dallas, Texas, be on the lookout for certification exam opportunities.  We usually discount the price for conference attendees.


Higher education and analytics

It's my favorite time of year! The leaves are changing. Football is back. And it's also time for our annual Analytics conference.

One of the best parts about my job is getting to attend the conference each year and host the Inside Analytics video series.

Not everyone at the conference gets the chance to have one-to-one time with so many speakers.

My first interview was with Dr. Goutam Chakraborty, professor of marketing at Oklahoma State University.

 

If you're at the conference this week, here's my list of the six things you need to do.

Look for more updates this week. The conference runs from Oct. 20-21.


How do men rate women on dating websites? (Part 2)

I always recommend looking at data in several different ways, to get a more complete picture of what's really going on - such is the case with the member 'ratings' on dating websites. Let's take a look at some data from a different angle...

cupid_angled

In a recent blog post, I analyzed what age of men and women the opposite sex rated most attractive. The graphs indicated that men rated 20-year-old women the most attractive, whereas women rated men closer to their own age as most attractive. This sparked quite a bit of discussion (such as the comments in the cross-posting of the blog on allanalytics.com).

So I decided to look at the ratings data in a different way - this time ignoring age, and just looking at how men and women rate each other in general. I found some histograms on p. 16 of Christian Rudder's new book Dataclysm that showed almost what I was looking for, and I then used some graphs from his blog to estimate the data so I could create similar charts in SAS.

Whereas the men of all age groups consistently rated 20-year-old women the most attractive (which produced a very lopsided chart), their ratings of all women in general produced a very symmetrical chart. In Rudder's book he even describes it as "close to what's called a symmetric beta distribution - a curve often deployed to model basic unbiased decisions." Therefore it appears that men are very unbiased/honest in the way they rate women.

okc_rating_curve
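For reference, here's a minimal sketch of what a symmetric beta density looks like (the alpha=beta=4 shape parameters are purely illustrative and are not estimates from the OkCupid data):

data work.beta_curve;
   do x = 0 to 1 by 0.01;
      density = pdf('beta', x, 4, 4);   /* equal shape parameters give a symmetric curve */
      output;
   end;
run;

proc sgplot data=work.beta_curve;
   series x=x y=density;
   xaxis label='Rating (rescaled to 0-1)';
   yaxis label='Density';
run;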

By comparison, women rated men very poorly. Rudder mentions that women only rate one guy in six as "above average."

okc_rating_curve1

What causes this huge difference in how men and women rate each other? Is one being more honest than the other? Are they rating based on different criteria (perhaps men are rating based on looks, and women are rating based on whether or not they think the men would make good mates)? Perhaps women are hesitant to rate a man highly because they know a high rating will trigger OkCupid to send that man a message letting him know who rated him highly? What other factors might be influencing this data?

Feel free to leave your thoughts & theories on this topic in the comments section!

 


Is Hadoop the answer to big data?

Having spent a quarter of a century working on databases and database-related technologies, I have developed a healthy skepticism toward any new product that hits the market billed as the best thing we have ever seen. It’s not that I love to revel in “I told you so” moments; it’s just that I have seen too many products fly high in the sky only to disappear like meteors.

For many, Hadoop’s entrance into the database field meant that technology had finally come up with a framework capable of handling “big data.” On top of that, its affordability unequivocally meant that the end was in sight for the traditional relational databases that had so far dominated the scene. Today, after much time and effort spent integrating Hadoop into their environments, many of the companies that were quick to jump on its bandwagon are discovering that, despite its important role in their infrastructure, Hadoop is not the godsend answer that many thought it would be.

Why is that? The explanation is simple. At the end of the day, Hadoop is another technological tool, just like its relational database counterparts. Big data, on the other hand, is not about technology, but rather about business needs. This means that Hadoop shouldn’t be considered the sole player in the field of data analysis. For example, it makes sense to use Hadoop to run broad exploratory analysis of large data, but a relational database is still a better option for operational analysis of what was uncovered. Hadoop is also good for looking at the lowest level of detail in a data set, but relational databases make more sense when it comes to storing transformed and aggregated data. As Facebook’s analytics chief Ken Rudin puts it, “you need to use the right technology to fit your business needs.”

A recent survey commissioned by an IT company found that more than 30% of the companies interviewed had already deployed Hadoop, with an additional 30% planning to deploy it within 12 months. One interesting finding was that the majority of these companies planned to combine Hadoop’s data analysis capabilities with those provided by the databases already integrated into their infrastructures. According to the study, the goal was and still is to use Hadoop to perform raw data analysis, while using traditional databases to take care of non-analytic workloads, especially transaction-oriented ones, and to perform data analysis on aggregated data coming from Hadoop.

Take eBay, for example. The San Jose, Calif.-based company’s three-tier data analytics approach is an example of the kind of role Hadoop can find within an organization alongside other traditional relational databases. Structured data resides in the first tier, an enterprise data warehouse that is used for daily housekeeping items, such as feeding business intelligence dashboards and reports. The second tier consists of a Teradata data management platform that is used to store huge amounts of semi-structured information. Fully unstructured data such as textual information lives in the third tier, a Hadoop cluster reserved for deeper research, analysis and experimentation.

The moral of the story is that Hadoop is not a synonym for big data, but one of the many players you need to mine and analyze your data. A good reason to hang on to those other databases a little longer.

I’ll be talking about big data and Hadoop at Analytics 2014 along with Josh Wills from Cloudera and my SAS colleagues Wayne Thompson and Kelly Hobson. Check out our panel presentation and round table discussion on Hadoop. We hope to see you there!

  • Panel discussion with SAS and Cloudera on Big Data and Hadoop: Moving beyond the hype to realize your analytics strategy with SAS® - Monday, October 20, 3:00-3:50 pm
  • Round Table discussion on Practical Considerations for SAS Analytics in a Hadoop Environment – Tuesday, October 21, 12:30-1:45 pm

You can also check out our starter services on Visual Analytics and Visual Statistics and the Expert Exchange for Hadoop.


The SAS model factory – a big data solution

Do you have too many models to build, too many to manage, too few analytic resources or too much data?  A Model Factory may be your answer.

The mindset of analytics is changing. This represents the transformation from a “craftsman”-dominated culture, in which multiple weeks were spent cycling through data and developing a model, to a production-oriented environment where analytically derived information follows almost instantaneously from the strategic conceptualization of ideas.

This transformation is significantly accelerated by the integration of the SAS Model Factory.

The idea of a “Model Factory” may call to mind a mechanical age of smokestacks and assembly lines. When Henry Ford revolutionized the car-making process by introducing the assembly line – the process still used worldwide in auto manufacturing today – he laid the foundation for the democratization of the car. The assembly line reduced the cost of making a car to an amount that made it sellable to a much larger audience.

What do we really mean by Model Factory?

A factory is a place where something is made or assembled quickly and in great quantities.

A model factory is a place where predictive models are built automatically, quickly and in great quantities, enabling an automated scoring process.

Why would you use a Model Factory?

  • Perhaps you have limited technical and/or analytic resources.
  • You have too many models to build and manage because you have various target variables and/or you segment your customers prior to modeling.
  • If you have 1000’s of customer attributes, you may need to select only a subset that is appropriate for each model.
  • Perhaps you need to perform repetitive data preparation with variable transformations, handling of missing values, etc.
  • You have Big Data which slows down model building and scoring.
  • In brief, you are unable to build models fast enough.

Can the model factory process be automated?

Yes. The automated process consists of:

  • Model Initiation
  • Model Development
  • Model Deployment
  • Model Monitoring
  • Model Recalibration/Rebuild
  • Model Retirement

From a Factory Perspective, it looks like:

sas model factory

If you choose to write a code-based Model Factory

You can use Base SAS and SAS/STAT with the high-performance procedures to build hundreds or thousands of models automatically, on as much data as you have. With the needed code, your data will be structured properly, transformations and missing values will be handled automatically, and good-enough models will be built. No analytical skills will be needed to run the process.

Model Factory Deployment

  • Run macro-driven code
  • Parameter file
      – Manual entry
      – Point-and-click entry
  • Code processes the parameter file and data
  • Code runs the analytic models
  • Model Factory code produces scoring code (see the sketch below)
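Here is a minimal sketch of that flow, assuming a parameter data set work.params with one row per model (character columns SEGMENT, TARGET and INPUTS) and a modeling data set work.customers. The data set names, the choice of PROC LOGISTIC, and the output path are illustrative only:

%macro build_one(segment, target, inputs);
   proc logistic data=work.customers(where=(segment="&segment"));
      model &target(event='1') = &inputs / selection=stepwise;
      code file="/models/score_&segment..sas";   /* deployable scoring code */
   run;
%mend build_one;

/* generate one macro call per row of the parameter file */
data _null_;
   set work.params;
   call execute(cats('%nrstr(%build_one)(', segment, ',', target, ',', inputs, ')'));
run;

Each row of the parameter file drives one model build and produces one piece of scoring code, which is the essence of the factory idea.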

SAS has other solutions for model building

If you have fewer models to build and/or you have the needed analytic resource for model development, these Point-and-Click solutions may be sufficient:

  • Enterprise Miner
  • Rapid Predictive Modeler – run from Enterprise Guide

What can you do to build a Model Factory?

  • Take classes in Data Mining techniques
  • Read documents about data mining
  • Have internal working meetings to review goals and desired results
  • Engage consultants

In summary, we understand that you have experienced the chaos associated with building and maintaining a multitude of models. The solution to your modeling problems may be the Model Factory Solution, which replaces the chaos with automation, efficiency, and repeatability. For more information, you may contact the author. For more on this topic, attend the SAS Model Factory pre-conference workshop at Analytics 2014 in Las Vegas on Sunday, October 19, 2014, 1-5 pm.
