Though this is a busy time of year for SAS’ Education Division with the second A2010 Analytics Conference in Copenhagen and the thirteenth M2010 Data Mining Conference both just recently completed, I had the opportunity to catch up with Bob Lucas, PhD, who oversees statistical training at SAS.
Bob: During my 13 years at SAS, one of the biggest changes I have seen is the growing number of customers from commercial businesses taking training. When I started at SAS, most customers were from the research community. One consequence of this trend is that the customers taking our training now bring a much greater diversity of formal backgrounds in analytical methods and analytical thinking. In response, we have diversified our curriculum and continue to add courses to our popular Business Knowledge Series program that focus more on business applications.
Anne: As a seasoned statistician with years of experience before coming to SAS, what are the most noteworthy changes you’ve seen in enabling more and better analysis to take place?

Bob: I think that Moore’s law has driven the key changes. The availability of cheap, fast computers and cheap memory has fundamentally changed the types of analyses that are routinely done in both science and business. Software that exploits these tremendous gains in computing power has allowed a much broader group of people to perform analyses ranging from simple to fairly complex.
An example is typical insurance claims data. These data are typically zero-inflated with very long tails. Fast computers and readily available software now let the analyst fit the correct nonlinear model using the appropriate likelihood function, where before the analyst would have had to rely on approximations or perhaps even redefine the problem to get an answer.
The availability of visual programming and graphical user interfaces has also led to better analyses. Such interfaces can make it much easier for an analyst to do sophisticated analyses, given adequate training and guidance. Removing the burden of having to learn software syntax, especially for people for whom analysis is only part of their job, enables them to perform simple or even complex analyses that they could not have done before.
Anne: What are the most common mistakes you see organizations making when trying to leverage data and analysis to make better decisions? For instance, it seems organizations under-utilize the time dimension of their data…
Bob: More than 10 years ago, I remember helping a customer develop their first predictive models. It took three tries to get data from a previous campaign for customers who had not responded. They did not understand that we needed both responders and nonresponders to build a model. Of course, a common mistake of new predictive modelers, who might not fully understand the time dimension issues involved, is time infidelity, or leakage.
I’d say that most customers do not fully exploit the time dimension. Most customers realize that circumstances change over time because their models get stale; consequently, they often validate against out-of-time test data sets. When doing predictive modeling, customers often capture the time dimension by building inputs that reflect past behavior over intervals of time prior to the event date. For short-duration situations, such as target marketing to acquire new customers, this is fine. However, for churn or up-sell campaigns, or for predicting probability of default, which are natural time-to-event or survival problems, I think it is better to explicitly include the customer's tenure, time, as an input in the model. The time dimension can be explicitly incorporated into models using survival analysis approaches.
Anne: Can you tell us about how the survival mining course came to be—how some innovative thinking helped frame an old problem in a new way?
Bob: The Survival Data Mining class was written by Will Potts. He began it when he was working on a churn problem for a customer. He recognized that churn is really a time-to-event problem, and he was an expert in continuous-time survival analysis.
For churn problems in business, analysts historically framed this as a classification problem in which the time dimension wasn’t fully leveraged. That approach, while common, fails to account for hazards and outcomes over time. Will also recognized the need for something that could both score new customers and support assessment, to ensure the models give predictions better than random guesses. Cox proportional hazards models have the flexibility, but they don’t work for scoring because they are semi-parametric. Fully parametric models, like those in PROC LIFEREG, produce scoring equations, but they don’t have the flexibility to fit hazards well. He concluded that a discrete-time approach would satisfy all of the needs of that problem, handling time-dependent covariates and discontinuities in time. Discrete-time methods are what the course is about, and these methods have done well in predicting if and when key customer events will occur. At the F2009 Business Forecasting Conference, Professor Jonathan Crook presented on this topic and endorsed discrete-time methods as far superior to the methods previously applied.
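The discrete-time idea Bob describes is often implemented by expanding each customer's history into one record per period at risk (a "person-period" data set), estimating a hazard for each period, and scoring survival as the running product of one minus the hazard. A minimal sketch in Python, with an invented toy customer table (the course itself uses SAS; this is only an illustration of the data expansion and hazard arithmetic):

```python
# Discrete-time survival sketch: expand customers into person-period
# records, estimate the empirical hazard per period, and compute survival.
# Toy data (illustrative only): (customer_id, tenure_in_months, churned?)
customers = [
    (1, 3, True), (2, 5, False), (3, 2, True),
    (4, 5, False), (5, 4, True), (6, 5, False),
]

def person_period(data):
    """One record per customer per period at risk; event=1 only in the
    final period of a customer who churned (censored customers get 0s)."""
    rows = []
    for cid, tenure, churned in data:
        for t in range(1, tenure + 1):
            event = 1 if (churned and t == tenure) else 0
            rows.append((cid, t, event))
    return rows

rows = person_period(customers)

# Empirical discrete-time hazard h(t) = events at t / customers at risk at t
periods = sorted({t for _, t, _ in rows})
hazard = {}
for t in periods:
    at_risk = [e for _, tt, e in rows if tt == t]
    hazard[t] = sum(at_risk) / len(at_risk)

# Survival to time t is the product of (1 - h(s)) for s <= t
surv = 1.0
for t in periods:
    surv *= 1 - hazard[t]
    print(f"t={t}  hazard={hazard[t]:.3f}  S(t)={surv:.3f}")
```

In practice the event indicator in the person-period data is modeled with logistic regression on the period and other inputs, which is what makes the approach both flexible in fitting hazards and usable for scoring.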
Anne: Professor Crook also presented at a SAS Day for a large UK bank in 2008 and asked the audience (who were largely not using this approach), “Why wouldn’t you want to know when, and not just if, a customer would default?!” Bob, since Will no longer teaches this course, you’ve been teaching it around the world. Having guided so many users to adopt a better method, you’ve also provided a lot of good input to the new survival mining node soon to be available in SAS Enterprise Miner and connected R&D with some good development partners. What do you think customers will be most excited about?
Bob: The data manipulation required to create the development data sets can be challenging for many potential users. Making this powerful modeling method available through a UI will make it much easier for them. For experienced programmers, the node will remove much of the tedium associated with developing and validating models; for inexperienced programmers, it will make the approach accessible at all. I think for some, the complexity of the approach has been a barrier to adoption.
Anne: In addition to the survival mining course, are there other courses you’d like to mention that help frame old problems in new and better ways, for example, Design of Experiments for Direct Marketing?
Bob: When I came to SAS, I was surprised that so few people in business were doing experiments; those who were tended to be in marketing. And their experiments weren’t well designed: they tested one factor at a time, not knowing that multi-factor experiments are so much better or how to run them. One large financial services customer wanted a class developed on design of experiments in direct marketing. Some of the real-world examples in the class are a result of that partnership; that was part of the agreement, though of course the data were modified to be anonymous.
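The multi-factor designs Bob contrasts with one-factor-at-a-time testing can be sketched in a few lines. Here a hypothetical two-level, three-factor direct-mail design is enumerated (the factor names and levels are invented for illustration, not from the course):

```python
from itertools import product

# Hypothetical direct-mail factors and levels (illustrative only).
factors = {
    "offer":    ["10% off", "free shipping"],
    "envelope": ["plain", "branded"],
    "postage":  ["standard", "first class"],
}

# Full factorial design: every combination of factor levels is one cell.
design = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for i, cell in enumerate(design, 1):
    print(i, cell)
# A 2x2x2 design has 8 cells and lets you estimate all main effects and
# interactions from one campaign, whereas testing one factor at a time
# needs separate campaigns and cannot estimate interactions at all.
```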
In the insurance industry, historical data are typically zero-inflated and heavily skewed to the right, with relatively few very large values. Data like these introduce modeling challenges that traditional methods are not designed to accommodate. Two-stage modeling is one approach; Heckman models are an example. One stage models the probability of an event occurring, and the second stage models the severity, or value, of the event. The results from both stages can then be combined.
The modern approach is to model these data with a mixture distribution. Zero-inflated data can be modeled as a mixture of a Bernoulli distribution and a non-negative skewed distribution; the zero-inflated Poisson is one example. A key part of the analysis is choosing the best severity distribution, because extreme values in the severity distribution can lead to very large losses.
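The zero-inflated Poisson mixture Bob mentions has a simple likelihood: a zero can come either from the "structural zero" component (probability pi) or from the Poisson component. A small sketch, with invented claim counts and a crude grid search standing in for a real optimizer:

```python
import math

# Zero-inflated Poisson: with probability pi the count is a structural
# zero; otherwise it is Poisson(lam). Toy claim counts, heavy on zeros
# (illustrative data, not from any real portfolio).
counts = [0] * 70 + [0, 1, 1, 2] * 5 + [3, 4, 5, 7, 9, 12]

def zip_loglik(pi, lam, data):
    """Log-likelihood of the ZIP mixture for parameters (pi, lam)."""
    ll = 0.0
    for y in data:
        if y == 0:
            # a zero can come from either mixture component
            ll += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            # non-zero counts come only from the Poisson component
            ll += math.log(1 - pi) - lam + y * math.log(lam) - math.lgamma(y + 1)
    return ll

# Crude grid search for the maximum-likelihood (pi, lam); real software
# would use a proper optimizer on this same likelihood.
best = max(
    ((pi / 100, lam / 10) for pi in range(1, 100) for lam in range(1, 150)),
    key=lambda p: zip_loglik(p[0], p[1], counts),
)
print("pi-hat=%.2f  lambda-hat=%.1f" % best)
```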
Years ago, before we had the SAS Marketing Optimization solution, we developed a customized course for another large financial services firm that scored 20 million customers with 20 different models every month. Some people would get 20 offers and some none, so we formulated a linear programming problem to help them make offer decisions subject to constraints such as upper bounds on risk. It turned out that attaining marketing’s response target while keeping the risk level as low as desired was infeasible. The solution to that problem is not modeling; the only way to solve it is to have Risk and Marketing meet and compromise on their targets.
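The infeasibility Bob describes can be seen in a tiny version of the problem. All numbers below are invented: two customer segments, a response target from marketing, and a risk ceiling from risk management that no contact plan can satisfy at once.

```python
from itertools import product

# Hypothetical segments: (size, response rate, expected risk per contact).
segments = [
    (1000, 0.02, 5.0),    # low response, moderate risk
    (1000, 0.05, 12.0),   # higher response, higher risk
]
MIN_RESPONSES = 60        # marketing's target
MAX_TOTAL_RISK = 8000.0   # risk management's ceiling

# Decision variable: what fraction of each segment to contact (coarse grid
# standing in for a real LP solver).
feasible = []
for fracs in product([i / 10 for i in range(11)], repeat=len(segments)):
    responses = sum(f * n * r for f, (n, r, _) in zip(fracs, segments))
    risk = sum(f * n * k for f, (n, _, k) in zip(fracs, segments))
    if responses >= MIN_RESPONSES and risk <= MAX_TOTAL_RISK:
        feasible.append(fracs)

print("feasible plans:", feasible)
```

With these numbers the feasible set is empty: even contacting everyone yields 70 expected responses at a risk of 17,000, and any plan under the risk ceiling tops out well below the response target, which is exactly the situation where the two departments must renegotiate their targets.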
Anne: Funny, and you raise a good point that many may not be aware of—that customers often provide you with their own data to deliver custom, on-site training. What are some of the more interesting analysis challenges you’ve seen with real-world customer data (without giving away any telling details, of course)?
Bob: We’ve provided mentoring services and customized training for marketing people in pharmaceuticals using IMS data (time series cross-sectional data), introducing them to PROC MIXED and PROC GLIMMIX (the data behaved more like count data because the drug had low prescription volumes). We taught ARIMA model identification to chemical engineers; chemical engineers are really smart and easily pick up new or different methods when given the necessary background. There are many more examples; I can’t possibly list them all.
We provided training for big-box retailers to understand attrition, and for a large bank using data constructed to mimic the response rates they should expect. We have shown people how to use linear programming to improve the decision process and make campaigns more efficient overall. That work led to the SAS Marketing Optimization offering, which is very nicely implemented.
Anne: Care to comment on the new course which debuted at the A2010 Analytics Conference and the M2010 Data Mining Conference on the hot topic of net lift modeling?
Bob: Net lift is the ideal of what you want to do: identify the customers who are most susceptible to promotion. The data you would ideally need is how a single person would respond both if you promoted to him or her and if you didn’t, but that data is impossible to obtain. Instead you mimic it, much as matched pairs do in observational studies: you stratify the prospect population into blocks, treating some people and not others, to approximate the data you need. The course covers a net weight of evidence calculation implemented in a macro, and it presents a good approach to the core problem: separating the people who would have bought anyway from those who are receptive to promotion and would have been unlikely to buy without it.
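The core net lift comparison can be sketched simply: within each block, compare the response rate of the treated group against its matched control group, and rank segments by the incremental difference. The segment labels and campaign results below are invented for illustration, and this shows only the basic net lift arithmetic, not the course's net weight of evidence macro:

```python
# Net (incremental) lift: response rate among treated minus response rate
# among matched controls, computed per segment.
# Hypothetical results: segment -> (treated n, treated responders,
#                                   control n, control responders)
results = {
    "A": (1000, 80, 1000, 70),  # responds often, but would mostly buy anyway
    "B": (1000, 60, 1000, 20),  # persuadable: large incremental effect
    "C": (1000, 15, 1000, 14),  # promotion barely moves them
}

net_lift = {
    seg: tr / tn - cr / cn
    for seg, (tn, tr, cn, cr) in results.items()
}

# Target the segments where promotion creates the most incremental response.
ranked = sorted(net_lift, key=net_lift.get, reverse=True)
for seg in ranked:
    print(seg, f"net lift = {net_lift[seg]:+.3f}")
```

Note that a model built on raw response rates alone would rank segment A first, even though most of its buyers would have bought without the promotion; ranking by net lift puts the persuadable segment B on top.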
Anne: Thanks, Bob, for taking the time to share your thoughts and to share some of the great work taking place in SAS’ Education Division. It’s also really nice to see the Business Knowledge Series continue to grow with such excellent domain experts participating.