A common mistake in data analysis and how to fix it

Last month, our Director of Statistical Training, Bob Lucas, was interviewed about trends in analytics by Anne Milley, Analytics Product Manager for SAS. The interview has been wildly popular, so I thought I'd share an excerpt here on the SAS Training Post. You can also read the full interview on the sascom voices blog.

Anne: What are the most common mistakes you see organizations making when trying to leverage data and analysis to make better decisions? For instance, it seems organizations under-utilize the time dimension of their data…

Bob Lucas, PhD

Bob: I remember helping a customer develop their first predictive models more than 10 years ago. It took three tries to get data from a previous campaign for customers who had not responded. They did not understand that we needed both responders and nonresponders to build a model. A common mistake among new predictive modelers, who might not fully understand the time dimension issues associated with predictive modeling, is time infidelity, or leakage.
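To make the leakage point concrete, here is a minimal sketch in Python, assuming leakage means that model inputs include information dated on or after the point at which the prediction would be made. The function name `build_inputs`, the record layout, and the sample values are hypothetical and are not taken from the interview.

```python
from datetime import date

def build_inputs(transactions, cutoff):
    """Summarize each customer's spend using only records dated before `cutoff`.

    `transactions` is a list of (customer_id, transaction_date, amount) tuples.
    This layout is an assumption made for illustration.
    """
    totals = {}
    for customer_id, txn_date, amount in transactions:
        # Records on or after the cutoff would not have been known at
        # prediction time, so including them would leak the future.
        if txn_date < cutoff:
            totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals

# Hypothetical sample data.
transactions = [
    ("A", date(2009, 1, 15), 40.0),
    ("A", date(2009, 3, 2), 25.0),   # dated after the cutoff, so excluded
    ("B", date(2009, 2, 1), 10.0),
]
cutoff = date(2009, 2, 15)
inputs = build_inputs(transactions, cutoff)
```

The same cutoff has to apply to every input, and the target (responder or nonresponder) is then measured only in the window after it.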

I’d say that most customers do not fully exploit the time dimension. Most customers realize that circumstances change over time because their models get stale. Consequently, they often use out-of-time validation or test data sets. When doing predictive modeling, customers often capture the time dimension by building inputs that reflect past behavior for intervals of time prior to the event date. For short-duration situations, such as target marketing to acquire new customers, this is fine. However, for churn or up-sell campaigns, or for predicting probability of default, which are natural time-to-event or survival problems, I think it is better to explicitly include the customer's tenure (time) as an input in the model. The time dimension can be explicitly incorporated into models using survival analysis approaches.
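As a rough illustration of the two ideas in this answer, the sketch below computes tenure as a model input and splits records into training and out-of-time test sets by date. It is written in Python under stated assumptions: the helper names `tenure_months` and `out_of_time_split`, the field names, and the sample rows are all hypothetical, and tenure is counted in whole calendar months.

```python
from datetime import date

def tenure_months(start, as_of):
    """Whole calendar months between the customer's start date and `as_of`."""
    return (as_of.year - start.year) * 12 + (as_of.month - start.month)

def out_of_time_split(rows, split_date):
    """Train on records before `split_date`; test on records from it onward."""
    train = [r for r in rows if r["event_date"] < split_date]
    test = [r for r in rows if r["event_date"] >= split_date]
    return train, test

# Hypothetical sample rows.
rows = [
    {"id": "A", "start": date(2008, 6, 10), "event_date": date(2009, 3, 1)},
    {"id": "B", "start": date(2007, 1, 5), "event_date": date(2009, 9, 1)},
]
for r in rows:
    # Tenure is measured as of the event date, never later.
    r["tenure"] = tenure_months(r["start"], r["event_date"])

train, test = out_of_time_split(rows, date(2009, 6, 1))
```

Testing on a later period than the training period gives a check on how quickly a model goes stale, which a random split of the same period would not show.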

Anne: Can you tell us about how the survival mining course came to be—how some innovative thinking helped frame an old problem in a new way?

Bob: The Survival Data Mining class was written by Will Potts. He began it while working on a churn problem for a customer. He recognized that the churn problem was really a time-to-event problem, and he was an expert in continuous-time survival problems.

For churn problems in business, analysts historically framed this as a classification problem in which the time dimension wasn’t fully leveraged. This approach, while common, failed to take into account hazards and outcomes in time. Will also recognized the need for an approach that could score new customers and support assessment, to ensure that models give predictions better than random guesses. Cox proportional hazards models had flexibility, but they don’t work for scoring because they are semi-parametric. Will also knew that fully parametric models, like those fit by LIFEREG, produce scoring equations, but they don’t have the flexibility to fit hazards well. He concluded that a discrete time approach would satisfy all of the problem's needs, including handling time covariates and time discontinuity. Discrete time methods are what the course is about, and these methods have in turn done well in predicting if and when key customer events would occur. At the F2009 Business Forecasting Conference, Professor Jonathan Crook presented on this topic and endorsed discrete time methods as far superior to the methods previously applied.
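For readers unfamiliar with discrete time methods, here is a minimal Python sketch of the general idea, not the course's actual material. It assumes the usual setup: each customer is expanded into one row per month observed, and the hazard for a month is the share of customers still at risk who churn in that month. All function names and sample data are hypothetical. In practice a model such as logistic regression would be fit to the expanded rows so that covariates can be included; the covariate-free estimate below just shows the mechanics.

```python
def expand_person_period(customers):
    """Turn (customer_id, months_observed, churned) into one row per month.

    Each output row is (customer_id, month, event), where event is 1 only in
    the final observed month of a customer who churned. Customers who did not
    churn are censored: they contribute rows with event 0 only.
    """
    rows = []
    for customer_id, months, churned in customers:
        for month in range(1, months + 1):
            event = 1 if (churned and month == months) else 0
            rows.append((customer_id, month, event))
    return rows

def discrete_hazard(rows):
    """Hazard per month: events in that month / customers at risk in it."""
    at_risk, events = {}, {}
    for _, month, event in rows:
        at_risk[month] = at_risk.get(month, 0) + 1
        events[month] = events.get(month, 0) + event
    return {m: events[m] / at_risk[m] for m in sorted(at_risk)}

def survival_curve(hazard):
    """Probability of surviving through each month, given monthly hazards."""
    curve, surviving = {}, 1.0
    for month in sorted(hazard):
        surviving *= 1.0 - hazard[month]
        curve[month] = surviving
    return curve

# Hypothetical data: customer B is censored (still active after 3 months).
customers = [("A", 2, True), ("B", 3, False), ("C", 1, True), ("D", 3, True)]
rows = expand_person_period(customers)
hazard = discrete_hazard(rows)
survival = survival_curve(hazard)
```

Because every row carries its own month, inputs that change over time can be attached row by row, and the fitted hazards can be applied to new customers to estimate both if and when an event is likely.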

About Author

Michele Reister

Marketing Specialist

Michele Reister has worked in the Education Division at SAS since 2004. During that time she has played many roles including marketing training courses, developing product bundles, managing conferences and overseeing the division’s discount programs. Currently, she is responsible for the division’s social media strategy. Michele holds a BS in Management and Information Technology from Daniel Webster College and an MBA from University of North Carolina at Chapel Hill. Michele is a perpetual student herself and is constantly looking for better ways to serve SAS’ user population. When she’s not expanding her knowledge of marketing, Michele enjoys group fitness classes, cooking, volunteering, reading and chasing after her two children.
