“Wise Enterprise: Best Practices for Managing Predictive Analytics” was the title, and the panel's assignment at the recent Predictive Analytics World conference in New York was to share “poignant moments of failure.” Wayne Thompson from SAS began, going back ten years to a network intrusion project. He had been proud of the cool new approach he used to build a rule induction model, and even happier when it fit so well in training, so its lousy performance in production came as quite a surprise. Wayne’s pointers on how to fail included:
- Generalization. Don’t consider the temporal effect of data changing over time. Today Wayne would try other techniques such as survival data mining, experimental design, time series data mining, or even similarity analysis borrowed from voice recognition to look for co-occurrence.
- Domain expertise. Making no effort to include people who understand the business problem is a recipe for failure. In those days Wayne and his team had limited experience with fraud.
- Model management and monitoring. Ignore it. SAS now has these capabilities, which would have flagged the problems much earlier, but they didn’t exist at the time.
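Monitoring a model in production can be as simple as comparing the score distribution it produces today with the one it produced at build time, which would also catch the temporal drift Wayne described in his first pointer. A minimal sketch, assuming Python with NumPy and using the Population Stability Index; this is a generic illustration, not SAS's actual implementation:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare the score distribution at model-build time ('expected')
    with the distribution seen in production ('actual').  A PSI above
    roughly 0.25 is commonly read as serious drift."""
    # Bin edges come from the build-time distribution
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range scores
    exp_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    act_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Small floor avoids log(0) and division by zero in empty bins
    exp_frac = np.clip(exp_frac, 1e-6, None)
    act_frac = np.clip(act_frac, 1e-6, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))

rng = np.random.default_rng(0)
train_scores = rng.normal(0.30, 0.1, 5000)  # scores at model-build time
prod_scores = rng.normal(0.45, 0.1, 5000)   # shifted production scores
print(population_stability_index(train_scores, prod_scores))
```

Run on a schedule against fresh production scores, a check like this turns "the model was lousy in production" from a surprise into an alert.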
Wayne offered a final way to fail: stumble over the size of your data. Customers often ask for help sizing data because they think it is too big. With the advances in high-performance computing, the problem is not too much data. “It’s that you don’t have the right analytics environment,” as Oliver Schabenberger recently explained at A2011.
Colin Shearer from IBM told of a dramatic failure that led to an unnecessary police raid on a state-funded kindergarten! Years ago, a project to uncover fraudulent state-subsidized activity merely discovered an outlier: the biggest kindergarten in the country. Its size made its behavior look unusual, but it wasn’t fraudulent. What does Colin advise to help you fail?
- Treat data mining projects as science and just hand them to the scientists to come up with an answer. If the fraud project had involved someone with domain knowledge, they might have understood that the kindergarten was simply big.
- Just look for something interesting in the data rather than focusing on the business objective. Too often he’s seen modelers handed a big pile of data and sent on a hunt without knowing what they are supposed to find.
- Stop when you’ve built a good model. The error here is assuming that the model alone will solve the problem; the real challenge is turning analysis into decisions, which is why Colin emphasized working backward from the business goal to ensure it is achieved.
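Colin's point about working backward from the goal can be made concrete: a score only earns its keep once it is wired to a decision rule the business actually runs. A minimal sketch in Python; the cost figures, function names, and threshold logic are illustrative assumptions, not anything from the talk:

```python
def decide(fraud_score, investigate_cost=50.0, avg_fraud_loss=400.0):
    """Turn a model score into an action by working backward from the
    business goal: investigate only when the expected recovered loss
    exceeds the cost of the investigation.  The dollar figures here
    are made-up placeholders."""
    expected_benefit = fraud_score * avg_fraud_loss
    return "investigate" if expected_benefit > investigate_cost else "pass"

print(decide(0.05))  # expected benefit 20 < cost 50  -> 'pass'
print(decide(0.40))  # expected benefit 160 > cost 50 -> 'investigate'
```

The point is not the arithmetic but the direction of travel: the threshold is derived from the business objective, not from the model's accuracy statistics.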
Dean Abbott of Abbott Analytics said fraud projects had tripped him up as well and offered his own top four ways to fail:
- Ignore bias in the data. In his experience, an organization’s data infrastructure is built around its operational priorities, so the available data may serve the system’s day-to-day needs rather than what will help identify the fraud.
- Focus on specifying the target variable alone. Reiterating Colin’s advice, Dean said he’d seen modelers get stuck on the target variable and forget what they are trying to model. It is critical to get close to the business objective.
- Don’t worry about project management. He remembers a project where the “geeks” built a great decision tree that no one else knew how to run, a classic case of failing to manage the project from start to finish.
- Don’t worry about deployment. If you don’t plan how the model will be used, or something changes, you can be almost assured of failure. It is critical to consider the architecture in advance: how the model will be scored, what data will be available, and so on.
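Dean's deployment warning amounts to treating the model artifact and its required inputs as a contract agreed on in advance, so that production rejects a mismatched record instead of silently mis-scoring it. A minimal sketch in plain Python; the feature names, weights, and JSON format are all illustrative assumptions:

```python
import json
import math

# Build time: persist the model together with the exact feature list it
# expects -- that list is the deployment contract.
artifact = json.dumps({
    "features": ["amount", "num_logins"],
    "weights": [0.002, -0.3],
    "bias": -1.0,
})

# Production time: load the artifact and validate each record against
# the contract before scoring.
deployed = json.loads(artifact)

def score(record):
    """Logistic score; raises instead of guessing when inputs are missing."""
    missing = [f for f in deployed["features"] if f not in record]
    if missing:
        raise ValueError(f"record missing model inputs: {missing}")
    z = deployed["bias"] + sum(
        w * record[f] for w, f in zip(deployed["weights"], deployed["features"])
    )
    return 1.0 / (1.0 + math.exp(-z))

print(score({"amount": 900.0, "num_logins": 2}))
```

Had the “geeks’” decision tree shipped with an explicit artifact and input contract like this, someone other than its authors could have run it.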
So with the handy lists above you are free to head out on your own Mission to Fail. Or do the opposite and increase your chances of success.