Fraud detection presents myriad analytical challenges: gathering enough known cases to make typical modeling techniques possible, gathering inputs from disparate data sources, and efficiently combining investigators' expert knowledge with findings gleaned from the data. Analysts can fall into the trap of thinking that these analytical challenges are the only problems associated with fraud detection, but that is not the case.
One problem is a war of words. To someone with no analytics background, the words in each of the following sets sound like synonyms: forecast, prediction, estimate; rank and priority; error, misclassification, false positive. Likewise, an analyst new to fraud might not recognize the distinction between $5000 in fraud, $5000 in abuse, and $5000 in waste. After all, in a perfect world, those dollars would never have been paid out, possibly never to return. What could it matter what the label said? In a database, rows are rows are rows; by any other name, would improper payments smell more sweet?
It turns out that names are critically important, whether an analyst believes it or not.
In certain contexts, "fraud" means that laws have been broken, and therefore justice will be sought in a court of law. If the case is not going to court, it could be "waste" or "abuse," but it is ineligible for the "fraud" label. Moreover, policies could be put in place to screen abusive or wasteful activity, but until a crime has been committed, there can be no fraud to detect (an absence of actual fraud would be a great problem to have, of course). Similarly, analysts who build forecasts typically have different data requirements and different objectives than those who create predictions; to an analyst, it would sound awkward to describe the output of a predictive model as a "forecast."
Ultimately, then, the use of analytical solutions in fraud detection is not simply throwing a bunch of math at a bunch of anomalous activity and seeing what sticks (or sticks out). Actual fraud detection must be a reaction to a loss or some sort of improper access. If fraud has to be identified after the fact, then we would prefer to identify it as soon as possible. If it's a fraudulent application for a refund, can we stop payment on the refund check? Can we stop the check from being sent? Can we stop the check from being printed? If it's a fraudulent injury claim, can we limit payments for unneeded repairs or therapy? Can we eliminate them at the first notice of loss?
It takes a lot of data from a lot of sources to use analytical tools effectively to reduce the impact of fraudulent activity. This "big data," along with the hardware and software needed to use it, can mean the difference between improper payments (that need to be chased by investigations, audits, and lawsuits) and a healthy bottom line. Analysts may be accustomed to the strict, technical definitions of their jargon. Subject matter experts are certainly used to specific definitions from their lexicon as well. But we all need to be careful what we say. Calling an analytical risk tool a "predictive" model for "fraud" might sound reasonable to an analyst, but the business team (if not the legal team) may want to shut down any project that has that kind of language in it. Words have power.