Using Big Data & Analytics to Fight Fraud!


It is estimated that a typical organization loses 5% of its revenues to fraud each year. The total cost of insurance fraud (non-health insurance) in the US is estimated to be more than $40 billion per year. The advent of Big Data & Analytics has provided new and powerful tools to fight fraud. In my new book, Analytics in a Big Data World, I discuss fraud detection as one important application area. I have also recently partnered with SAS to develop a new course on Fraud Analytics using Supervised, Unsupervised and Social Network Methods.

What are the current challenges in fraud detection?

The first challenge is finding the right data. Analytical models need data, and in a fraud detection setting such data are not always readily available. Collected fraud data are often highly skewed, typically with less than 1% fraudsters, which seriously complicates the detection task. The asymmetric costs of missing fraud versus harassing non-fraudulent customers also make modelling more difficult. Furthermore, fraudsters constantly try to outsmart the analytical models, so these models should be continuously monitored and reconfigured.
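To make the cost asymmetry concrete, here is a minimal Python sketch of how a cost-sensitive cut-off can be derived, assuming the model outputs a calibrated fraud probability. The cost figures and the function names `cost_sensitive_threshold` and `should_flag` are hypothetical illustrations, not taken from the book or the course.

```python
# Cost-sensitive cut-off for flagging a case as fraud.
# Assumption: the model outputs a calibrated fraud probability p, and the
# costs used below are hypothetical placeholders.


def cost_sensitive_threshold(cost_missed_fraud: float, cost_false_alarm: float) -> float:
    """Return the probability above which flagging a case minimizes expected cost.

    Flagging costs (1 - p) * cost_false_alarm in expectation (the case may be a
    legitimate customer); not flagging costs p * cost_missed_fraud. Flag when
    p * cost_missed_fraud > (1 - p) * cost_false_alarm, i.e. when
    p > cost_false_alarm / (cost_false_alarm + cost_missed_fraud).
    """
    return cost_false_alarm / (cost_false_alarm + cost_missed_fraud)


def should_flag(p: float, cost_missed_fraud: float, cost_false_alarm: float) -> bool:
    """Flag the case if its fraud probability exceeds the cost-based cut-off."""
    return p > cost_sensitive_threshold(cost_missed_fraud, cost_false_alarm)


# Hypothetical costs: a missed fraud costs 500, a false alarm costs 20.
threshold = cost_sensitive_threshold(cost_missed_fraud=500.0, cost_false_alarm=20.0)
print(f"flag cases with p > {threshold:.4f}")  # flag cases with p > 0.0385
print(should_flag(0.05, 500.0, 20.0))  # True: 5% risk exceeds the cut-off
print(should_flag(0.01, 500.0, 20.0))  # False
```

Because a missed fraud is assumed here to be 25 times as costly as a false alarm, the cut-off lands far below the naive 0.5.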

What analytical approaches are being used to tackle fraud?

Most fraud detection models in use today are expert-based models. Once data becomes available, one can start doing analytics. A first approach is supervised learning, which analyses a labelled data set of historically observed fraud behavior. It can be used to predict both whether fraud occurs and the amount involved. Unsupervised learning starts from an unlabelled data set and performs anomaly detection. Finally, social network learning analyses fraud behavior in networks of linked entities. Throughout my research, I have found this approach to be superior to all others!
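As a simple illustration of the social network idea (one common network feature, not necessarily the method behind the results mentioned above), the sketch below scores each entity by the share of its direct neighbours that are already known fraudsters. It assumes a hypothetical list of links between entities, e.g., accounts sharing an address; the entity names and the helpers `build_graph` and `fraudulent_neighbor_share` are made up for the example.

```python
from collections import defaultdict


def build_graph(edges):
    """Build undirected adjacency sets from (a, b) link pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def fraudulent_neighbor_share(graph, known_fraud):
    """For each node, return the share of its direct neighbours labelled as fraud.

    Every node in the graph has at least one neighbour, because nodes are
    only created when an edge is added, so the division is always safe.
    """
    return {
        node: sum(n in known_fraud for n in neighbours) / len(neighbours)
        for node, neighbours in graph.items()
    }


# Hypothetical links, e.g. accounts that share a phone number or address.
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
known_fraud = {"A", "B"}

scores = fraudulent_neighbor_share(build_graph(edges), known_fraud)
for node, share in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(share, 2))
```

Entity C, linked to two known fraudsters, scores 2/3 and ranks first even though it has no fraud label itself. Such a score could then serve as one input among others to a supervised model.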

What are the key characteristics of successful analytical models for fraud detection?

A successful analytical model should first possess good statistical accuracy in terms of hit rate: it should detect as many fraudsters as possible. Besides this, analytical models should be interpretable. By understanding the fraud patterns, we can start developing new fraud prevention strategies. Finally, the models should also be operationally efficient. This is especially relevant in, e.g., a credit card fraud setting, where a fraud decision needs to be made within a few seconds.
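Why hit rate matters more than raw accuracy on skewed fraud data can be shown with a small hypothetical example. The sketch assumes that hit rate means the share of actual fraudsters the model flags (also known as recall or detection rate); the data set and function names are invented for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false negatives, false positives and true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return tp, fn, fp, tn


def accuracy(y_true, y_pred):
    """Share of all cases classified correctly."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fn + fp + tn)


def hit_rate(y_true, y_pred):
    """Share of actual fraudsters that the model flags (assumed definition)."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical data set: 1,000 cases, 8 of them fraudulent (under 1%).
y_true = [1] * 8 + [0] * 992

# A "model" that never flags anything looks highly accurate but catches nobody.
never_flag = [0] * 1000
print(accuracy(y_true, never_flag), hit_rate(y_true, never_flag))  # 0.992 0.0

# A model that catches 6 of the 8 fraudsters at the price of 20 false alarms.
useful = [1] * 6 + [0] * 2 + [1] * 20 + [0] * 972
print(accuracy(y_true, useful), hit_rate(y_true, useful))  # 0.978 0.75
```

The second model has lower accuracy yet is far more useful, which is why accuracy alone is a poor yardstick when fraudsters make up less than 1% of the data.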

For more information on this topic, please refer to my new book Analytics in a Big Data World. I also teach a new course on the topic.

For an interview with me and my PhD student Véronique van Vlasselaer working on social networks for fraud detection, watch this video:

You can read more about my work here


About Author

Bart Baesens

Data Analytics Consultant

Bart Baesens is an associate professor at KU Leuven (Belgium) and a lecturer at the University of Southampton (United Kingdom), as well as an internationally known data analytics consultant. He is a foremost researcher in the areas of web analytics, customer relationship management, and fraud detection. His findings have been published in well-known international journals including Machine Learning and Management Science. Baesens is also co-author of the book Credit Risk Management: Basic Concepts.
