A practical guide to tackle auto insurance fraud - Part 1: Data Management & Data Quality


Welcome to the first practical step in tackling auto insurance fraud with analytics. It is obvious why our first stop relates to data: the idiom “the devil is in the details” can easily be adapted for the insurance fraud sector as “the devil is in the data”. This article covers the relevant data areas, data quality concerns and data cleansing techniques, data model needs, and the data integration approach.

Data Areas of Interest

When searching for suspicious fraud activity and indications in the auto insurance claims process, we do not utilize claims data alone, but much more. We can briefly identify 12 data areas, as below:

Figure 1: Data areas of interest for auto insurance fraud

It is important to structure our data in specific areas and to keep data history. Several alerts can be triggered based on historical data, for example by checking the claims history of a vehicle, driver, claimant or other participant. At the same time, entity identification data elements can play a significant role in uncovering hidden relationships between connected parties, or even organized fraud rings. Such data elements range from bank accounts, tax IDs and names to telephone numbers and addresses. We should also not forget the importance of the claim data flags that claims professionals utilize in the traditional approach with great results, e.g. damage without collision, total damage, theft, damage from physical phenomena, police attended the scene, scene not related to the claim description, and many others.

Usual Data Quality Issues

The insurance business has a diversity of channels for data input: web applications for agents and brokers, core insurance systems, direct insurance websites, claim announcement applications and many others. Operational business agility and speed come first, and data quality often ends up a second priority. It is common for insurers to face data quality issues, in terms of completeness and standardization, across all of the above data areas, e.g.:

  • Data quality issues in individuals' data (customer, driver, participant, claimant etc.): telephone numbers, addresses, IDs, names etc.
  • Vehicle information like VIN, IDs
  • Claim data like type, suspicious fraud indicator flags, dates, estimated value
  • Suppliers / Body shops missing structured lists with needed data (telephone numbers, addresses etc.)
  • Payment data like bank accounts, bank account holder etc.
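To make the completeness and standardization checks concrete, here is a minimal data profiling sketch. The records, field names and the pattern-reduction helper are illustrative assumptions, not part of any specific product; they show how profiling surfaces missing values and inconsistent telephone formats before any cleansing starts.

```python
import re
from collections import Counter

# Hypothetical sample of claimant records exhibiting typical quality issues
records = [
    {"name": "J. Smith",   "phone": "+30 210-5551234", "postal_code": "11528"},
    {"name": "John Smith", "phone": "2105551234",      "postal_code": None},
    {"name": "SMITH JOHN", "phone": "210 555 1234",    "postal_code": "115 28"},
]

def phone_pattern(value):
    """Reduce a phone value to its shape: digits -> 9, letters -> A."""
    if value is None:
        return "<missing>"
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

# Completeness check: count missing values per field
missing = Counter(field for r in records for field, v in r.items() if not v)
print("missing values:", dict(missing))

# Pattern frequency: how many distinct phone formats are in use?
patterns = Counter(phone_pattern(r["phone"]) for r in records)
print("phone patterns:", dict(patterns))
```

Three records here already yield three different phone patterns, which is exactly the kind of finding that drives the standardization rules discussed next.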

Data cleansing and matching techniques

In order to resolve data quality issues, you have to use dedicated data quality software and techniques such as the following:

  • Data profiling for analyzing the current data quality status and planning data quality actions accordingly (e.g. identifying missing data, outliers, patterns and their frequency, referential integrity and standardization issues).
  • Data cleansing for correcting mistyped data (e.g. numbers including characters or vice versa, removing special characters from telephone numbers or name fields etc).
  • Data standardization (e.g. a unique pattern for telephone numbers, a unique break down of address elements in data fields, name and surname in the proper separate fields etc).
  • Data enrichment for fulfilling missing data (e.g. identify and update postal codes).
  • Entity resolution for matching various entities in order to uncover hidden relationships between different claims and individuals (customers, policy holders, claimants, drivers, participants, telephone numbers, addresses, suppliers, brokers etc.). Fuzzy matching techniques, which identify and score the level of similarity between text or numeric elements, and phonetic algorithms, which identify the similarity of text elements regardless of spelling errors, are very frequently utilized and provide significant matching results.
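The fuzzy matching and phonetic techniques mentioned above can be sketched in a few lines. This is an illustrative, simplified example (a string-similarity ratio from the standard library and a basic Soundex encoding), not the algorithm of any particular data quality product:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Score the similarity of two strings between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def soundex(name):
    """Simplified classic Soundex: similar-sounding names get the same code."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = "".join(c for c in name.lower() if c.isalpha())
    if not name:
        return ""
    encoded = name[0].upper()
    prev = codes.get(name[0], "")
    for c in name[1:]:
        code = codes.get(c, "")
        if code and code != prev:
            encoded += code  # keep new consonant codes, drop repeats
        prev = code
    return (encoded + "000")[:4]  # pad/truncate to the 4-character code

print(similarity("Jon Smith", "John Smith"))   # high score despite the typo
print(soundex("Robert"), soundex("Rupert"))    # both encode to R163
```

In practice, candidate pairs scoring above a tuned threshold, or sharing a phonetic code, are flagged for linkage, which is how hidden connections between claims and individuals surface.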

The data quality implementation process has to follow structured methodological steps: start by defining data sources and semantics, proceed to data profiling, design data quality business rules, embed the process into the operational data management and integration flow, then monitor results and improve the system and processes when needed.


Figure 2: SAS Data Quality Implementation Methodology ®

Data Integration and Specific Claims Fraud Data Model

Once the data is cleansed and standardized, insurers have to structure it in a data model specialized for claims fraud and automate their ETL (Extract, Transform, Load) processes. Both actions are important:

  • The claims fraud insurance data model will secure data consistency, taking into account relational database and referential integrity principles, with data historization capabilities and valid from-to record identifiers.
  • The data integration mechanisms have to secure not only the automation of the ETL processes, but also ongoing maintenance and future modifications. For this reason, it is important to use a modern data integration solution that graphically documents every data extraction, transformation and loading process, and supports impact analysis and reverse impact analysis. With this in place, you can easily assess, design and perform any modification required by changes to source data structures or other factors.
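The historization capability with valid from-to record identifiers mentioned above is commonly implemented as Type 2 history: when an attribute changes, the current row is closed and a new versioned row is opened. The sketch below is a minimal illustration of that idea; the record structure, field names and `OPEN_END` sentinel are assumptions for the example, not a prescribed schema.

```python
from datetime import date

# Sentinel "end of time" date marking the currently open version
OPEN_END = date(9999, 12, 31)

def update_record(history, key, new_attrs, as_of):
    """Close the open version of `key` if its attributes changed, open a new one."""
    current = next((row for row in history
                    if row["key"] == key and row["valid_to"] == OPEN_END), None)
    if current and current["attrs"] == new_attrs:
        return  # nothing changed, keep the current version open
    if current:
        current["valid_to"] = as_of  # close the superseded version
    history.append({"key": key, "attrs": new_attrs,
                    "valid_from": as_of, "valid_to": OPEN_END})

# A vehicle changes owner: the old row is closed, history is preserved
history = []
update_record(history, "VIN123", {"owner": "A. Jones"}, date(2020, 1, 1))
update_record(history, "VIN123", {"owner": "B. Brown"}, date(2021, 6, 1))
```

Keeping every closed version queryable is what lets alert rules look back over a vehicle's or claimant's full history rather than only its latest state.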

Key message for auto insurance fraud

It is of major importance that insurers deal with data management and tackle data quality issues as the first step of a fraud detection analytics journey. This step often looks time-consuming, and it can seem that no progress is being made on the real project, which is fraud detection and alert generation rather than data management and storage, but that impression is misleading. The quickest way for an insurer to gain tangible results and value from an analytics journey toward fraud prevention is to tackle data quality first. In this way, you will reduce false positive alerts and increase claims fraud suspicion and detection rates. In addition, claims officers and fraud investigators will trust the analytics results and effectively utilize the fraud analytics system's findings.

This is the first post in a seven-post series, “A practical guide to tackle auto insurance fraud”. The series explores seven analytics best-practice techniques that insurers need to follow to tackle auto insurance claims fraud. The next post goes deeper into business rules and watch list techniques for tackling known fraud types, and the fraud detection value an insurer can gain from them. Also, here you can request the on-demand version of the recently completed insurance fraud webinar series and learn more about preventing insurance fraud in the digital age and the steps for successfully implementing an anti-fraud solution in your organization.


About Author

Stavros Stavrinoudakis

Professional Services Senior Manager

Stavros Stavrinoudakis is the Professional Services and Presales Senior Manager for SAS Greece-Cyprus-Bulgaria. Stavros has been a senior Information Technology professional since the mid-90s, with extensive experience in the Business Intelligence and Analytics areas. Through previous roles as CIO, MIS Manager, member of strategic IT councils and Business Intelligence Consultant, working either on the software vendor side or inside large-scale organizations, he has developed strong expertise in the design and implementation of innovative business applications. His managerial skills focus on team leadership, inspiring teams to achieve any target. His background and his work at SAS drive him to expand his expertise across industries and business pains, such as fraud prevention, and he serves as a SAS Social Media Spearhead for EMEA.


