“Garbage in, garbage out” is more than a catchphrase – it’s the unfortunate reality in many analytics initiatives. For most analytical applications, the biggest problem lies not in the predictive modeling, but in gathering and preparing data for analysis. When the analytics seems to be underperforming, the problem almost invariably lies in the data.
Insurers typically have multiple legacy transactional systems, often one for each line of business. So when an individual has profiles across several of these systems, the insurer needs to recognize them as the same person and resolve the different data variations into a single entity. In many cases entity resolution can be handled with simple business rules based on data items such as date of birth and phone number, but this is not always sufficient. Insurers are now using advanced analytical techniques such as probabilistic matching to determine the actual statistical likelihood that two entities are the same.
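As a rough illustration of the idea, the sketch below (Python, standard library only) combines a deterministic business rule on date of birth and phone number with a fuzzy name-similarity score as a simple stand-in for probabilistic matching. The field names, normalization, and the 0.85 threshold are illustrative assumptions, not part of any particular product or method.

```python
# Minimal sketch of entity resolution: a deterministic rule first,
# then a fuzzy name comparison as a rough proxy for probabilistic matching.
# Field names, normalization rules, and the 0.85 threshold are illustrative.
from difflib import SequenceMatcher

def normalize_phone(phone: str) -> str:
    """Keep digits only so '(555) 123-4567' and '555.123.4567' compare equal."""
    return "".join(ch for ch in phone if ch.isdigit())

def name_similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity score between two names."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def likely_same_person(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    # Deterministic business rule: exact date of birth and phone match.
    if (rec_a["dob"] == rec_b["dob"]
            and normalize_phone(rec_a["phone"]) == normalize_phone(rec_b["phone"])):
        return True
    # Fall back to a fuzzy score when the hard rule fails,
    # e.g. a nickname or a typo in one of the source systems.
    return (rec_a["dob"] == rec_b["dob"]
            and name_similarity(rec_a["name"], rec_b["name"]) >= threshold)

auto_policy = {"name": "Jonathan Smith", "dob": "1975-04-02", "phone": "(555) 123-4567"}
home_policy = {"name": "Jon Smith",      "dob": "1975-04-02", "phone": "555.123.4567"}
print(likely_same_person(auto_policy, home_policy))  # True
```

A production system would weigh many more attributes (address, email, policy history) and calibrate the match probabilities statistically, but the two-tier pattern of rules first, scores second is the same.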
When it comes to data quality, however, there is one small anomaly: data used for fraud analytics. Insurance companies should be careful not to over-cleanse their data. In some cases an error such as a transposed digit in a phone number or ID number may be intentional; that's how fraudsters generate variations of their data.
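To make that concrete, here is a small, hypothetical sketch of how an analyst might flag a single transposed digit between two phone or ID numbers for fraud review rather than silently "correcting" it during cleansing. The sample values and function names are invented for illustration only.

```python
# Sketch: flag a single adjacent-digit transposition (e.g. ...4567 vs ...4657)
# between two identifiers for fraud review instead of auto-correcting it.
def digits(value: str) -> str:
    return "".join(ch for ch in value if ch.isdigit())

def is_single_transposition(a: str, b: str) -> bool:
    """True if a and b differ only by swapping one adjacent pair of digits."""
    a, b = digits(a), digits(b)
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (len(diffs) == 2
            and diffs[1] == diffs[0] + 1
            and a[diffs[0]] == b[diffs[1]]
            and a[diffs[1]] == b[diffs[0]])

# Two claims from nominally different people: the phone numbers differ
# only by one transposed digit pair -- worth a look, not a cleanse.
print(is_single_transposition("555-123-4567", "555-123-4657"))  # True
print(is_single_transposition("555-123-4567", "555-999-0000"))  # False
```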
To learn more download the white paper “Fraud Analytics: Data Challenges and Business Opportunities”.
Insurance companies should not underestimate the amount of work required for data management. It's not uncommon for more than 50 percent of the implementation effort to be dedicated to data integration and data quality. However, they should not overthink the problem either. While poor data will deliver poor results, perfect data is an unrealistic expectation. In most cases good data is good enough!
I’m Stuart Rose, Global Insurance Marketing Director at SAS. For further discussions, connect with me on LinkedIn and Twitter.