Learning from experiences (or “data”) in the world around us is as hard-wired as breathing. But this beautiful endeavor that perfectly reflects the human condition is no longer exclusively a human experience.
To be direct: Machines learn like humans learn. Let’s consider how.
Neural networks are computing systems with interconnected nodes that work like neurons in the human brain. Through algorithms, they can recognize hidden patterns and correlations in raw data, cluster and classify it, and – over time – continuously learn and improve.
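To make that a bit more tangible, here’s a minimal, hypothetical sketch in Python: a small neural network finds a hidden pattern in made-up data and gets better the more examples it sees. Nothing here comes from a real insurer; the data and settings are invented purely for illustration.

```python
# A minimal sketch of a neural network "learning from experience."
# Hypothetical example: the data is random noise around a simple hidden rule,
# not real insurance data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))            # two made-up "experience" features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # the hidden pattern to discover

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Interconnected "neurons": two hidden layers of 16 nodes each
model = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=1000, random_state=0)
model.fit(X_train, y_train)               # learning = adjusting connection weights

print(f"Accuracy on unseen data: {model.score(X_test, y_test):.2f}")
```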
An early form of artificial intelligence, neural networks are fueled by data. And data represents experience. The faster the world changes, the quicker the data from which we (or machines) learn becomes unreliable.
Data from past experience: Can we still trust it?
Think about what was true just five years ago: no COVID, no Ukraine War, no ChatGPT (or hype around generative AI (GenAI)), no inflation, no supply chain disruptions or toilet paper wars.
Considering the current pace of change, how reliable is the historical data we use to determine rates, make underwriting decisions or settle claims? How long is that data viable before it can no longer be trusted? Do our loss experiences, our policy acceptance (or declination) decisions, or our sales and marketing tactics accurately reflect evolving risk?
In 2019, the answer might have been yes. But with every passing day, it feels like our data is some double agent working against us.
We shouldn’t allow ourselves to be handcuffed to old truths. Instead, we should explore the possibilities of infusing synthetic data, created with generative AI, into our processes.
Synth and (T)win
Why use data that’s not straight from the real world? Well, lots of reasons: sensitive or private information, cost, bias, availability, rare scenarios… the list goes on.
For insurers, there are several widely accepted and reliable techniques to generate synthetic data.
- Generative adversarial networks (GANs) were first introduced by Ian Goodfellow and his colleagues in their 2014 paper "Generative Adversarial Nets." For a technical deep dive, feel free to explore this discussion by Jason Colon. The short explanation: a generator creates data – image, text, audio, video or tabular – and tries to “fool” a discriminator, which tries to tell the generated data from the real thing. Because the two networks compete against one another (hence the name, “adversarial”), the generated results can reach upwards of 99% accuracy when compared to real data. A minimal code sketch follows this list.
- Synthetic minority oversampling technique (SMOTE) addresses class imbalance by supplementing the underrepresented (minority) class with synthetic examples, strengthening the statistical value of the entire data set. In one technical paper, SMOTE proved to be a highly reliable data science technique for predicting insurance premium nonpayment cancellations.
- Digital twin technology generates a virtual model of a physical object or system from the real world. For example, a manufacturer might build a digital twin of a large piece of equipment to understand potential loss scenarios. This could prevent catastrophic failure due to vibrations or centrifugal forces and could project when components need to be replaced or maintained. Digital twins can use a combination of historical, real-world data, synthetic data and system feedback loop data as inputs. These inputs can be processed in batch or in real time.
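Here’s that adversarial idea in code form: a minimal, hypothetical PyTorch sketch in which a generator invents one-column “claim severity” values and a discriminator tries to tell them from (simulated) real ones. It’s a toy, not production code or any insurer’s actual pipeline; practical tabular GANs such as CTGAN are considerably more involved.

```python
# A toy GAN sketch in PyTorch (illustrative only). The "real" data is simulated.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Simulated "real" claim severities, normalized for training
real_data = torch.exp(torch.randn(5000, 1) * 0.5 + 8.0)
real_data = (real_data - real_data.mean()) / real_data.std()

generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
discriminator = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(2000):
    real = real_data[torch.randint(0, len(real_data), (64,))]
    fake = generator(torch.randn(64, 8))

    # Discriminator: learn to label real samples 1 and generated samples 0
    d_loss = (loss_fn(discriminator(real), torch.ones(64, 1)) +
              loss_fn(discriminator(fake.detach()), torch.zeros(64, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator: try to "fool" the discriminator into outputting 1 for fakes
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

# Draw synthetic samples from the trained generator
synthetic = generator(torch.randn(1000, 8)).detach()
print(f"Synthetic sample mean {synthetic.mean().item():.2f}, std {synthetic.std().item():.2f}")
```

In practice, the synthetic output is judged by how closely its statistical properties match the real data it imitates, and by whether models trained on it behave like models trained on the original.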
Insurers can use any of these synthetic data generation techniques when faced with rare events, incomplete data or hard-to-obtain data. In addition to the above examples, insurance companies can use synthetic data to fight bias, avoid violation of privacy regulations and prevent exposure of sensitive information.
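Digital twins don’t compress into a few lines quite as neatly, but the feedback-loop idea from the digital twin bullet above can be sketched roughly like this. Everything here is hypothetical: the baseline, thresholds and simulated vibration readings are invented purely for illustration.

```python
# A deliberately simplified "digital twin" feedback loop: the twin tracks
# cumulative wear on a piece of equipment from streaming vibration readings
# (which could be historical, real-time or synthetic) and flags maintenance
# before a failure threshold is reached. All numbers are made up.
import random

random.seed(0)

FAILURE_THRESHOLD = 100.0   # hypothetical cumulative stress limit
MAINTENANCE_MARGIN = 0.8    # flag maintenance at 80% of that limit

class EquipmentTwin:
    def __init__(self):
        self.cumulative_stress = 0.0

    def ingest(self, vibration_mm_s: float) -> None:
        # Feedback loop: every new reading updates the twin's state
        self.cumulative_stress += max(vibration_mm_s - 2.0, 0.0)  # wear above a 2 mm/s baseline

    def needs_maintenance(self) -> bool:
        return self.cumulative_stress >= FAILURE_THRESHOLD * MAINTENANCE_MARGIN

twin = EquipmentTwin()

for hour in range(1, 1001):
    reading = random.gauss(2.5, 0.8)   # simulated (synthetic) vibration reading, mm/s
    twin.ingest(reading)
    if twin.needs_maintenance():
        print(f"Hour {hour}: schedule maintenance "
              f"(stress {twin.cumulative_stress:.1f} of {FAILURE_THRESHOLD})")
        break
```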
A haze of clarity
Insurers’ investment in synthetic data generation will help counter data decay and add value. Pioneering organizations like Hazy have already proven what synthetic data can deliver.
Gartner says that by 2026, 75% of businesses will use generative AI to create synthetic customer data, up from less than 5% in 2023. IDC specifically notes that by 2027, “40% of AI algorithms utilized by insurers throughout the policyholder value chain will utilize synthetic data to guarantee fairness within the system and comply with regulations.” The report further predicts this integration will expand to underwriting, marketing and claims.
Data and AI research from SAS echoes these predictions: “50% of insurers expect up to two times, and 41% over three to four times, return on AI investments.” The same research notes that GenAI will improve claims processes and operational efficiency.
These results come with trustworthy-by-design assurances, which matter when navigating data privacy and protection laws like the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA) and the EU AI Act.
It’s so easy…
How easy is it? Point-and-click. No coding.
It’s true. The returning champion team for the 2024 SAS Hackathon, the StatSASticians, demonstrated the ease of use and functionality built into today’s data and AI tools.
Their hack story focuses on worker safety and the SMOTE technique. Data gathered from “smart helmets” was fed into a dashboard, with the intention of monitoring for early warning signs of heat stroke. However, the collected data was imbalanced – readings showing those early warning signs were far too rare for reliable modeling – so the team used SMOTE to address the imbalance.
The result? A worker safety model applicable to workers' compensation insurance that can inform “predict and prevent” outcomes.
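The StatSASticians did this with point-and-click SAS tools, but for readers who like to see code, here is roughly what that balancing step looks like using the open-source imbalanced-learn library in Python. The “smart helmet” feature names and the data are invented for illustration and are not taken from the team’s solution.

```python
# Hypothetical SMOTE sketch with imbalanced-learn: oversample the rare class
# (readings showing early heat-stress warning signs) before training a model.
import numpy as np
import pandas as pd
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "core_temp_c": rng.normal(37.2, 0.5, n),     # invented helmet-sensor features
    "heart_rate_bpm": rng.normal(95, 15, n),
    "ambient_temp_c": rng.normal(30, 4, n),
})
# Rare positive class: only a small share of readings show early warning signs
y = ((X["core_temp_c"] > 37.8) & (X["heart_rate_bpm"] > 110)).astype(int)

print("Before SMOTE:", pd.Series(y).value_counts().to_dict())

# SMOTE interpolates new minority-class examples from existing ones
X_balanced, y_balanced = SMOTE(random_state=0).fit_resample(X, y)

print("After SMOTE: ", pd.Series(y_balanced).value_counts().to_dict())
# X_balanced / y_balanced can now feed a classifier that isn't starved of rare cases
```

One practical note: SMOTE is typically applied only to the training split, so the held-out evaluation data stays untouched.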
Impressively, the team built the solution in a few weeks – with minimal data. This is the equivalent of Tony Stark building the original Iron Man suit in a cave. Imagine what a large enterprise could do with such powerful technology. (Did you know part of Iron Man 3 was filmed at SAS headquarters – crazy, right?).
So, which is better – real-world or synthetic data?
The answer to that question sounds like the setup for a bad joke, but it comes from personal experience.
Imagine this: You sit down to breakfast with the head of AI and the chief actuary at a large insurer. You start discussing synthetic data. The head of AI says, “We don’t like synthetic data. We like real data.” The chief actuary says, “If we don’t have real data, synthetic data works well.” The head of AI says, “It’s not as good as real data, that’s why we don’t like it.” The chief actuary responds, “Well, having something is better than having nothing.”
And around and around they went until the check arrived.
Both sides are correct. If you have sufficient amounts and types of real-world data that you can access, use and trust, that’s great. But this will not always be the case.
The bottom line: Challenge the status quo
To paraphrase some brilliant insight from Tommy Lee Jones (Men in Black, 1997), knowledge and certainty can be stupid and dangerous. Whether the claim was “The earth is flat,” “The 4-minute mile can’t be broken” or “We only like real data,” someone eventually pushed back on those notions.
Insurers like MAPFRE already refer to synthetic data as a “strategic advantage.” ERGO champions the call to action to “unlock your treasure trove of data” to settle claims, fight fraud and develop new products.
The two camps can coexist – we can use both real-world and synthetic data. Just remember that as data decays, prioritize the most recent and most reliable experience and combine it with the power of generative AI.