Big data, IoT and data warehouse?


It's the age of big data and the internet of things (IoT), but how will that change things for insurance companies? Do insurers still need to consider classic data warehouse concepts based on a relational data model? Or will all relevant data be stored in big data structures and thus render classic data warehouses superfluous? Many insurance companies are asking these questions. To find an answer, we have to consider some relevant IoT and big data analytics approaches for the insurance industry.

In the last few years, the IoT has changed business models in essential ways, especially in manufacturing. These changes are now reaching the insurance industry, with new business models and rate-making methods driven by the IoT. Right now, the main drivers are telematics and real-time scoring approaches. For example:

  • Telematics in auto insurance: Use of telemetry data for optimized tariffs and incentives to reduce claims, and for calculating premiums that match the individual risk of damage - for example, 'pay-how-you-drive' tariffs.
  • Telematics in health insurance: Use of health and lifestyle data (e.g. from fitness trackers) to calculate new healthcare plans.
  • Real-time scoring for credit and plausibility checks at application time: Analytical evaluation of existing customer data, external statistics, internet data and information from geo-systems in real time.
  • Real-time scoring for fraud in claims reports: Analytical evaluation of party and contract data, historical data, internet data and geo-systems.

It's not always useful to store the resulting raw data along with the structured party and contract data in a relational database. Hadoop is more appropriate, and also more cost-effective, for that. On the other hand, these big data systems are not (yet) able to process complex relational data structures with good performance. Their strength lies in extremely fast in-memory processing of relatively flat data structures that occur in high volume (for example, weakly structured streaming data). Therefore, it's likely that big data platforms such as Hadoop and relational data stores such as Oracle will be operated in parallel by insurance IT departments for the foreseeable future.

However, it will be necessary to implement a reasonable connection between the two data systems. On the one hand, this can be a combination with data from the data warehouse at the level of analytical data marts (e.g. by linking the streaming data to a customer or contract number from the data warehouse). But this assumes that the insurance company's IT handles the analytical processing of the IoT data itself. Often, an alternative scenario applies, in which an external service provider delivers the big data analytics results to the insurance company. In either case, the task is to link the analytical results (e.g. scoring values) to the information stored in the data warehouse.
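The linkage described above can be sketched as a simple join on the contract number. This is purely illustrative - all field names and values are invented, and a real implementation would run inside the data warehouse or an ETL tool rather than in application code:

```python
# Hypothetical sketch: linking externally computed scoring values to
# data-warehouse contract records via the contract number.

# Analytical results, e.g. delivered by an external big data service
scores = [
    {"contract_no": "C-1001", "driving_score": 0.82},
    {"contract_no": "C-1002", "driving_score": 0.47},
]

# Structured contract data as it might sit in the relational data warehouse
contracts = {
    "C-1001": {"customer": "A. Meier", "tariff": "pay-how-you-drive"},
    "C-1002": {"customer": "B. Schulz", "tariff": "standard"},
}

def merge_scores(scores, contracts):
    """Join scoring values onto contract records via the contract number."""
    enriched = []
    for score in scores:
        contract = contracts.get(score["contract_no"])
        if contract is not None:  # skip scores with no matching contract
            enriched.append({**contract, **score})
    return enriched

enriched = merge_scores(scores, contracts)
```

The contract number acts as the shared key between the two worlds: the big data side only ever needs to carry this one identifier alongside its analytical output.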

Example: Telemetry data in automobile insurance

Some insurance companies have already developed new automotive tariffs - in a nutshell, rewarding defensive driving behavior with favorable premiums. This is done using the IoT: analyzing data from a telematics box that is installed in the car and forwards anonymized information about driving behavior.
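To make the idea concrete, here is a deliberately simplistic sketch of how a driving-behavior score might be derived from telematics events. The event fields and penalty rule are invented for illustration; real actuarial scoring models are far more sophisticated:

```python
# Toy example only: score a trip in [0, 1], penalizing harsh braking and
# speeding events. Field names ("speed_kmh", "harsh_brake") are hypothetical.

def driving_score(events):
    """Fraction of telemetry events without a penalty; 1.0 = best."""
    if not events:
        return 1.0  # no data, no penalty
    penalties = sum(
        1 for e in events if e["harsh_brake"] or e["speed_kmh"] > 130
    )
    return max(0.0, 1.0 - penalties / len(events))

trip = [
    {"speed_kmh": 95,  "harsh_brake": False},
    {"speed_kmh": 140, "harsh_brake": False},  # speeding
    {"speed_kmh": 60,  "harsh_brake": True},   # harsh braking
    {"speed_kmh": 80,  "harsh_brake": False},
]
score = driving_score(trip)  # 2 of 4 events penalized -> 0.5
```

A score like this, aggregated over many trips, is the kind of value that would then be fed back - anonymized and keyed by contract - into the tariff calculation.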

External companies (the auto industry, telecom companies) are starting to derive their own analytical scoring values for driving behavior and may offer them as an external service. Regardless of whether the scoring values are determined externally or internally by the insurance company, they must be related to the contract information stored in the data warehouse.

In a standard business analytics data warehouse for insurance, such as the Detail Datastore (DDS) from SAS, analytical models and analytical scoring values are already part of the data model by default. This can easily be used to model a scoring-based tariff system for automobile insurance (and for any other line of business). The analytical results can be assigned to any level of the insurance business process - for example, to the customer, contract or exposure level.
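The assignment of a score to a business-process level can be pictured as a generic score record carrying the level and the key of the business object it refers to. This is a hypothetical sketch of the concept, not the actual SAS DDS data model, and all names are invented:

```python
# Illustrative record structure: one analytical score, keyed to a level of
# the insurance business process (customer, contract or exposure).

from dataclasses import dataclass

@dataclass
class AnalyticalScore:
    model_id: str   # which analytical model produced the value
    level: str      # "customer", "contract" or "exposure"
    object_id: str  # key of the business object at that level
    value: float

scores = [
    AnalyticalScore("drive_score_v1", "contract", "C-1001", 0.82),
    AnalyticalScore("fraud_score_v2", "customer", "K-77",   0.05),
]

# Scores for one contract can then be looked up by level and key
contract_scores = [
    s for s in scores if s.level == "contract" and s.object_id == "C-1001"
]
```

Because the level is part of the record rather than hard-wired into the table, the same structure serves any line of business and any granularity of the process.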

This shows that the results of big data analytics models can be integrated very well into the concepts of a standard data warehouse for business analytics, enriching the structured customer and policy data already held in the insurance data warehouse.

Future big data concepts may open up the possibility of merging both models in one data lake. Today's big data systems, however, still have difficulties - of different kinds - dealing with complex relational structures. Therefore, the scenario described above, and with it the concept of a standard data model for insurance, appears set to remain valid for a long time to come.

Read more on Internet of Things and Big Data Analytics here or visit the Big Data Analytics Forum in November 2016 in Frankfurt/Main.

Hartmut Schroth, Business Advisor data strategies for insurance at SAS Germany. For further discussions, connect with me on LinkedIn.


About Author

Hartmut Schroth

Business Advisor

After finishing his master's degree in mathematics in 1981, Hartmut has worked in the insurance and IT business in several positions. Since 2006 he has been an employee of SAS Institute in Germany. As a business advisor in the presales team, he is responsible for the DACH region (Germany, Austria and Switzerland). His main focus is advising financial services customers on data model strategies for SAS Business Analytics solutions. He is also the regional product manager for the SAS Insurance Analytics Architecture.


  1. Philip Hummel

    Please take a look at the PolyBase integration in Microsoft SQL Server 2016 for an example of how big data structures like Hadoop and traditional relational data stores can be used effectively together. Thanks for raising awareness of this issue.


  2. Hartmut Schroth

    Thanks for this hint. Technically there seem to be no issues with integrating big data structures such as Hadoop with relational databases like MS SQL Server, Oracle or DB2. But what many customers want is to replace relational databases with the (cheaper) Hadoop infrastructure. And this scenario is not fully workable today due to performance gaps (especially in connection with historized records).

