Data Quality and Analytics in Artificial Intelligence for Insurance

For an industry dependent on accurate data to feed information into actuarial engines, the insurance sector has historically suffered from suboptimal data quality.

March 24, 2020

Artificial intelligence is poised to transform many sectors, not least the insurance industry. Insurers have long been leaders in using massive stores of data to price and underwrite policies, process claims and run other important operations. The computational strength and insights that artificial intelligence techniques provide now bring the utility of that data to a new level of precision and enable analyses that were not previously possible.

Nevertheless, the insights gained from insurance analytics are only as good as the underlying data. Here we explore how data quality affects insurance analytics, and what carriers can do to avoid data pitfalls so that the decisions their artificial intelligence systems make are based on the best possible inputs.

Sourcing and Storing Data
For an industry dependent on accurate data to feed information into actuarial engines, the insurance sector has historically suffered from suboptimal data quality. Several factors have combined to undermine insurers’ data accuracy:

  • Multiple formats of unstructured data are difficult to categorize and organize
  • Legacy data storage systems are not wholly compatible with newer software programs, leading to data dissonance and gaps in the resulting analyses
  • Thousands of siloed, discrete spreadsheets become unwieldy to manage at scale
  • Growing volumes of data from multiple sources, such as financial records, behavioral data and data gathered from vehicular IoT devices, create a new and complex patchwork of data types that is difficult to categorize and process

The multiple types of inputs required to create actuarial tables challenge the majority of current data systems. The foundational problems with these datasets in turn lead to skewed results from artificial intelligence systems as they attempt to uncover hidden patterns and extract insights from the vast stores of data that carriers possess.
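As a rough illustration of how format mismatches between legacy and newer systems turn into gaps, the sketch below maps records from two sources onto one common shape and lists the fields that could not be recovered. The field names (POLNO, START_DT, PREM, policy_id, start_date, annual_premium) and the two date formats are hypothetical and stand in for whatever a carrier's systems actually export.

```python
from datetime import datetime

# Assumed formats: ISO 8601 from a newer system, DD/MM/YYYY from a legacy one.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(raw):
    """Return a date if raw matches a known format, otherwise None."""
    if not raw:
        return None
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def normalize(record, source):
    """Map a source-specific record onto a common shape and list its gaps."""
    # Hypothetical field names; real exports will differ.
    if source == "legacy":
        policy_id = record.get("POLNO")
        start = parse_date(record.get("START_DT"))
        premium_raw = record.get("PREM")
    else:
        policy_id = record.get("policy_id")
        start = parse_date(record.get("start_date"))
        premium_raw = record.get("annual_premium")

    try:
        premium = float(premium_raw)
    except (TypeError, ValueError):
        # Values such as "1,200" are flagged as gaps rather than guessed at.
        premium = None

    normalized = {
        "policy_id": policy_id,
        "start_date": start,
        "premium": premium,
        "source": source,
    }
    gaps = [k for k in ("policy_id", "start_date", "premium") if normalized[k] is None]
    return normalized, gaps


legacy_row, legacy_gaps = normalize(
    {"POLNO": "A-100", "START_DT": "24/03/2020", "PREM": "1,200"}, "legacy"
)
modern_row, modern_gaps = normalize(
    {"policy_id": "B-7", "start_date": "2020-03-24", "annual_premium": 950}, "modern"
)
print(legacy_gaps, modern_gaps)
```

Recording the gaps explicitly, instead of silently dropping or coercing values, keeps the missing information visible to whatever analysis runs downstream.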

Regulators On Watch
With data leaks, IoT hacks and other losses of information becoming more prevalent, regulators have grown increasingly vigilant about the quality, maintenance and storage of data in the insurance industry. This is especially true now that artificial intelligence applications produce insights that are so much faster, more granular and more revelatory.

The difficulty here is that most regulators require that data be “accurate, complete and appropriate” without necessarily giving a more thorough definition, leaving carriers to decide for themselves how best to interpret the requirements. This ambiguity leads to inconsistent execution across the industry, feeding a vicious cycle that fuels further uncertainty among regulators, the public and carriers alike.

Show Me the Data
Even attempting to visualize and otherwise represent data can compound the challenges of accurate analysis. Organizations must choose which types of data to include and analyze in dashboards, for example. This selection can be problematic if poor-quality data sets are included in broader metrics, skewing the results of often mission-critical KPIs.
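A small worked example shows how a single bad record can distort a headline KPI. The claim figures, the upper limit of 50,000 and the field names below are invented purely for illustration; the point is only that the same average looks very different once suspect records are set aside and counted.

```python
from statistics import mean

# Invented figures for illustration only.
claims = [
    {"claim_id": 1, "paid": 1200.0},
    {"claim_id": 2, "paid": 950.0},
    {"claim_id": 3, "paid": 1100.0},
    {"claim_id": 4, "paid": 110000.0},  # plausibly keyed in cents rather than dollars
    {"claim_id": 5, "paid": None},      # missing amount
]


def is_suspect(claim, upper_limit=50000.0):
    """Flag missing, negative or implausibly large amounts (assumed rule)."""
    paid = claim["paid"]
    return paid is None or paid < 0 or paid > upper_limit


def average_paid(rows):
    """Average paid amount over the rows that have an amount at all."""
    values = [r["paid"] for r in rows if r["paid"] is not None]
    return mean(values) if values else None


suspect = [c for c in claims if is_suspect(c)]
clean = [c for c in claims if not is_suspect(c)]

kpi_all = average_paid(claims)   # includes the implausible record
kpi_clean = average_paid(clean)  # excludes flagged records

print(f"average paid, all records:   {kpi_all:.2f}")
print(f"average paid, clean records: {kpi_clean:.2f}")
print(f"records set aside for review: {len(suspect)}")
```

Reporting the number of records set aside next to the KPI keeps the data problem visible on the dashboard instead of hiding it.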

Because dashboards are selective by nature, it can also be easy to miss incidents of poor data management simply because the dashboard does not include that data. Fortunately, there are resources such as DRC Analytics that can help reduce the risks associated with customizing dashboards and panels.

Data Quality and Competitive Advantage
Improving data quality starts at the top. Executives must make high-quality, reliable data a strategic priority. The trickle-down effect of leadership embracing high data quality must extend throughout the organization into all departments from HR to claims to customer service. Data quality is only as strong as its weakest link. The whole organization needs to be vigilant about data quality.

In addition to making data quality a top priority, executives must also be clear about what high-quality data means. Setting benchmarks and creating appropriate incentives for key stakeholders and departments to reach defined data quality goals clears up confusion and dispels misunderstandings about expectations. But it’s not set and forget. Data scientists must schedule periodic reviews to revisit the definition of high quality and adjust it as necessary. Additionally, taking advantage of rigorous data analytics platforms helps identify data issues in their nascent stages before they spiral into significant problems.
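One way to make such benchmarks concrete is to compute a few simple metrics over a data set and compare them with agreed thresholds. The sketch below is a minimal example under stated assumptions: the three metrics (completeness, validity, uniqueness), the threshold values, the required fields and the validity rules are all hypothetical and would need to be defined by each organization.

```python
# Hypothetical benchmarks; each organization would set and revisit its own.
BENCHMARKS = {"completeness": 0.98, "validity": 0.95, "uniqueness": 1.0}
REQUIRED_FIELDS = ("policy_id", "insured_age", "premium")


def quality_metrics(rows):
    """Compute the share of complete, valid and uniquely identified rows."""
    total = len(rows)
    if total == 0:
        return {name: 0.0 for name in BENCHMARKS}

    # Complete: every required field is present and not None.
    complete = sum(
        all(r.get(f) is not None for f in REQUIRED_FIELDS) for r in rows
    )
    # Valid: age is an integer from 0 to 120 and premium is a positive number
    # (assumed rules, for illustration).
    valid = sum(
        isinstance(r.get("insured_age"), int)
        and 0 <= r["insured_age"] <= 120
        and isinstance(r.get("premium"), (int, float))
        and r["premium"] > 0
        for r in rows
    )
    # Unique: distinct policy IDs as a share of all non-missing policy IDs.
    ids = [r.get("policy_id") for r in rows if r.get("policy_id") is not None]
    unique = len(set(ids)) / len(ids) if ids else 0.0

    return {
        "completeness": complete / total,
        "validity": valid / total,
        "uniqueness": unique,
    }


def failed_benchmarks(metrics, benchmarks=BENCHMARKS):
    """Return {metric: (observed, target)} for every metric below its target."""
    return {
        name: (metrics[name], target)
        for name, target in benchmarks.items()
        if metrics[name] < target
    }


sample = [
    {"policy_id": "P1", "insured_age": 40, "premium": 900.0},
    {"policy_id": "P2", "insured_age": None, "premium": 700.0},
    {"policy_id": "P2", "insured_age": 35, "premium": -5.0},
    {"policy_id": "P4", "insured_age": 51, "premium": 1200.0},
]
metrics = quality_metrics(sample)
failures = failed_benchmarks(metrics)
print(metrics)
print(failures)
```

Keeping the thresholds in one place makes the periodic review straightforward: the definitions change in a single dictionary and the same report is rerun.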

Carriers should also openly recognize that ensuring high data quality is an ongoing challenge, and build it into their DNA. Faced with scope creep or an overwhelming workload, data management teams can quickly resort to shortcuts to manage unwieldy data, and those shortcuts can, unfortunately, turn into patterns that undermine progress. But with appropriate resources and talent, continual investment in data quality pays ongoing dividends and helps build a lasting competitive advantage over organizations for whom “ok” data is good enough.