A blog post by Nilgun Celik, Tom Gubash & Anil Celik

Using machine learning to underwrite property insurance has been our key focus for the last 18 months. It started as a simple Minimum Viable Product (MVP); we had our highs and lows, we made many mistakes, and every time we see a new data set we are surprised by how much we are still learning.

For engineers, machine learning is a simple concept despite the complicated math behind it. For non-engineers, it is an abstract concept: you input your data, and it generates magical results. Ergo, we get the question `… but how?` very often.

Machine learning is a powerful tool that helps a variety of industries in many ways. It always requires lengthy data preparation, and certain problems demand deeper research or an understanding of an entire industry and its regulations. Insurance is unquestionably one of them. This blog post focuses on some of the obstacles we have experienced.

Predicting the policyholders who will file a claim sounds like a true supervised learning problem at first. However, once you start thinking about how the industry works, you start seeing that there are no real `True Positives` or `True Negatives`. Supervised learning problems require a historical data set with known outcomes. Say you are trying to identify fraud: the algorithms require a historical dataset in which the actual fraudulent claims are labeled. When it comes to claim prediction, there is the problem of a `lack of claims`. Since insurance is a long-term game, a customer who didn't claim anything for five years could file a large claim in year 6. If you score this customer `High` in year 4, is your algorithm successful or not? Measured in year 4, your algorithm fails; measured over the long term, the scoreboard tells an entirely different story.
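The year-4-versus-year-6 example above can be made concrete with a toy calculation. The data and the `precision` helper below are hypothetical illustrations, not our actual evaluation code: the same prediction is scored as a failure on a short horizon and a partial success on a longer one.

```python
def precision(predictions, outcomes):
    """Fraction of policyholders flagged high-risk who actually claimed."""
    flagged = [p for p in predictions if predictions[p]]
    if not flagged:
        return 0.0
    return sum(outcomes[p] for p in flagged) / len(flagged)

# The model flags policyholders A and B as high risk.
predictions = {"A": True, "B": True, "C": False}

# Claims observed by the end of year 4: none yet.
claims_by_year_4 = {"A": False, "B": False, "C": False}

# Claims observed by the end of year 6: A files a large claim.
claims_by_year_6 = {"A": True, "B": False, "C": False}

print(precision(predictions, claims_by_year_4))  # 0.0 — the model looks wrong
print(precision(predictions, claims_by_year_6))  # 0.5 — a different story
```

Nothing about the model changed between the two measurements; only the observation window did, which is exactly why the evaluation horizon matters.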

There are also a few technical obstacles insurance carriers need to overcome. One of the most significant is the imbalance between the customers who claim and those who don't. Most insurance companies have a claim frequency of around 1-6%: out of every 100 policyholders, only 1 to 6 will file a claim. Our algorithms try to identify those policyholders so that insurance carriers can offer fairer terms and pricing across their entire portfolio. It also means that only 1-6% of the data tells us the story we want to learn. This is one of the very first things insurance carriers need to solve, and the good news is that there are a few solutions. If you are interested in reading more about this issue, we discussed this topic in great detail here.
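One of the simplest remedies for this kind of imbalance is to undersample the majority "no claim" class before training. The sketch below is a minimal, hypothetical illustration of that idea (our production pipeline is more involved): given a portfolio with a 3% claim frequency, it keeps every claim record and draws an equal-sized random sample of non-claim records.

```python
import random

def undersample(records, label_key="claimed", seed=42):
    """Return a balanced training sample: all minority (claim) records
    plus an equal-sized random draw from the majority (no-claim) class."""
    minority = [r for r in records if r[label_key]]
    majority = [r for r in records if not r[label_key]]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return minority + rng.sample(majority, k=len(minority))

# Hypothetical portfolio: 3 claims among 100 policies (3% claim frequency).
portfolio = [{"policy_id": i, "claimed": i < 3} for i in range(100)]

balanced = undersample(portfolio)
print(len(balanced))  # 6 records: 3 claims, 3 non-claims
```

Undersampling throws information away, so in practice it competes with oversampling and cost-sensitive weighting; which remedy works best depends on the data set.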

Insurance is a highly regulated industry. How insurance carriers can use these technologies may be constrained by regulations depending on the state or country you operate in. The industry must also be careful not to pass human biases on to the algorithms. This is why we never use any of the following information during our modeling process:

  • We don’t use personal information: name, gender, age, ethnicity, or anything that directly correlates with these features
  • We don’t use any financial data: credit scores, insurance scores, income, or anything that directly correlates with these features
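A policy like this can be enforced mechanically before any record reaches a model. The sketch below is a simplified, hypothetical illustration (the field names are invented, not our actual schema): excluded fields are stripped from each record, while permitted fields such as the address pass through.

```python
# Blocklist of fields that must never reach the modeling pipeline.
EXCLUDED_FIELDS = {
    "name", "gender", "age", "ethnicity",         # personal information
    "credit_score", "insurance_score", "income",  # financial data
}

def sanitize(record):
    """Drop blocklisted fields; keep everything else
    (e.g. address, which is used only for location-based risk)."""
    return {k: v for k, v in record.items() if k not in EXCLUDED_FIELDS}

raw = {
    "policy_id": 1017,
    "address": "123 Example Ave",  # hypothetical address
    "age": 42,
    "credit_score": 710,
    "roof_age_years": 12,
}

print(sanitize(raw))  # only policy_id, address, and roof_age_years remain
```

A blocklist like this catches the named fields, but not features that merely correlate with them; guarding against proxy variables takes separate analysis.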

We always tell our clients that we don’t want any personal information that would be present in the policy or claim files. The only personal information we use is an address, and we use it only to understand location-based risks, not to build socio-economic segments. This makes things interesting, because we know nothing about the customer for whom we are producing a risk score. Often the actual underwriters have more information about the very same customer, since the algorithms lack the underwriter’s personal experience and knowledge. This is also why the algorithms don’t accumulate bias along the way. We are not saying they are entirely free of bias, but that is a topic for another article.

Regardless of the barriers, this is a fascinating problem to work on. We genuinely believe that the research we are doing and the experience we are gaining help us create the right approach, and we know it is already helping some of our clients improve their profitability. We know that machine learning will change how the industry underwrites. We don’t necessarily believe that underwriting will be replaced entirely by machines (although some people think otherwise); thus, we don’t think it’s the silver bullet. The actual silver bullet is the combination of machine learning, traditional probabilistic modeling, and, more importantly, human intuition. We call this combination ‘the Three Pillars of Risk Analysis’.


To learn more, and to quickly leverage what we’ve already successfully deployed for our carrier partners, contact us at [email protected].