10 Cities in California That Might Experience Wildfire This Season


A blog post by Anil Celik


California has been experiencing some of its worst wildfire years on record. Of the state's 10 largest wildfires, 9 occurred after 2000 and 5 after 2010. The Camp Fire (2018) was the deadliest (86 civilian fatalities) and most destructive (~20,000 properties) wildfire in California history, with estimated damages of around $16.5 billion. The average loss ratio for the California homeowners market was above 130% in 2018, and this single event forced insurance companies to increase their premiums. We are already seeing multiple insurers attempt to exit the California homeowners market (or lower their share of it), and non-renewals have become a major problem for consumers looking for coverage.

Local governments, insurance companies, and technology vendors need to do a better job of understanding wildfires, and we believe there is demand in the market for better data and models. This is why, long before the Camp Fire happened, we started hearing a lot of questions about wildfire modeling and whether AI could be used to build better predictive models of wildfire risk. The first thing we did was look at the existing alternatives in the market. One common problem we saw across different models was the assumption that the relationship between wildfire events and the factors that could explain them is linear. Another common problem was that models relied on only a few variables, e.g., soil type, slope, aspect, and access to roads. According to the U.S. Department of the Interior, humans cause around 90% of wildfires. Clearly, slope, aspect, or access to roads cannot be causes of wildfires; they are merely factors that affect the severity of fires.

Our model uses over 25 variables that can start or accelerate wildfires, but these three are perhaps the most interesting: (1) Glass bottles that shatter over time and are carried away by winds act as magnifiers that start fires, so we had to feed this information into our model. Drivers who stop by the roadside tend to leave their trash there, which is why we used distance to intercity roads as a model factor. (2) Forgotten campfires at campgrounds are another important factor we had to include in our model. (3) Poorly maintained power infrastructure started the Camp Fire, so distance to high-voltage power lines is another important factor we have included in our model.
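Distance-based features like those above can be sketched simply. The following is a minimal, hypothetical illustration (the coordinates, the `haversine_km` helper, and the pylon locations are all invented for the example, not part of UrbanStat's actual model):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_distance_km(point, features):
    """Distance from a property to the closest feature (road node, pylon, campground)."""
    return min(haversine_km(point[0], point[1], f[0], f[1]) for f in features)

# Toy example: one property and two hypothetical power-line pylons.
home = (34.28, -118.88)
pylons = [(34.30, -118.88), (34.40, -118.70)]
d = nearest_distance_km(home, pylons)  # roughly 2.2 km to the nearer pylon
```

A production pipeline would compute point-to-segment distances against full road and transmission-line geometries rather than point-to-point distances, but the resulting feature fed to the model is the same kind of number.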

UrbanStat’s Wildfire Model has produced superior results compared to the U.S. Forest Service’s map. When you overlay the last 20 years of California wildfires on the U.S. Forest Service’s map, only 54.8% of the areas that burned are classified as "High" or "Very High" risk. The same analysis with UrbanStat’s map produces better results, and this performance metric increases to 80.5%.
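The overlay metric above can be sketched as follows: of all map cells that actually burned, what fraction did the risk map classify as high risk? This is a hypothetical, simplified sketch (real maps are polygon or raster layers; the toy grid and values below are invented):

```python
def burned_area_capture_rate(risk_classes, burned_mask, high_risk=("High", "Very High")):
    """Share of burned cells that the risk map classifies as high risk."""
    burned_cells = [c for c, b in zip(risk_classes, burned_mask) if b]
    if not burned_cells:
        return 0.0
    return sum(c in high_risk for c in burned_cells) / len(burned_cells)

# Toy grid of 5 cells: a risk class per cell, and whether it burned.
risk = ["Low", "High", "Very High", "Moderate", "High"]
burned = [False, True, True, True, False]
rate = burned_area_capture_rate(risk, burned)  # 2 of 3 burned cells are high risk
```

The 54.8% and 80.5% figures in the text are this same ratio computed over 20 years of burn perimeters against each map.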

We strongly believe in our model’s ability to successfully predict the areas with the highest wildfire risk, which is why we are publishing this analysis. According to UrbanStat’s AI-based wildfire risk map, these are the cities most at risk from wildfires in California:

  • Ramona CCD, San Diego, CA
  • Moorpark CCD, Ventura, CA
  • Alpine CCD, San Diego, CA
  • Jamul CCD, San Diego, CA
  • Laguna-Pine Valley CCD, San Diego, CA
  • Fillmore CCD, Ventura, CA
  • Ojai-Mira Monte CCD, Ventura, CA
  • Simi Valley CCD, Ventura, CA
  • Santa Paula CCD, Ventura, CA
  • Palomar-Julian CCD, San Diego, CA


According to our model, 68.5% of the areas with characteristics similar to the cities listed above have already experienced a wildfire. According to the U.S. Forest Service and UrbanStat risk maps, 35% of California is considered "High" or "Very High" risk; these cities represent only 2.5%.

Are you interested in hearing more about UrbanStat’s Wildfire Map? Contact me at [email protected] 

Press Release: Kevin M. Doyle Joins UrbanStat as an Advisor to the Board


Chicago, IL – (September 27, 2018) – UrbanStat, a provider of an AI-based property underwriting platform, is excited to welcome Kevin M. Doyle to the team as an advisor to the board.

"Artificial intelligence and machine learning have created the potential for massive transformational change within the insurance industry. UrbanStat is one of the most exciting companies I’ve come across in insurtech. With its unique ability to offer a single solution that integrates automated underwriting powered by machine learning, data visualization, and strategic decision making, UrbanStat is well placed to revolutionize how carriers approach underwriting," says Kevin M. Doyle.

"Kevin has extensive experience in the IT and insurance industries, having held leadership positions at companies such as Marsh ClearSight, SAP, and CCC Information Services. He has helped top-tier carriers transform their analytical processes for over 20 years. As a company, we are very excited to start working with him. I am sure that with his help, UrbanStat will continue to innovate and reach a broader audience in the North American insurance industry," says Anil Celik, CEO of UrbanStat.

With this latest addition, UrbanStat’s Advisory Board now has 3 distinguished members: Kevin M. Doyle, Nauman Noor, and Cem S. Celen.


About UrbanStat

Since 2014, UrbanStat has helped insurance companies such as Sompo Japan, Allianz, Ageas, Safety Insurance, and Gulf Insurance Group automate and improve their property underwriting processes using geospatial data sets, statistics, and machine learning models.


About Kevin M. Doyle

Kevin M. Doyle has over 20 years of experience in the insurance industry serving top tier carriers in their data analytics and digitalization needs. He was previously the SVP, Client Management and Delivery Leader at Marsh ClearSight, and also worked as a Sales and Business Development Manager at SAP and CCC Information Services.

Is Machine Learning the Silver Bullet In Underwriting?


A blog post by Nilgun Celik, Tom Gubash & Anil Celik

Using machine learning to underwrite property insurance has been our key focus for the last 18 months. It started as a simple minimum viable product (MVP); we have had our highs and lows, we have made many mistakes, and every time we see a new data set we are surprised by how much we are still learning.

For engineers, machine learning is a simple concept, despite the complicated math behind it. For non-engineers, it is an abstract concept where you input your data and it generates magical results. Ergo, we get the question "…but how?" very often.

Machine learning is a powerful tool that helps a variety of industries in many great ways. But it always requires lengthy data preparation, and certain problems demand deeper research or an understanding of an entire industry and its regulations. Insurance is unquestionably one of them. This blog post focuses on some of the obstacles we have experienced.

Predicting which policyholders will file a claim sounds like a classic supervised learning problem at first. However, once you consider how the industry works, you see that there are no real "true positives" or "true negatives." Supervised learning requires a historical data set with known outcomes. Say you are trying to identify fraud: the algorithms require a historical dataset in which the actual fraudulent claims are labeled. When it comes to claim prediction, there is the problem of a lack of claims. Since insurance is a long-term game, a customer who has claimed nothing for five years could file a large claim in year 6. If you score this customer "High" in year 4, is your algorithm successful or not? Measured in year 4, your algorithm fails; measured over the long term, the scoreboard tells an entirely different story.
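The labeling ambiguity above is easy to make concrete: the same policyholder gets opposite labels depending on the evaluation horizon you choose. A minimal sketch (the data shape and `label_claims` helper are hypothetical illustrations, not our production labeling code):

```python
def label_claims(claim_years, horizon_years, start_year=1):
    """Return 1 if any claim falls within the evaluation horizon, else 0."""
    end_year = start_year + horizon_years - 1
    return int(any(start_year <= y <= end_year for y in claim_years))

customer_claims = [6]  # this customer's first claim arrives in year 6

short_label = label_claims(customer_claims, horizon_years=4)  # 0: looks like a "true negative"
long_label = label_claims(customer_claims, horizon_years=6)   # 1: actually a positive
```

Any performance metric computed on `short_label` would penalize a model that (correctly) flagged this customer as high risk, which is exactly why short-window backtests can be misleading.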

There are also a few technical obstacles insurance carriers need to overcome. One of the most significant is the imbalance between customers who claim and those who don’t. Most insurance companies have a claim frequency of around 1-6%, meaning that out of every 100 policyholders, only 1 to 6 will file a claim. Our algorithms try to identify those few policyholders so carriers can offer fairer terms and pricing across their entire portfolio. In other words, only 1-6% of the data tells the story we want to learn. This is one of the very first things insurance carriers need to solve, and the good news is that there are several solutions. If you are interested in reading more about this issue, we discussed this topic in great detail here.
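One of the standard remedies for this kind of imbalance is random undersampling of the majority class before training (class weighting and synthetic oversampling such as SMOTE are common alternatives). A minimal sketch, with an invented toy portfolio at roughly 3% claim frequency; this is a generic technique, not a description of UrbanStat's specific pipeline:

```python
import random

def undersample(rows, label_key="claimed", seed=0):
    """Keep all minority-class rows; sample the majority class down to the same size."""
    pos = [r for r in rows if r[label_key]]
    neg = [r for r in rows if not r[label_key]]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)  # fixed seed for reproducibility
    return minority + rng.sample(majority, len(minority))

# Toy portfolio: 3 claimants out of 100 policyholders (~3% frequency).
portfolio = [{"id": i, "claimed": i < 3} for i in range(100)]
balanced = undersample(portfolio)  # 6 rows: 3 positives + 3 sampled negatives
```

Undersampling discards information from the majority class, so on small portfolios class weights are usually the safer first choice; the point here is only the shape of the fix.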

Insurance is a highly regulated industry, and how carriers can use these technologies may be constrained by regulation depending on the state or country in which they operate. The industry must not pass human biases on to the algorithms; we need to be very careful about this. This is why we never use any of the following information during our modeling process:

  • We don’t use personal information: Name, gender, age, ethnicity or anything that would directly correlate with these features
  • We don’t use any financial data: credit scores, insurance scores, income or anything that would directly correlate with these features

We always tell our clients that we don’t want any personal information that may be present in the policy or claim files. The only personal information we use is the address, and we use it only to understand location-based risks, not to build socio-economic segmentation. This makes things interesting, because we know nothing about the customer we are trying to assign a risk score. Often the human underwriters have more information about the very same customer, since the algorithms lack the underwriter’s personal experience and knowledge. This is also why the algorithms carry no ongoing bias. We are not saying they are entirely free of biases, but that is a topic for another article.

Regardless of the barriers, this is a fascinating problem to work on. We genuinely believe that the research we are doing and the experience we are gaining help us create the right approach, and we know it is already helping some of our clients improve their profitability. Machine learning will change how the industry underwrites. We don’t believe underwriting will be replaced entirely by machines (although some people think otherwise); thus, we don’t think machine learning is the silver bullet. The actual silver bullet is the combination of machine learning, traditional probabilistic modeling, and, more importantly, human intuition. We call this combination "the Three Pillars of Risk Analysis."


To learn more and quickly leverage what we’ve already successfully deployed for our carrier partners contact us at [email protected].