The insurance industry has undergone a massive transformation over the last 20-30 years. Customer service and business processes that were once done on paper or over the phone are now completed via web and mobile experiences. As a result, manual paperwork and tasks have gradually become automated via software solutions. For the insurance customer, this transition has led to faster and more predictable experiences.
McKinsey has previously predicted that more than half of insurance claims will be processed automatically by 2030, and recent AI advancements are already accelerating this trend. However, many of these automations still rely on batch jobs that run hourly, overnight, or weekly, and the resulting data delays prevent the real-time experience customers expect in today’s connected economy.
Providers have felt the pressure to meet consumer expectations by processing policy quotes and plan applications and adjudicating claims faster and more accurately than ever. Today’s insurers need real-time data processing not only to keep pace with the competition but also to better evaluate risk and optimize pricing by training AI models on the wealth and variety of data now available to them.
According to the 2024 Data Streaming Report, 51% of IT leaders across industries cite data streaming platforms (DSPs) as enabling their organizations to be nimble. In financial services, that has translated to substantial returns: 79% of IT leaders in the industry have realized 2-5x the return on their data streaming investments, and another 8% have seen a 10-fold return on investment or more.
We’ve previously explored how Confluent’s data streaming platform has helped insurers provide real-time quotes, so now let’s consider another popular use case we are seeing from our customers in the industry—implementing real-time claims processing.
Whether we’re talking about health, homeowner, or umbrella insurance, claims processing represents an entire category of complex procedures, standards, and best practices. For the purpose of this post, we’ll take a look at what real-time auto claims processing would look like under the hood—an example most should find familiar and simple to follow, broken down into 5 steps:
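Step 1. Stream new claims in FNOL topic to kick off your event-driven claims engine
Step 2. Enrich claims with policyholder data using a Flink join
Step 3. Use Flink for data filtering and routing for initial claim validation
Step 4. Route claim for straight-through processing or human assessment
Step 5. Send assessed claims to settlement, payment, and archiving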
At a high level, claims processing looks the same across insurers and even types of insurance.
First, a policyholder (or their broker) has to inform the insurance provider of damages and file a claim. This event, also known as first notice of loss (FNOL), initiates a process that is relatively consistent across providers and even different types of insurance policies. Once submitted, the claim needs to go through validation, investigation, and assessment stages. Then, if approved, the claim still needs to go through settlement and payment processing for the policyholder to see any reimbursement or payment for covered services.
During validation, the insurer needs to make sure that the claim is in-policy, not fraudulent, and has enough information to continue processing. Depending on the type of policy and claim, users may need to provide additional information (e.g., documents, pictures, police reports, medical bills).
Historically, this series of steps would have been accomplished over days or weeks and required a combination of database updates, batch jobs on timers, and synchronous task completion as well as human intervention. Aside from the undeniable threat to customer satisfaction, this lengthy claims process adds the risk of inaccurate assessments and erroneous settlements due to stale data. With Confluent’s data streaming platform, we can reduce these risk factors by streaming these data events to the stream processing engine, as near to real-time as possible.
In addition to the high-level steps we’ve outlined above, there are some actions that need to be taken throughout the overall process—either continuously or triggered by specific events—including:
Sending out customer notifications
Ongoing fraud detection as more information is gathered
Updates to expected loss—the estimated monetary value of a claim
Training machine learning and analytics models
Archiving or logging for regulatory compliance purposes
From the FNOL onwards, each open claim represents potential future costs to the business. That’s why it’s so important to continuously feed fraud detection systems with new information on a claim and keep the expected loss associated with that claim up to date. Having these running parallel to the actual claims processing allows the insurer to have a very accurate estimate of future loss, whether they have hundreds or thousands of open claims at once.
With legacy systems, completing these steps throughout the claims flow is very complex and requires a web of polls, synchronous calls, and batch jobs. But Confluent’s end-to-end data streaming platform encompasses serverless Kafka, source and sink connectors, stream processing, governance, and more—meaning a claim can flow naturally from initiation all the way to resolution in as short a time as possible. We can model the software to fit the business process, rather than shoehorning the business process into the available software tools. The resulting flow will look something like this:
Claims and any data needed to process claims will flow into the system on the left-hand side, and will move through each step of processing until the claim is resolved and archived on the right-hand side. Completion of each step will automatically trigger an event that kicks off the next step.
Flink is a stream processing engine that enables complex stateful operations on an unbounded stream of data or events. This means that Flink keeps the local state it needs to perform arbitrary operations: everything from aggregations, filtering, and routing to joining two or more streams, enriching streams with table data, or transforming data.
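To make "stateful" concrete, here is a minimal plain-Python sketch of a per-event aggregation that keeps expected loss up to date across open claims. It does not use the Flink API; the event fields (claim_id, status, expected_loss) and the class name are assumptions made for illustration only.

```python
class ExpectedLossTracker:
    """Keeps per-claim state and a running total, updated one event at a time."""

    def __init__(self):
        self.loss_by_claim = {}     # keyed state: claim_id -> latest expected loss
        self.total_open_loss = 0.0  # aggregate over all open claims

    def on_event(self, event):
        claim_id = event["claim_id"]
        previous = self.loss_by_claim.get(claim_id, 0.0)
        if event["status"] == "closed":
            # A resolved claim no longer contributes to open exposure.
            self.loss_by_claim.pop(claim_id, None)
            self.total_open_loss -= previous
        else:
            # Replace the old estimate with the new one, adjusting the total by the difference.
            current = event["expected_loss"]
            self.loss_by_claim[claim_id] = current
            self.total_open_loss += current - previous
        return self.total_open_loss


events = [
    {"claim_id": "C-1", "status": "open", "expected_loss": 2500.0},
    {"claim_id": "C-2", "status": "open", "expected_loss": 8000.0},
    {"claim_id": "C-1", "status": "open", "expected_loss": 3000.0},  # revised estimate
    {"claim_id": "C-2", "status": "closed"},
]
tracker = ExpectedLossTracker()
totals = [tracker.on_event(e) for e in events]
print(totals)
```

Because the total is adjusted on every event rather than recomputed by a scheduled job, the estimate of open exposure is current as of the last event processed.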
With Apache Flink on Confluent Cloud, our insurance customers have Flink SQL available to them and their Kafka workloads. With Kafka acting as the data layer while Flink acts as the compute layer, Kafka topics serve as the inputs and outputs to a Flink job.
In our auto claims processing use case, each Kafka topic can have numerous independent consumers. As claims events are generated in the main flow, downstream services such as notification or fraud detection can read those events and trigger a response.
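The idea of several independent readers on one topic can be sketched with a toy in-memory log. This is not a Kafka client; the Topic class and the group names "notifications" and "fraud-detection" are hypothetical, and the only point is that each consumer group tracks its own read position.

```python
class Topic:
    """Toy append-only log; every consumer group reads it independently."""

    def __init__(self):
        self.log = []
        self.offsets = {}  # consumer group -> next position to read

    def produce(self, event):
        self.log.append(event)

    def poll(self, group):
        # Return everything this group has not seen yet, then advance its offset.
        start = self.offsets.get(group, 0)
        batch = self.log[start:]
        self.offsets[group] = len(self.log)
        return batch


claims = Topic()
claims.produce({"claim_id": "C-1", "type": "submitted"})
claims.produce({"claim_id": "C-1", "type": "validated"})

# The notification service reads first...
notified = [f"notify:{e['claim_id']}:{e['type']}" for e in claims.poll("notifications")]

claims.produce({"claim_id": "C-2", "type": "submitted"})

# ...and the fraud service, reading later, still sees every event from the start.
scored = [e["claim_id"] for e in claims.poll("fraud-detection")]
notified_later = claims.poll("notifications")
```

Neither reader affects the other: the fraud service sees all three events, while the notification service picks up only the one it has not yet handled.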
Now, let’s explore how Confluent approaches building a claims engine prototype using Flink on Confluent Cloud. Because of the complexity of claims processing, further complicated by the specifics of how each organization handles claims, we won’t be detailing every exact step for building this kind of streaming data pipeline but rather outlining basic principles that you can adapt to your own specific use case.
As you can see in the diagram below, claims can be submitted via many different sources. Policyholders can submit them directly via the website or mobile app, or they can come from insurance brokers or call center employees using an admin dashboard. Our initial claim event will have some information about the loss (date, location, amount of damages) as well as an Account ID that links to the policyholder.
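As a rough illustration of what such an initial claim event might carry, the sketch below models it as a small Python dataclass and serializes it to a key and a JSON value. All field names are assumptions, not the schema used in the original prototype; keying by account ID is likewise an assumed design choice.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ClaimSubmitted:
    claim_id: str
    account_id: str       # links the claim to the policyholder
    loss_date: str        # date of the loss, ISO 8601
    loss_state: str       # where the loss occurred
    damage_amount: float  # claimed amount of damages
    submitted_at: str     # event time of the FNOL, ISO 8601

    def to_record(self):
        """Return (key, value) as bytes, keyed by account ID."""
        return self.account_id.encode(), json.dumps(asdict(self)).encode()


claim = ClaimSubmitted("C-1", "A-42", "2024-05-01", "OH", 1800.0, "2024-05-02T09:30:00Z")
key, value = claim.to_record()
print(key, value)
```

With Kafka's default partitioning, records that share a key land on the same partition, so keying by account ID would keep one policyholder's claim events in order relative to each other.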
Confluent provides over 120 pre-built, pluggable, and easily configurable fully managed connectors to stream these data sources directly into Confluent Cloud. All Kafka topics are automatically discoverable as Flink tables, so we are ready to use Flink to apply stream processing to our claims flow.
One of the important principles of stream processing is: get data necessary for processing as close to the processor as possible. That way, instead of having to make many costly and time-consuming calls to databases or APIs, we can access relevant data quickly from memory.
In our case, submitted claims have an Account ID, but no other information about the policyholder. In order to process the claim, we will need more information such as the policyholder’s name, address, date of birth, and policy expiration date. We have this customer data in an Oracle database and are using Confluent’s Oracle CDC connector to stream it into a Kafka topic. Then, because we have Claim data in a topic and Customer data in a topic, we join those two topics together with Flink:
As you can see, the output of the join will be sent to a new “enriched-claims” topic with all the relevant data to continue processing. The resulting flow will look like this:
This type of join is called a “temporal join” in Flink because it accounts for event time when joining (hence the crucial line “FOR SYSTEM_TIME AS OF claims.submitted_at”). This is important for use cases like ours. Because customer information is constantly changing, there are many ways in which static and batch data systems can end up joining the wrong data to a claim (such as when a user’s policy details get updated shortly after a claim is submitted, but before the claim is picked up by a batch job). With a temporal join, Flink joins each claim to the version of the customer data that was valid at the exact moment the claim was submitted.
If, on the other hand, this is not the desired behavior, other types of joins are available. For example, if we wanted to join the customer’s latest address to send them mail, and we don’t care what their address was at claim time, this can be accomplished in Flink as well.
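The difference between the two behaviors can be shown with a small plain-Python model of a versioned customer table. This is only a sketch of the semantics, not Flink code; the VersionedTable class and the field names are hypothetical, and it assumes updates for a key arrive in timestamp order.

```python
from bisect import bisect_right


class VersionedTable:
    """Changelog of rows per key, kept in timestamp order."""

    def __init__(self):
        self.versions = {}  # key -> (list of timestamps, list of rows)

    def upsert(self, key, ts, row):
        # Assumes updates arrive in timestamp order for each key.
        times, rows = self.versions.setdefault(key, ([], []))
        times.append(ts)
        rows.append(row)

    def as_of(self, key, ts):
        """Row that was valid at time ts (the temporal-join lookup)."""
        times, rows = self.versions.get(key, ([], []))
        i = bisect_right(times, ts)
        return rows[i - 1] if i else None

    def latest(self, key):
        """Most recent row, regardless of event time."""
        _, rows = self.versions.get(key, ([], []))
        return rows[-1] if rows else None


customers = VersionedTable()
# ISO dates in the same format compare correctly as strings.
customers.upsert("A-42", "2024-01-01", {"state": "OH", "policy_expires": "2024-12-31"})
customers.upsert("A-42", "2024-05-03", {"state": "TX", "policy_expires": "2024-12-31"})

claim = {"claim_id": "C-1", "account_id": "A-42", "submitted_at": "2024-05-02"}
enriched = {**claim, **customers.as_of(claim["account_id"], claim["submitted_at"])}
mailing = customers.latest(claim["account_id"])
```

Here the claim is enriched with the Ohio record that was valid when it was submitted, while the mailing lookup returns the newer Texas record.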
Now that we have an enriched-claims topic, we can perform basic validations. This is a good example of filtering and routing data streams with Flink based on certain conditions.
In this example, we will route the single enriched-claims topic into three outgoing topics based on the following checks:
Check to make sure the user’s policy is not expired. If the policy is expired, route the claim to a “rejected-claims” topic.
Confirm that the geographical state on the policy matches the state in which the accident occurred. This will serve as a rudimentary fraud check. If this fails, we will send the claim to a “potentially-fraudulent-claims” topic.
If the policy is not expired, and there are no geographical concerns, then the claim will be routed to a “validated-claims” topic for further processing.
The rejected-claims topic can be consumed by a notification service, triggering an email to the customer about the rejection, as well as perhaps a sink connector archiving the rejected claim in a connected database.
The potentially-fraudulent-claims topic can be sent to a system for further analysis, either automated or manual. The result of that analysis will determine if the claim goes to rejected-claims or validated-claims. (Policyholders are usually covered when driving out of state, but in this imagined scenario we’ve noticed that out-of-state claims are a statistical indicator of fraud.)
Below is some sample code for this type of validation. We can group the three necessary SQL statements into one “Statement Set” to be executed together.
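The three routing rules can also be sketched in plain Python, independent of Flink. This illustrates only the decision order, not the Flink SQL statement set itself. The field names (policy_expires, loss_date, policy_state, loss_state) are assumptions, as is the choice to judge expiry against the date of loss.

```python
from datetime import date


def route_claim(claim):
    """Return the name of the topic an enriched claim should be written to."""
    # Rule 1: an expired policy is rejected outright.
    if claim["policy_expires"] < claim["loss_date"]:
        return "rejected-claims"
    # Rule 2: a state mismatch is treated as a rudimentary fraud signal.
    if claim["policy_state"] != claim["loss_state"]:
        return "potentially-fraudulent-claims"
    # Rule 3: everything else continues through the main flow.
    return "validated-claims"


examples = [
    {"policy_expires": date(2024, 3, 31), "loss_date": date(2024, 5, 1),
     "policy_state": "OH", "loss_state": "OH"},
    {"policy_expires": date(2024, 12, 31), "loss_date": date(2024, 5, 1),
     "policy_state": "OH", "loss_state": "TX"},
    {"policy_expires": date(2024, 12, 31), "loss_date": date(2024, 5, 1),
     "policy_state": "OH", "loss_state": "OH"},
    {"policy_expires": date(2024, 3, 31), "loss_date": date(2024, 5, 1),
     "policy_state": "OH", "loss_state": "TX"},
]
routed = [route_claim(c) for c in examples]
print(routed)
```

In this sketch the expiry check runs first, so a claim that is both expired and out of state is rejected rather than sent for fraud review. In a SQL statement set, each branch would need its own WHERE clause to keep the three outputs mutually exclusive.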
The resulting flow looks like this:
Remember, these validations are just for example purposes. In reality, the validations at this stage can be as advanced and sophisticated as needed.
Also note that the routing with Flink SQL does take three separate statements grouped into a statement set, but when Confluent adds support for the Flink Table API later this year, these can be combined into one single chained statement in Java or Python.
In some cases, an entire claim can be assessed and settled automatically—this is known as straight-through processing (STP).
For example, let’s say a policyholder submitted a claim for a car with an established Kelley Blue Book value under $3,000 with an attached police report, and all validation checks and fraud checks have passed. In this case, the claim could go through a fully automated assessment stage without any human intervention.
A simple Flink processor job can check validated claims to see if they meet these conditions. If they do, then those claims would be passed directly to the “assessed_claims” topic, to be ready for settlement and payment.
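A minimal sketch of that eligibility check, written in plain Python rather than as a Flink job, might look like the following. The field names and the "claims-for-adjuster" topic are hypothetical; only the $3,000 threshold, the police report requirement, and the "assessed_claims" topic come from the example above.

```python
STP_MAX_VEHICLE_VALUE = 3000  # dollars, from the example above


def eligible_for_stp(claim):
    """True when a validated claim can skip human assessment."""
    return (
        claim["vehicle_value"] < STP_MAX_VEHICLE_VALUE
        and claim["has_police_report"]
        and claim["validation_passed"]
        and claim["fraud_checks_passed"]
    )


def next_topic(claim):
    # "claims-for-adjuster" is a made-up name for the human-assessment route.
    return "assessed_claims" if eligible_for_stp(claim) else "claims-for-adjuster"


small_claim = {"vehicle_value": 2400, "has_police_report": True,
               "validation_passed": True, "fraud_checks_passed": True}
no_report = {**small_claim, "has_police_report": False}
expensive = {**small_claim, "vehicle_value": 18000}

destinations = [next_topic(c) for c in (small_claim, no_report, expensive)]
print(destinations)
```

Keeping the conditions in one predicate makes it easy to widen the set of claims that qualify as new automated checks become available.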
As artificial intelligence continues to advance, more claims assessments can be fully automated, enabling more STP. For example, some organizations already use machine learning to evaluate car accident pictures and estimate the cost of the damage. Because an event-driven system is inherently decoupled and pluggable, as new capabilities come online they can be integrated into the claims flow without needing to refactor the whole claims engine.
In more complicated cases, human intervention may be unavoidable. For example, for an auto accident claim, insurers need to determine whether or not a car is totaled. Or there could be complicated accidents with multiple parties involved, in which case adjusters from multiple companies need to mediate who is at fault. In these types of scenarios, our system can still be fully event-driven, but some events will require routing to an admin dashboard for a human adjuster to interact with. This is known as a human-in-the-loop (HITL) flow. Once the adjuster has completed and submitted their assessment some number of hours, days, or weeks later, an event will finally be sent to the same “assessed_claims” topic that STP claims went to.
Because Kafka topics can have multiple independent consumers, new automated capabilities can be implemented alongside existing HITL flows, both to compare the model output to the human output to determine viability, and also to train the model in real time on the human output.
Whether the claim was eligible for straight-through processing or needed a human-in-the-loop, eventually we will have a fully assessed claim with liability and dollar amount established. What happens at this stage will be highly dependent on the specific organization and type of claim, but in our prototypical scenario we are having our final output topic read by the following microservices:
Payment processor, which executes payment. This could be a custom-built solution or it could be a Kafka consumer that routes the claim to a third-party tool.**
Notification service or emailer to notify the customer on the status of their claim or payment.
Loss calculator service, which makes the final business update on loss.
Sink connector to MongoDB for archiving the claim.
Logically, this final split looks like this:
**Calculating a payment will usually involve reading and updating the policyholder’s deductible. It is often said that deductible management (including getting and releasing locks on a deductible) is the hardest part of building claim systems. As such, this matter will not be covered in this blog post, other than to say that the join techniques described above could be used effectively for joining deductibles.
Now that we have outlined a simple claim processor with Kafka and Flink, we can review the important business outcomes that this system delivers:
The claim is processed on-demand instead of waiting on batch jobs. STP is utilized wherever possible, and even when a human is needed, the claim can be routed to and from the human in real time. This leads to greater customer satisfaction and a more efficient utilization of labor.
At any given step of the flow, the business can have an estimate of loss that is as accurate as possible. This helps estimate future profit and losses. Also, when claims volumes are unusually high or low, underwriting systems can adjust quickly as necessary to optimize profit and minimize exposure.
Fraud detection can be done in real time at any and all steps of the claims flow. We highlighted only one simple example of fraud detection, but in reality there could be many fraud markers at different parts of the flow that could be evaluated without significantly slowing processing down.
The system is fully pluggable—new features and flows can be added easily without affecting the rest of the system. This leads to shorter development time and faster innovation.
Ultimately this is a very low-code solution: everything is implemented with Kafka topics and a handful of lines of SQL. This makes development cycles much faster, increases workforce utilization, and decreases risk.
Claims processing is just one of many insurance use cases that stream processing can accelerate—stay tuned as we explore further examples in an ongoing series.
At each of the stages we’ve outlined here, it’s clear that the volume and variety of data insurers take in and manage, and the speed at which they need to act on this information, make batch jobs insufficient for modern claims processing.
Instead of performing operations on a large batch of data at a specified time, you can perform operations on each event as soon as it happens. Setting up these kinds of event-driven pipelines unlocks numerous other real-time use cases for insurance companies. The more processes that insurers build streaming data pipelines for, the broader reaching the benefits can be for their organization as Kafka streams can be consumed for limitless downstream applications and use cases.
Check out the GitHub repository for this use case to explore the code for the claims processing architecture we’ve outlined here. You can also discover more stream processing architectures in our use case library or learn more about data streaming in financial services.