In the context of event-driven business applications relying on Kafka, ensuring a high level of reliability is paramount: messages must not be lost, and they have to be processed in the correct order, regardless of any errors encountered along the way.
This presentation starts by spotlighting the typical pitfalls and challenges encountered when implementing a reliable event-driven architecture. Beginning with consumers, we address issues such as deserialization exceptions and effective offset management. Moving on to event processors, we differentiate between transient and non-transient errors and discuss how to handle both cases, with a specific focus on Kafka Streams. On the producer side, we examine the main settings that need tuning for enhanced resilience.
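To give a flavour of the kind of settings involved, the sketch below shows a minimal Kafka Streams configuration that addresses all three areas: a deserialization exception handler on the consumer side, an exactly-once processing guarantee, and durability-oriented producer overrides. The application id, broker address and concrete values are placeholders for illustration, not the settings used in the project.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

public class ResilientStreamsConfig {

    // Illustrative values only; application id and bootstrap servers are placeholders.
    public static Properties build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "reliable-processor");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Consumer side: log and skip records that cannot be deserialized
        // instead of crashing the whole topology (a "poison pill" strategy).
        props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
                LogAndContinueExceptionHandler.class.getName());

        // Processing guarantee: exactly-once semantics (requires brokers >= 2.5).
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        // Producer side: favour durability over latency (largely implied by
        // exactly-once, but made explicit here for readability).
        props.put(StreamsConfig.producerPrefix(ProducerConfig.ACKS_CONFIG), "all");
        props.put(StreamsConfig.producerPrefix(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG), "true");
        props.put(StreamsConfig.producerPrefix(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG), "120000");
        return props;
    }
}
```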
Having identified the key points to consider, we present the different patterns used both to ensure transactionality and to cope with errors (e.g., Saga, retry topics, dead-letter queues).
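As one possible illustration of the dead-letter queue pattern, the following Spring Kafka sketch retries a failed record a couple of times and then publishes it to the default "&lt;topic&gt;.DLT" dead-letter topic. The retry interval and attempt count are arbitrary examples, and the talk's actual implementation (with Spring Cloud Stream) may wire this differently.

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaOperations;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Configuration
public class DeadLetterQueueConfig {

    // Failed records are retried twice (1 s apart), then handed to the
    // recoverer, which publishes them to "<original-topic>.DLT" by default.
    @Bean
    public DefaultErrorHandler errorHandler(KafkaOperations<Object, Object> template) {
        DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(template);
        return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 2L));
    }
}
```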
In the second part of the presentation, we take a real-life project and explain the design choices we made to meet our customer’s reliability requirements. Through a live coding session, we illustrate selected design patterns and demonstrate how the difficulties encountered were overcome. This deep dive into code shows how error handling and distributed transactions can be implemented using Kafka, Spring Boot, Spring Cloud Stream and the Kafka Streams binder.
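For readers unfamiliar with the Kafka Streams binder, a processor in this style boils down to a functional bean: Spring Cloud Stream binds a Function over KStream to input and output topics (by default via the "process-in-0" and "process-out-0" bindings). The bean name and business logic below are purely hypothetical stand-ins, not the project's code.

```java
import java.util.function.Function;

import org.apache.kafka.streams.kstream.KStream;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class OrderProcessor {

    // A Function bean named "process" is bound by the Kafka Streams binder to
    // the "process-in-0" input and "process-out-0" output bindings.
    @Bean
    public Function<KStream<String, String>, KStream<String, String>> process() {
        return input -> input
                .filter((key, value) -> value != null)           // drop null payloads
                .mapValues(value -> value.trim().toUpperCase()); // placeholder business logic
    }
}
```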
The presentation concludes by touching on monitoring application reliability and stability with tools like Micrometer, Prometheus, and Grafana.
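As a small taste of that monitoring angle, an application can expose custom reliability metrics through Micrometer, which Prometheus then scrapes and Grafana visualizes. The metric name below is a hypothetical example, not one used in the talk.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;

@Component
public class ReliabilityMetrics {

    private final Counter deadLetteredRecords;

    // Counts records routed to the dead-letter topic, so their rate can be
    // scraped by Prometheus and graphed (or alerted on) in Grafana.
    public ReliabilityMetrics(MeterRegistry registry) {
        this.deadLetteredRecords = Counter.builder("app.records.dead_lettered")
                .description("Records sent to the dead-letter topic")
                .register(registry);
    }

    public void recordDeadLetter() {
        deadLetteredRecords.increment();
    }
}
```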