Partitioned state is a fundamental building block of Kafka Streams for processing data in a distributed, asynchronous, and data-parallel fashion. It enables high performance and scalability, provided that state sizes and rebalance settings are configured properly. Beyond partitioned state, however, applications often need consistent, up-to-date, and complete reference data that is available on every replica, i.e., global state.
In this talk, we explore and compare different approaches to maintaining global state in Kafka Streams applications. We first investigate the features Kafka Streams provides out of the box: repartitioning, GlobalKTables, and Interactive Queries. We then turn to integrating third-party systems, such as distributed in-memory caches, into Kafka Streams. We illustrate all approaches with an example scenario in which we combine e-commerce data streams with globally available stock and delivery data.
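To make the distinction between partitioned and global state concrete, the following is a minimal plain-Java sketch of the e-commerce scenario. It does not use the Kafka Streams API and is not the implementation from the talk. The class names (`GlobalStateSketch`, `Replica`, `Order`), the hash-based routing, and the stock check are all hypothetical and chosen only for illustration. In Kafka Streams itself, the "full copy on every replica" role would typically be played by a GlobalKTable joined against a partitioned KStream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalStateSketch {

    // Hypothetical event type: an order for a quantity of one product.
    static final class Order {
        final String orderId;
        final String productId;
        final int quantity;

        Order(String orderId, String productId, int quantity) {
            this.orderId = orderId;
            this.productId = productId;
            this.quantity = quantity;
        }
    }

    // One application replica. It sees only its share of the orders
    // (partitioned state) but holds a full copy of the stock data (global state).
    static final class Replica {
        private final Map<String, Integer> orderedPerProduct = new HashMap<>(); // partitioned
        private final Map<String, Integer> stock = new HashMap<>();             // global copy

        // Reference data update: delivered to every replica.
        void onStockUpdate(String productId, int available) {
            stock.put(productId, available);
        }

        // Order event: delivered only to the replica that owns the product's partition.
        String onOrder(Order order) {
            int total = orderedPerProduct.merge(order.productId, order.quantity, Integer::sum);
            Integer available = stock.get(order.productId);
            if (available == null) {
                return order.orderId + ": unknown product";
            }
            return order.orderId + (total <= available ? ": in stock" : ": backorder");
        }
    }

    // Assumed routing rule: orders are partitioned by product ID.
    static int partitionFor(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    public static void main(String[] args) {
        List<Replica> replicas = new ArrayList<>();
        replicas.add(new Replica());
        replicas.add(new Replica());

        // Stock data is broadcast: every replica applies every update.
        for (Replica r : replicas) {
            r.onStockUpdate("p1", 5);
            r.onStockUpdate("p2", 1);
        }

        // Orders are routed: each one reaches exactly one replica.
        List<Order> orders = new ArrayList<>();
        orders.add(new Order("o1", "p1", 3));
        orders.add(new Order("o2", "p2", 1));
        orders.add(new Order("o3", "p1", 3));
        for (Order o : orders) {
            Replica owner = replicas.get(partitionFor(o.productId, replicas.size()));
            System.out.println(owner.onOrder(o));
        }
    }
}
```

The sketch sidesteps the hard part: here all replicas receive stock updates at the same logical moment, whereas in a real deployment each replica consumes the reference data at its own pace, which is where questions of consistency across replicas arise.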
Finally, we demonstrate our coordination-free approach to building a global state that stays consistent over time and across all replicas, based on a continuous reference data stream. To this end, we rephrase the problem and restructure the global state to eliminate the need for coordination. As a result, Kafka Streams applications can once again scale independently along their partitioned state.