You might already apply every known strategy for choosing the right number of partitions when creating a new Apache Kafka topic. You follow the best recommendations for distributing data evenly across the topic's partitions. You even have metrics in place to observe and verify that distribution. You do everything right.
But then reality happens. Despite your best efforts, data is published unevenly, making it slow, expensive, and difficult to consume from the topic. The future is full of unexpected, impossible-to-predict events, and it doesn't care about your rules or normal distributions.
This doesn't mean that we can simply disregard good practices. However, we need a plan for when things don't go according to anyone's calculations.
Come to this talk to learn what to do when the data distribution across topic partitions is badly broken, significantly hurting the performance of consuming applications, increasing lag, and slowing data processing.
We'll talk about existing strategies, including how you can replace a struggling topic with a new one and rebalance the data across new partitions using new rules (the sketch below illustrates the idea). What dangers lurk, and what should you do when the key-to-partition mapping is no longer guaranteed? Why is partition scaling considered a dangerous operation? We'll also look at the problem from the consumers' point of view: how to scale them to more partitions and what to keep in mind when working with stateful systems.
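To make the topic-replacement strategy concrete, here is a minimal Java sketch of one possible shape of that approach (not necessarily the exact method presented in the talk): a copier application drains the old, skewed topic and republishes its records into a new topic created with more partitions. The topic names `orders-v1` and `orders-v2`, the broker address, the consumer group id, and the String serdes are all illustrative assumptions.

```java
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class TopicRebalancer {
    public static void main(String[] args) {
        Properties cProps = new Properties();
        cProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        cProps.put(ConsumerConfig.GROUP_ID_CONFIG, "rebalance-copier");        // illustrative group id
        cProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        cProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        cProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Properties pProps = new Properties();
        pProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        pProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        pProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {
            consumer.subscribe(List.of("orders-v1")); // the struggling topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> rec : records) {
                    // Re-key here if the old key scheme caused the skew; the default
                    // partitioner will then spread records across all partitions of
                    // the new, wider topic.
                    producer.send(new ProducerRecord<>("orders-v2", rec.key(), rec.value()));
                }
                consumer.commitSync(); // at-least-once: commit only after handing records to the producer
            }
        }
    }
}
```

Note the trade-off this sketch glosses over: because the new topic has a different partition count, records with the same key land on different partitions than before, so any consumer relying on per-key ordering or per-partition state must be migrated carefully. That danger is exactly what the talk digs into.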
This talk is for those who already have solid experience with Apache Kafka and want to take their knowledge to the next level. That said, we'll use simple language and accessible explanations, so even if you're a Kafka beginner, join this session to understand the challenges of uneven data distribution and the strategies for fixing it.