The Dark and Dirty Side of Fixing Uneven Partitions

Kafka Summit London 2023

You might already use every known strategy to choose the right number of partitions for your newly created Apache Kafka topic. You apply the best recommendations to distribute data evenly across the topic's partitions. You even have metrics to monitor it. You do everything right.

But then reality happens. Despite your best efforts, data is published unevenly, which makes consuming from the topic slow, expensive, and difficult. The future is full of unexpected, unpredictable events, and it doesn't care about rules or normal distributions.

This doesn't mean we can simply disregard good practices. But we do need a plan for when things don't go according to anyone's calculations.

Come to this talk to learn what to do when the data distribution across topic partitions is badly broken and, as a result, significantly hurts the performance of consuming applications, increasing lag and slowing down data processing.

We'll talk about existing strategies, including how to replace a struggling topic with a new one and rebalance the data across the new partitions using new rules. What can go wrong, and what should you do when the state of keys is no longer guaranteed? Why is scaling partitions considered a dangerous operation? We'll also look at the problem from the consumers' point of view: how to scale them to more partitions and what to keep in mind when using stateful systems.
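As a rough illustration of why adding partitions is risky: Kafka's default partitioner places a keyed record by hashing the key and taking the result modulo the number of partitions, so changing the partition count reroutes many existing keys. The sketch below is a minimal, hypothetical model of that hash-mod-N rule. It assumes Python's `zlib.crc32` as a stand-in hash (Kafka itself uses murmur2), and the helper names `partition_for` and `moved_keys` are invented for illustration.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with hash(key) mod N.

    Assumption: zlib.crc32 stands in for Kafka's murmur2 hash; only the
    hash-mod-N principle is being illustrated here.
    """
    # zlib.crc32 returns an unsigned 32-bit value in Python 3, so the result is non-negative.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_count: int, new_count: int):
    """Return the keys whose partition changes when the partition count changes."""
    return [k for k in keys if partition_for(k, old_count) != partition_for(k, new_count)]


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]
    moved = moved_keys(keys, 6, 8)
    print(f"{len(moved)} of {len(keys)} keys change partition when going from 6 to 8 partitions")
```

With a uniformly distributed hash, going from 6 to 8 partitions keeps a key in place only when its hash modulo 24 falls in 0 to 5, so roughly three quarters of the keys move. Records for a moved key that were written before and after the change end up on different partitions, which is why ordering per key is no longer guaranteed across the change.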

This talk is for those who already have solid experience with Apache Kafka and want to take their knowledge to the next level. However, we'll use simple language and accessible explanations, so even if you're a Kafka beginner, join this session to understand the challenges of uneven data distribution and strategies to fix it.

Presenter

Olena Babenko

Aiven

Olena is a Staff Software Engineer and a data engineer. She was born in Ukraine and now lives in Finland. For most of her career, she has worked for big companies such as Yandex and Zalando, so she knows first-hand that the problem of "too much data" is real. This is why she has been a big fan of Kafka and Flink streaming for almost 4 years now. She believes that sharing knowledge is a win-win for both the audience and the speaker.

Presenter

Olena Kutsenko

Aiven

Olena is a seasoned expert in data, sustainable software development, and teamwork. With a background in software engineering, she has led teams and developed mission-critical applications at Nokia, HERE Technologies, and AWS. Currently, she works at Aiven, where she supports developers and customers in using open-source data technologies such as Apache Kafka, ClickHouse, and OpenSearch. She is also an international public speaker and regularly presents at conferences around the world. She holds AWS Developer and Solutions Architect certifications.
