[Webinar] Kafka + Disaster Recovery: Are You Ready? | Register Now

What’s New in Apache Kafka 3.3

We are proud to announce the release of Apache Kafka® 3.3 on behalf of the Apache Kafka community. The 3.3 release contains many new features and improvements. This blog post will highlight some of the more prominent features. For a full list of changes, be sure to check the be sure to check the 3.3.0 and 3.3.1 release notes.

For several years, the Apache Kafka community has been developing a new way to run Apache Kafka with self-managed metadata. This new mode, called KRaft mode, improves partition scalability and resiliency while simplifying deployments of Apache Kafka. It also eliminates the need to run an Apache ZooKeeperTM cluster alongside every Apache Kafka cluster.

The 3.3 release now marks KRaft mode as production ready for new clusters only. There are some features that are currently supported by Apache ZooKeeper (ZK) mode that are not yet supported by KRaft mode. For more information on these features and proposed KRaft timelines, read KIP-833.

Kafka Broker, Controller, Producer, Consumer and Admin Client

KIP-833: Mark KRaft as Production Ready

KIP-833 marks KRaft as production-ready for new clusters in the Apache Kafka 3.3 release. KIP-833 also marks 3.5.0 as the bridge release. The bridge release is the release that would allow the migration of Apache Kafka clusters from ZK mode to KRaft mode.

KIP-778: KRaft to KRaft upgrades

KIP-778 allows the upgrade of KRaft clusters without the need for the infamous “double roll”. In order to facilitate upgrades of Apache Kafka in KRaft mode, we need the ability to upgrade controllers and brokers while holding back the use of new RPCs and record formats until the whole cluster has been upgraded.

KIP-841: Fenced replicas should not be allowed to join the ISR in KRaft

KIP-841 improves the topic partitions’ availability during clean shutdown. It does this by enforcing the following invariants: 1) a fenced or in-controlled-shutdown replica is not eligible to be in the ISR; and 2) a fenced or in-controlled-shutdown replica is not eligible to become leader.

KIP-836: Expose replication information of the cluster metadata

KIP-836 exposes the DescribeQuorum API to the Admin client and adds two new fields per replica to the response. This information can be used to query the availability and lag of the cluster metadata for the controllers and brokers in a KRaft cluster.

KIP-835: Monitor KRaft Controller Quorum health

With KRaft mode, Apache Kafka added a new controller quorum to the cluster. These controllers need to be able to commit records for Apache Kafka to be available. KIP-835 measures availability by periodically causing the high-watermark and the last committed offset to increase. Monitoring services can compare that these last committed offsets are advancing. They can also use these metrics to check that all of the brokers and controllers are relatively within each other’s offset.

KIP-859: Add metadata log processing error related metrics

With KRaft mode, the cluster metadata replicated log is the source of metadata related information for all servers in the cluster. Any errors that occur while processing this log could lead to the in-memory state for the server becoming inconsistent. It is important that such errors are made visible. KIP-859 exposes metrics that can be monitored so that affected servers can be discovered.

KIP-794: Strictly Uniform Sticky Partitioner

KIP-794 improves the default partitioner to distribute non-keyed data evenly in batches among healthy brokers and less data to unhealthy brokers. For example, the p99 latency for a producer workload with abnormal behavior was reduced from 11s to 154ms.

KIP-373: Allow users to create delegation tokens for other users

KIP-373 allows users to create delegation tokens for other users. This allows the following use cases: 1) a designated superuser can create tokens without requiring individual user credentials; and 2) a designated superuser can run kafka clients on behalf of another user.

KIP-831: Add metric for log recovery progress

Log recovery is a process triggered when a Kafka server starts up, if it had a previous unclean shutdown. It is used to make sure the log is in a good state and is not corrupted. KIP-831 exposes metrics to allow users to monitor the progress of log recovery.

KIP-709: Extend OffsetFetch RPC to accept multiple group ids

KIP-709 streamlines the process of fetching offsets from consumer groups so that a single request can be made to fetch offsets for multiple groups. This carries the following advantages: 1) it reduces request overhead; and 2) it simplifies client side code.

KIP-827: Expose log dirs total and usable space via Kafka API

KIP-827 exposes an RPC for querying the disk total size and disk usage size per log directory. This is useful for tools that are interested in querying this information without relying on the exposed metrics.

KIP-851: Add requireStable flag into ListConsumerGroupOffsetsOptions

KIP-851 adds the option in the Admin client for querying the committed offsets when using exactly once semantics.

KIP-843: Adding addMetricIfAbsent method to Metrics

KIP-843 allows the metrics API to atomically query a metric if present or create a metric if absent.

KIP-824: Allowing dumping segment logs limiting the batches in the output

KIP-824 allows the kafka-dump-logs tool to be configured to only scan and print the records at the start of the log segment instead of the entire log segment.

Kafka Streams

KIP-846: Source/sink node metrics for consumed/produced throughput in Streams

With the metrics available today in the plain consumer it is possible for users of Kafka Streams to derive the consumed throughput of their applications at the subtopology level, but the same is not true for the produced throughput.

KIP-846 fills in this gap and gives end users a way to compute the production rate of each subtopology by introducing two new metrics for the throughput at sink nodes. Even though it is possible to derive the consumed throughput with existing client level metrics, KIP-846 also adds two new metrics for the throughput at source nodes to provide an equally fine-grained metrics scope as for the newly added metrics at the sink nodes and to simplify the user experience.

KIP-834: Pause/resume KafkaStreams topologies

KIP-834 adds the ability to pause and resume topologies. This can be used to reduce resources used or modify data pipelines. Paused topologies skip processing, punctuation, and standby tasks. For distributed Streams applications, each instance will need to be paused and resumed separately.

KIP-820: Consolidate KStream transform() and process() methods

KIP-820 generalizes the KStream API to consolidate Transformers (which could forward results) and Processors (which could not). The change makes use of the new type-safe Processor API. This simplifies Kafka Streams, making it easier to use and learn.

KIP-812: Introduce another form of the KafkaStreams.close() API that forces the member to leave the consumer group

KIP-812 can efficiently close the stream permanently by forcing the member to leave the consumer group.

Kafka Connect

KIP-618: Exactly-once support for source connectors

KIP-618 adds exactly one semantic support to source connectors. The Connect framework was expanded to atomically write source records and their source offsets to Apache Kafka, and to prevent zombie tasks from producing data to Apache Kafka.

Summary

In addition to all of the KIPs listed above, Apache Kafka 3.3 is packed with fixes and improvements. To learn more:

This was a community effort, so thank you to everyone who contributed to this release, including all our users and our 116 authors:

Akhilesh C, Akhilesh Chaganti, Alan Sheinberg, Aleksandr Sorokoumov, Alex Sorokoumov, Alok Nikhil, Alyssa Huang, Aman Singh, Amir M. Saeid, Anastasia Vela, András Csáki, Andrew Borley, Andrew Dean, andymg3, Aneesh Garg, Artem Livshits, A. Sophie Blee-Goldman, Bill Bejeck, Bounkong Khamphousone, bozhao12, Bruno Cadonna, Chase Thomas, chern, Chris Egerton, Christo Lolov, Christopher L. Shannon, CHUN-HAO TANG, Clara Fang, Clay Johnson, Colin Patrick McCabe, David Arthur, David Jacot, David Mao, Dejan Maric, dengziming, Derek Troy-West, Divij Vaidya, Edoardo Comar, Edwin, Eugene Tolbakov, Federico Valeri, Guozhang Wang, Hao Li, Hongten, Idan Kamara, Ismael Juma, Jacklee, James Hughes, Jason Gustafson, JK-Wang, jnewhouse, Joel Hamill, John Roesler, Jorge Esteban Quilcate Otoya, José Armando García Sancio, jparag, Justine Olshan, K8sCat, Kirk True, Konstantine Karantasis, Kvicii, Lee Dongjin, Levani Kokhreidze, Liam Clarke-Hutchinson, Lucas Bradstreet, Lucas Wang, Luke Chen, Manikumar Reddy, Marco Aurelio Lotz, Matthew de Detrich, Matthias J. Sax, Mickael Maison, Mike Lothian, Mike Tobola, Milind Mantri, nicolasguyomar, Niket, Niket Goel, Nikolay, Okada Haruki, Philip Nee, Prashanth Joseph Babu, Rajani Karuturi, Rajini Sivaram, Randall Hauch, Richard Joerger, Rittika Adhikari, RivenSun, Rohan, Ron Dagostino, ruanliang, runom, Sanjana Kaundinya, Sayantanu Dey, SC, sciclon2, Shawn, sunshujie1990, Thomas Cooper, Tim Patterson, Tom Bentley, Tom Kaszuba, Tomonari Yamashita, vamossagar12, Viktor Somogyi-Vass, Walker Carlson, Xavier Léauté, Xiaobing Fang, Xiaoyue Xue, xjin-Confluent, xuexiaoyue, Yang Yu, Yash Mayya, Yu, YU, yun-yun

This post was originally published by Jose Armando Garcia Sancio on The Apache Software Foundation blog.

  • Jose Armando Garcia Sancio is a staff software engineer and Apache Kafka committer at Confluent.

Did you like this blog post? Share it now