Log Compaction | Highlights in the Apache Kafka and Stream Processing Community | September 2016

Written by

Gwen Shapira, Engineering Manager, Confluent

Sep 14, 2016 | Reading time: 3 min

It is September and it’s evident that everyone is back from their summer vacation! We released Apache Kafka 0.10.0.1, which fixes bugs found in the 0.10.0 release. In our last meeting we agreed to give time-based releases a try, and we immediately started planning Apache Kafka 0.10.1.0.

Confluent Platform 3.0.1 and Apache Kafka 0.10.0.1 were released. Lots of important bug fixes! If you are on Apache Kafka 0.10.0 or Confluent Platform 3.0.0, we recommend upgrading. If you are on an older release, please make sure you upgrade directly to the bugfix version.
We agreed to try a time-based release plan, aiming for three Apache Kafka releases a year (one every four months) and guaranteeing rolling upgrades for a duration of two years.
We started planning the next Apache Kafka release, which will have the version 0.10.1.0. Many thanks to Jason Gustafson, Kafka’s newest committer, for volunteering to drive the release. As usual, the community is encouraged to participate. Take a look at the release plan to learn how.
KIP-62 has been merged and will be included in Apache Kafka 0.10.1.0 and Confluent Platform 3.1.0. This KIP adds a background thread to the Kafka Consumer that sends heartbeats, which keeps consumers alive in their group while they are busy between calls to poll. This should make it much easier to write consumers, especially consumers that need to process large amounts of data between iterations.
KIP-63, a proposal for improving caching in the Streams API in Kafka, was approved. This is a significant performance optimization that coalesces processing updates before sending them downstream, which reduces the load on Kafka clusters and on downstream external systems. It also paves the way for implementing new “trigger” behaviors.
KIP-71 was approved, allowing messages in topics to be both compacted and deleted. This will allow admins to impose disk constraints on compacted topics, by removing compacted keys which are older than the time limit or exceed disk space limits.
KIP-73 was approved, adding replication quotas or throttling to Apache Kafka. This feature is especially useful when reassigning replicas to brokers, allowing admins to limit the resources used by the reassignment process and therefore reducing the risk in reassignment. Replica reassignment has long been a difficult process in Apache Kafka, and we are excited about this improvement.
KIP-79, a proposal to evolve the Apache Kafka protocol to allow for requesting offsets according to timestamps (using the new timestamp indexes), is under active discussion. You are invited to take a look and share your feedback with the Kafka community.
Ben Stopford gave a very popular presentation on how Microservices and Apache Kafka fit together.
If you are curious to learn about the internals of the new Kafka Consumer Groups, you can watch this presentation from the Kafka meetup at LinkedIn.
Want to learn how to choose a stream processing framework? Neha Narkhede and Stephan Ewen compare the Streams API in Kafka and Flink, providing good decision guidelines in the process.
Are Kafka Connect and Kafka Streams ready for production? The Kafka community says yes! LINE Corp. explain how they are using Kafka Streams in large-scale production, and WePay talk about their use of Kafka Connect in large-scale production.
Grant Henke explains the architectural benefits of Apache Kafka for decoupling.
Confluent has updated the schedule of training classes for developers and operators of Kafka. Online courses are also available.
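
To make the KIP-63 item above more concrete, here is a minimal sketch of what "coalescing updates before sending them downstream" can look like. It is a toy model, not the Streams API implementation: the class `CoalescingCache`, the `max_entries` bound, and the `count_words` helper are all hypothetical names invented for this illustration, and the cache is bounded by number of keys purely for simplicity.

```python
from collections import OrderedDict


class CoalescingCache:
    """Toy record cache: keeps only the latest value per key until flushed."""

    def __init__(self, max_entries, forward):
        self.max_entries = max_entries  # hypothetical size bound, counted in keys
        self.forward = forward          # callback standing in for the downstream send
        self.pending = OrderedDict()

    def put(self, key, value):
        # A newer update for the same key overwrites the older one in place.
        self.pending[key] = value
        self.pending.move_to_end(key)
        if len(self.pending) > self.max_entries:
            # Over the bound: evict the least recently updated key downstream.
            old_key, old_value = self.pending.popitem(last=False)
            self.forward(old_key, old_value)

    def flush(self):
        # Called at a commit point: emit one record per key still cached.
        while self.pending:
            key, value = self.pending.popitem(last=False)
            self.forward(key, value)


def count_words(words, max_entries=100):
    """Running word count; returns the records that reached 'downstream'."""
    sent = []
    cache = CoalescingCache(max_entries, lambda k, v: sent.append((k, v)))
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
        cache.put(word, counts[word])
    cache.flush()
    return sent
```

With a roomy cache, `count_words(["a", "b", "a", "a", "b"])` forwards two records (the final count for each word) instead of five intermediate updates. Shrinking `max_entries` to 1 forces early evictions, so more intermediate counts leak downstream; that is the basic trade-off between cache size and downstream load.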

Gwen Shapira is a Software Engineer at Confluent. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. She currently specialises in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, an author of books including “Kafka, the Definitive Guide”, and a frequent presenter at data-related conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
