Log Compaction – Highlights in the Apache Kafka and Stream Processing Community – February 2017

Written by Gwen Shapira, Engineering Manager, Confluent

Feb 8, 2017 | Read time: 3 min

As always, we bring you news, updates and recommended content from the hectic world of Apache Kafka® and stream processing.

Sometimes it seems that in Apache Kafka every improvement is preceded by an involved KIP process. This month we’ve merged a great patch that improved Kafka’s 99th percentile latency without requiring user-visible changes: https://issues.apache.org/jira/browse/KAFKA-4614. Not only does it make a fast system even faster, the JIRA itself is worthy of study. I wish all JIRAs included this level of research.

Some important improvements do require KIPs. Here is what we’ve seen in active discussions this month:

KIP-112: Handle disk failure for JBOD and its close relative KIP-113: Support replicas movement between log directories. Both these KIPs improve Kafka’s behavior in the common case where the broker’s data is written to a number of directly mounted disks on the broker server (rather than using RAID). With these improvements, Kafka will be able to survive failure of a single disk without taking down an entire broker, and it will allow admins to control the placement of replicas on disk – useful in cases where disks or replicas have uneven sizes.
KIP-117: Add a public AdminClient API for Kafka admin operations: This lets developers create, modify and delete topics and ACLs without using internal APIs, which are subject to incompatible changes, and without requiring a ZooKeeper connection from their applications.
KIP-98: The famous KIP that adds transactional semantics and exactly-once to Kafka is now under voting. This means that the Wiki now contains all the public changes. If you haven’t read it yet, now is a good time.
KIP-118 suggests we remove support for Java 7 in the next major release (0.11). We don’t know yet when 0.11 will be released, but we know it will be later than June.
KIP-110 suggests adding support for a new compression codec: ZStandard Compression. The new codec, developed at Facebook, looks very promising.
KIP-109 suggests marking the old consumers as deprecated, as a hint for developers that they should start migrating to the new clients. As the KIP states, the old consumers are missing important features like security that were only added in the new clients.

Notable Blogs and Presentations:

One of the basic design patterns of Microservices is creating a local cache or materialized view. Keeping the cache updated can be a challenge. Zach Cox explains the challenges in maintaining a local cache for a service and provides several solutions using different Kafka APIs.
Plumbr used Kafka to transition from a monolith to microservices as they scaled their architecture.
Sky Betting & Gaming published their Kafka-centric streaming architecture.
And since everyone loves benchmarks: Comparing the different compression codecs in Apache Kafka.
Trulia talks about how they use Kafka to drive a machine learning system, which they use to offer personalized experiences in mobile and desktop.
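
To make the local cache pattern from the first item above a little more concrete, here is a minimal sketch in Python. It is not taken from Zach Cox's post and does not use any Kafka client. It assumes the service receives a changelog as ordered key/value pairs, where the latest value for a key wins and a value of `None` marks a delete. The `apply_changelog` function and the sample records are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# One changelog record: (key, value). A value of None is treated as a
# delete marker (a "tombstone"). This record shape is an assumption.
Record = Tuple[str, Optional[str]]


def apply_changelog(cache: Dict[str, str], records: Iterable[Record]) -> Dict[str, str]:
    """Fold changelog records into the cache in order; the last write per key wins."""
    for key, value in records:
        if value is None:
            cache.pop(key, None)  # tombstone: drop the key if it is present
        else:
            cache[key] = value    # upsert: the newer value replaces the older one
    return cache


# Hypothetical records, standing in for what a consumer would read from a topic.
changelog = [
    ("user-1", "alice@example.com"),
    ("user-2", "bob@example.com"),
    ("user-1", "alice@new.example.com"),  # update to an existing key
    ("user-2", None),                     # delete
]

view = apply_changelog({}, changelog)
print(view)  # {'user-1': 'alice@new.example.com'}
```

In a real service the records would arrive continuously from a consumer, and the hard parts (rebuilding the cache on restart, handling lag) are exactly the challenges the linked post discusses.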

Gwen Shapira is a Software Engineer at Confluent. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. She currently specialises in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, an author of books including “Kafka: The Definitive Guide”, and a frequent presenter at data-related conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
