Oct 11, 2016 | Reading time: 4 min

Log Compaction | Highlights in the Apache Kafka and Stream Processing Community | October 2016

Written by

Gwen Shapira, Engineering Manager, Confluent

This month the community has been focused on the upcoming release of Apache Kafka 0.10.1.0. Led by the fearless release manager, Jason Gustafson, we voted on a release plan, cut branches and started voting on the first release candidate. Please contribute to the community by downloading the release candidate, testing it out and letting everyone know how it went. If no serious bugs are found, we are hoping to finalize the release by mid-October.

In addition to the vote, we gave our website a quick facelift, contributed by Derrick Or. We appreciated the feedback from the community, and the issues that were reported were quickly addressed.

And as usual, there are several very lively discussions in the community:

KIP-74: Proposal to limit not just the amount of data returned by a consumer fetch per partition, but also the amount of data returned for each fetch request overall. This will give users better control over the memory usage of consumers, but even better – this allows consumers to make progress even if a partition contains messages larger than the maximum fetch size. This proposal has been merged and will be part of the 0.10.1.0 release.
KIP-79: Proposal to add methods for searching by timestamp to the new consumer was accepted and merged. It will be included in the next release to everyone’s great joy.
KIP-82: Proposal for adding headers to Kafka messages. This proposal is very popular because so many organizations are using headers internally. It is also controversial: the Kafka project has a long tradition of keeping the message completely unstructured and letting users and clients put whatever structure they need inside the message. Whatever the decision is, it will have a serious impact on the Apache Kafka ecosystem.
KIP-83: A much-welcomed proposal that allows instantiating clients with different security configurations in the same JVM. Patches are already available from Rajini Sivaram and Edoardo Comar, and once they are integrated we will be able to update MirrorMaker to support different security configurations on the source and target clusters.
KIP-85: Allowing clients to take JAAS configurations dynamically rather than via a file. This will be huge for those of us implementing microservices in containers – adding files to containers has been very inconvenient.
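
To make the KIP-74 change concrete, here is a minimal sketch of consumer settings that cap both the per-partition and the per-request fetch size. It assumes the property names used by the released Java consumer (`max.partition.fetch.bytes` and `fetch.max.bytes`), which this post does not spell out; the class name, group id, broker address and byte values are hypothetical and chosen only for illustration.

```java
import java.util.Properties;

public class FetchLimitsSketch {

    // Builds consumer settings with a per-partition cap and an overall
    // per-request cap. The property names are assumed from the released
    // Java consumer; everything else here is illustrative.
    static Properties consumerProps(String bootstrapServers,
                                    int maxPartitionFetchBytes,
                                    int maxFetchBytes) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", "fetch-limits-demo"); // hypothetical group id
        // Existing limit: most data returned per partition in one fetch.
        props.setProperty("max.partition.fetch.bytes",
                Integer.toString(maxPartitionFetchBytes));
        // Limit added by KIP-74: most data returned per fetch request overall.
        props.setProperty("fetch.max.bytes", Integer.toString(maxFetchBytes));
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical values: 1 MiB per partition, 8 MiB per request.
        Properties props = consumerProps("localhost:9092", 1024 * 1024, 8 * 1024 * 1024);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

A `Properties` object built this way would typically be passed to the consumer constructor along with the deserializer settings, which are omitted here to keep the sketch free of external dependencies.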

In addition to ongoing Kafka improvements, there is other interesting news, along with several blog posts:

Google are talking about use of Kafka in GCP and their new Kafka connectors.
Good summary of the big announcements for the Streams community from Strata.
Dean Wampler talks to O’Reilly about streams architecture.
Tutorial at Strata showing how to build customer 360 architecture using Apache Kafka, Spark Streaming and Kudu. One of the main take-aways is that modern data architectures no longer assume that all the data you need is found in one database – instead they solve the data integration problem.
How to test Kafka Streams topologies – because testing is the most important part of development.
From CapitalOne, a great StrangeLoop talk: Commander: Better Distributed Applications through CQRS, Event Sourcing, and Immutable Logs.
We made recommendations on how to move to the cloud with Kafka and added enterprise features.
Using MirrorMaker? Want to use the new Consumer? Here are some gotchas you want to be aware of.
And for the theory-inclined: Fascinating paper on graph processing on streams.

If you are interested in learning all about streaming data platforms, Confluent has released a 6-part online talk series focusing on Apache Kafka. You can view the recordings for the first two talks in the series by Jay Kreps and Jun Rao, and register for the upcoming sessions at https://www.confluent.io/apache-kafka-talk-series.

Gwen Shapira is a Software Engineer at Confluent. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. She currently specialises in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, an author of books including “Kafka: The Definitive Guide”, and a frequent presenter at data-related conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
