When Streaming Needs Batch

Current 2022

A streaming application is started once and then continuously ingests endless, fairly steady streams of events. That's as far as the theory goes.

Unfortunately, reality is more complicated. Over time, your application's ability to process large historical data sets robustly, efficiently, and correctly will be critical:

for exploratory data analysis during development
for bootstrapping the initial state of an application
for back-filling following an outage or bugfix
for keeping up with bursty input streams

These scenarios call for batch processing techniques. Apache Flink is as streaming-first as it gets. Yet over the last few releases, the community has invested significant resources into unifying stream and batch processing across all layers of the stack, from the scheduler to the APIs.

In this talk, I'll introduce Apache Flink's approach to unified stream and batch processing and discuss, by example, how these scenarios can already be addressed today and what might become possible in the future.

Presenter

Konstantin Knauf

Confluent

Konstantin is a member of the Apache Flink PMC, a long-term contributor to the project, and a group product manager at Confluent. He joined the company early this year as part of the acquisition of Immerok, which he had co-founded with a group of long-term community members last year. Formerly, as Head of Product at Ververica, Konstantin supported multiple teams working on Apache Flink in both discovery and delivery. Before that, he led the pre-sales team at Ververica, helping its clients as well as the open source community get the most out of Apache Flink.
