Today marks a major milestone for Confluent: Tableflow is now generally available on AWS, with support for Azure and Google Cloud coming very soon. And it’s about to make the lives of data engineers much easier.
If you’ve ever tried to stream Apache Kafka® data into a data lake, you know the pain. Pipelines are fragile. Bad data creates more cleanup work. Data gets duplicated. Type conversions between formats cause headaches. And even when it works, it’s slow and expensive.
Tableflow solves all of that. Now you can take any Kafka topic with a schema and expose it as an Apache Iceberg™️ or Delta Lake table. No stitching pipelines, less data duplication, no schema headaches. Just enable Tableflow, and your Kafka data becomes instantly accessible to your analytics and AI tooling.
This is one of the most impactful products we’ve ever launched. We’ve gone from helping users manage their streams to helping them manage their streams and tables simultaneously, without any additional work needed.
If you’re using Kafka to capture real-time events, you already know the value of streaming data. But streaming alone isn’t enough. You need to analyze that data, join it with historical data, and feed it into Artificial Intelligence (AI) models to make real decisions—and for that, you need it in a structured format in your data lake or data warehouse.
However, the data warehouse has always worked on tables rather than streams. And those tables have been tightly coupled to their processing engines—so a table in Redshift was fundamentally not usable in Snowflake or Trino, for example.
Kafka data doesn’t arrive in the table formats used by the data lake. It typically lands raw and messy, requiring additional processing just to make it usable. That’s because Kafka was designed for real-time event streaming, not for analytical processing. What works for high-throughput, low-latency messaging often clashes with the structured requirements of analyzing and querying tables in a data warehouse or data lake.
Over the past several years, we’ve seen the emergence of open table formats such as Delta Lake and Apache Iceberg™️ that most lakehouses have standardized on. That’s why moving Kafka data into Iceberg and Delta tables matters. It transforms raw Kafka events into structured, queryable tables that downstream tools can instantly access, powering everything from customer insights to machine learning models.
But getting Kafka data into those tables is harder than it sounds. The data needs to be properly mapped, converted, and cleaned before it’s usable.
We’ve spoken to well over 100 customers about the challenge of getting Kafka data into a data lake, and the message is always the same: It’s fragile, expensive, and difficult to manage.
A typical process might involve:
Reading the data out of Kafka – Either build a Spark job, configure a Kafka consumer, or set up a sink connector. Each option requires tuning and maintenance, and when it breaks, you have to troubleshoot the infrastructure and restart processing without creating duplicates.
Transforming the data – Kafka messages are serialized in Avro, JSON, or Protobuf, but data lakes typically require Parquet. You need to write custom logic to convert data types and map fields (a rough sketch of this follows the list). Type mismatches and serialization errors often cause ingestion failures.
Schema evolution – Schemas evolve frequently, so users typically need to build schema-handling logic to verify that each message’s schema matches what’s expected, or to reconcile the differences when it doesn’t.
Compaction and file management – Streaming data to object storage produces lots of small files, which slows down query performance. Users are usually left figuring out their own compaction processes, which can be very compute- and cost-intensive.
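To make this concrete, below is a minimal sketch of what just the first two steps might look like when hand-rolled in Python: a consumer that batches JSON messages off a topic and writes them out as Parquet. The broker address and the orders topic are placeholders, and the sketch deliberately ignores schema evolution, exactly-once delivery, and compaction, which is exactly the logic you would still have to bolt on yourself.

```python
# A hand-rolled Kafka-to-Parquet loader (illustrative only).
# Assumes a topic named "orders" with JSON-encoded values; schema evolution,
# exactly-once semantics, and small-file compaction are left as exercises.
import json

import pyarrow as pa
import pyarrow.parquet as pq
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder broker
    "group.id": "orders-to-parquet",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])

batch, files_written = [], 0
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            # Every failure mode here needs its own handling and retry logic.
            print(f"Consumer error: {msg.error()}")
            continue
        batch.append(json.loads(msg.value()))

        if len(batch) >= 10_000:
            # Each flush produces yet another small Parquet file to compact later.
            table = pa.Table.from_pylist(batch)
            pq.write_table(table, f"orders-{files_written:06d}.parquet")
            consumer.commit()
            batch, files_written = [], files_written + 1
finally:
    consumer.close()
```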
Everything we just mentioned is typically done for just one stream. Imagine scaling this to N streams—it’s painful! That’s why so many businesses struggle to make Kafka data usable for AI and analytics downstream in the lake or warehouse. But it doesn’t have to be that way any longer.
Tableflow eliminates the complexity outlined above by representing Kafka topics and their associated schemas as Iceberg or Delta Lake tables in a few clicks. No custom code or pipelines to scale and manage—just ready-to-query tables, instantly available for AI and analytics.
Iceberg Tables: Now generally available, allowing Confluent Cloud users to expose streaming data in Kafka topics as Iceberg tables.
Delta Lake Tables: Now in Early Access, enabling users to explore materializing Kafka topics as Delta Lake tables in their own storage. These tables can be consumed as storage-backed external tables while we continue refining and enhancing this capability.
Tableflow uses innovations in our Kora Storage Layer that give us the flexibility to take Kafka segments and write them out to other storage formats—in this case, Parquet files. Tableflow also uses a new metadata publishing service behind the scenes that taps into Confluent’s Schema Registry to generate Iceberg metadata and Delta transaction logs while handling schema mapping, schema evolution, and type conversions.
Here’s how it works behind the scenes:
Data Conversion – Converts Kafka segments and schemas in Avro, JSON, or Protobuf into Iceberg- and Delta-compatible schemas and Parquet files, using Schema Registry in Confluent Cloud as the source of truth (see the sketch after this list).
Schema Evolution – Tableflow automatically detects schema changes such as adding fields or widening types and applies them to the respective table.
Catalog Syncing – Sync Tableflow-created tables as external tables in AWS Glue, Snowflake Open Catalog, Apache Polaris, and Unity Catalog (coming soon).
Table Maintenance and Metadata Management – Tableflow automatically compacts small files when it detects enough of them and also handles snapshot and version expiration.
Choose Your Storage – You can choose to store the data in your own Amazon S3 bucket or let Confluent host and manage the storage for you.
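As a rough illustration of that source of truth (and not of Tableflow’s internal code), here is how you might inspect the latest registered schema for a topic’s value subject yourself with the confluent_kafka client; the Schema Registry URL and credentials are placeholders.

```python
# Inspect the schema that governs a topic's values in Schema Registry.
# Illustrative only: the URL and credentials below are placeholders,
# and this is not Tableflow's internal implementation.
from confluent_kafka.schema_registry import SchemaRegistryClient

client = SchemaRegistryClient({
    "url": "https://<your-schema-registry-endpoint>",    # placeholder
    "basic.auth.user.info": "SR_API_KEY:SR_API_SECRET",  # placeholder
})

# With the default TopicNameStrategy, the value subject is "<topic>-value".
latest = client.get_latest_version("orders-value")
print(latest.version, latest.schema.schema_type)
print(latest.schema.schema_str)  # the Avro/JSON/Protobuf definition mapped to table columns
```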
Turning your data pipelines from patchwork to precision has never been so easy. To enable Tableflow, simply:
Open Confluent Cloud
Select a Kafka topic
Click Enable Tableflow
Choose where you want to store the data
Your stream is now queryable as a table.
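From there, the table can be read by any Iceberg-compatible client. As a hedged sketch, reading it with PyIceberg through an Iceberg REST catalog might look like the following; the catalog URI, credentials, warehouse, and the prod.orders table identifier are placeholders for whatever your environment exposes.

```python
# Read a Tableflow-materialized Iceberg table with PyIceberg.
# The catalog URI, credentials, warehouse, and table identifier are placeholders:
# substitute the REST catalog endpoint and names from your own setup.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "tableflow",
    **{
        "uri": "https://<your-iceberg-rest-endpoint>",  # placeholder
        "credential": "<api-key>:<api-secret>",         # placeholder
        "warehouse": "<warehouse-id>",                  # placeholder
    },
)

table = catalog.load_table("prod.orders")  # placeholder identifier
df = table.scan().to_pandas()              # or .to_arrow() for an Arrow table
print(df.head())
```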
Tableflow handles the heavy lifting of turning Kafka data into structured tables, but real-world data is often messy even after it lands in a table. To make it truly analytics-ready, especially for predefined queries, you need to process and refine the data before landing it in the lake.
After all, do you always want your table to look exactly like a single topic? Doesn’t downstream analysis often require you to merge multiple streams into a single table in your existing data engineering pipelines? If so, isn’t it better to shift that processing to the left, so you keep your tables clean while retaining the speed of Kafka?
That’s where stream processing from Apache Flink® comes in.
With Flink, you can perform real-time transformations on Kafka streams before they ever reach a table. Flink enables you to filter, transform, aggregate, join, and enrich data streams, allowing you to clean up raw Kafka events and shape them into something that’s immediately usable for AI and analytics.
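As an illustration, a minimal PyFlink sketch of that kind of cleanup might look like the following. The topic names, columns, and connection settings are placeholders, the Kafka SQL connector must be available on the Flink classpath, and Confluent Cloud users would typically run the equivalent SQL in Confluent’s managed Flink service instead.

```python
# Shape a raw Kafka topic into a clean one before Tableflow materializes it.
# Illustrative only: topic names, columns, and connection settings are placeholders.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: the raw clickstream topic as it arrives.
t_env.execute_sql("""
    CREATE TABLE raw_clicks (
        user_id STRING,
        url     STRING,
        ts      TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'raw_clicks',
        'properties.bootstrap.servers' = '<broker>:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# Sink: the cleaned topic that Tableflow exposes as a table.
t_env.execute_sql("""
    CREATE TABLE clean_clicks (
        user_id STRING,
        url     STRING,
        ts      TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'clean_clicks',
        'properties.bootstrap.servers' = '<broker>:9092',
        'format' = 'json'
    )
""")

# Drop malformed events and internal traffic before they ever reach the lake.
t_env.execute_sql("""
    INSERT INTO clean_clicks
    SELECT user_id, url, ts
    FROM raw_clicks
    WHERE user_id IS NOT NULL
      AND url NOT LIKE '%/healthcheck%'
""").wait()
```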
By combining Tableflow with Flink, you ensure that your lake has the freshest data, landing pretransformed and ready for query. It prevents teams from inadvertently running up costs by processing the same raw data in the same ways, and it ensures that schema/data changes upstream are caught by governance tools before becoming an issue in the lake.
Confluent has partnered with Amazon Web Services (AWS), Databricks, and Snowflake to create native integrations with their respective Iceberg and Delta Lake catalogs. Tableflow supports integrations with AWS Glue, Databricks Unity Catalog (coming soon), and Snowflake Open Catalog. It also supports the associated analytical tools (e.g., AWS analytics services, the Databricks Intelligence Platform, and Snowflake Cortex AI). Our partners are just as excited about these integrations as we are.
Together, Tableflow and our partners’ native integrations make streaming data readily discoverable and instantly accessible from leading data lakes, data warehouses, and analytics services such as Amazon Athena, Amazon EMR, Amazon Redshift, Amazon SageMaker Lakehouse, Databricks, and Snowflake. Tableflow also incorporates an embedded Iceberg REST Catalog, which serves as the authoritative repository for all Iceberg tables generated by the platform.
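For example, once catalog syncing has published a Tableflow table to AWS Glue, it can be queried from Amazon Athena like any other table. The sketch below uses boto3; the database, table, and S3 output location are placeholders for your own Glue database and results bucket.

```python
# Query a Tableflow table that has been synced to AWS Glue, via Amazon Athena.
# The database, table, and S3 output location are placeholders.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="""
        SELECT user_id, COUNT(*) AS clicks
        FROM clean_clicks            -- placeholder table synced from Tableflow
        GROUP BY user_id
        ORDER BY clicks DESC
        LIMIT 10
    """,
    QueryExecutionContext={"Database": "tableflow_db"},                           # placeholder
    ResultConfiguration={"OutputLocation": "s3://<results-bucket>/athena/"},      # placeholder
)
print("Query execution id:", response["QueryExecutionId"])
```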
Beyond simply giving you a better way to move operational data out of Kafka and into the data lake, Tableflow’s unification of governance, stream processing, and data formats does something more fundamental. It creates a single, consistent view of your data across both operational and analytical systems. Instead of dealing with fragmented pipelines and mismatched schemas, you have one integrated system where real-time events and historical data align perfectly. Tableflow materializes Kafka data as Iceberg and Delta Lake tables in a clean, structured format, and stream processing ensures that data is enriched, deduplicated, and mapped consistently before it even reaches the table. That means your data is not only available to your business intelligence and AI tools but is also complete, accurate, and immediately actionable.
This single view transforms how businesses work with data. When operational and analytical systems are in sync, you can analyze customer behavior as it happens, adjust pricing based on real-time demand, and feed AI models with constantly updated data without building or maintaining complex pipelines. Decisions are no longer based on stale or incomplete data because everything is aligned in one place.
The combination of Tableflow and stream processing doesn’t just reduce complexity. It creates a unified data foundation that allows your business to move faster, respond smarter, and drive better outcomes.
While we’re excited about the initial launch of Tableflow, this is just the beginning for the product and engineering teams. We have an ambitious road map where you’ll see new functionality roll out over the course of the year, such as:
Bringing Tableflow to Microsoft Azure and Google Cloud
The general availability of Tableflow for Delta Lake and Unity Catalog
Upsert table support, allowing Tableflow to handle deduplication and merge-by-primary-key operations
Additional configurations, including the ability to send incompatible records to a dead letter queue and custom partitioning
With Tableflow now generally available for Iceberg and in Early Access for Delta Lake tables, it's the perfect time to explore how it can streamline your real-time data workflows. Get started today, share your feedback, and join us as we continue to push the boundaries of seamless data integration. We’re already working on the next enhancements to make Tableflow even more powerful. Stay tuned!
Apache®, Apache Kafka®, Kafka®, Apache Flink®, Flink®, Apache Iceberg™️, and Iceberg™️ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by using these marks. All other trademarks are the property of their respective owners.