We’re excited to announce our Real-Time Context Engine, now available in Early Access. It’s a key part of Confluent Intelligence, our vision to bring real-time data directly to production AI systems through the power of Apache Kafka® and Apache Flink®.
Kafka and Flink already power streaming pipelines for AI, collecting events, processing them in real time, and transforming them into clean, enriched streams. But serving that same data in a form AI systems can use is a different challenge. Teams still spend hours mapping mismatched data models between streams and AI apps, enforcing governance and access across disparate sources, and rebuilding pipelines every time schemas or upstream systems change. What’s missing is a serving layer that keeps data consistent, secure, and instantly queryable in real time.
Real-Time Context Engine solves that last-mile problem by unifying the serving layer for streaming data into a single managed service. It continuously materializes enriched enterprise data sets into a fast, in-memory cache and serves them to AI systems through the Model Context Protocol (MCP), all fully managed within Confluent Cloud. The complexity of Kafka and Flink stays under the hood: developers simply request the data they need, and it’s there, live and ready to power their production AI applications.
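To make that concrete, here’s a minimal sketch of what requesting context over MCP can look like from an application, using the open-source MCP Python SDK. The endpoint URL and the `get_order_context` tool name are placeholders for illustration, not the actual Real-Time Context Engine interface.

```python
# Minimal sketch (all Confluent-specific details are placeholders): an AI app
# pulling live context over MCP using the open-source MCP Python SDK.
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

MCP_URL = "https://<your-context-endpoint>/mcp"  # placeholder endpoint

async def fetch_context(order_id: str) -> str:
    async with streamablehttp_client(MCP_URL) as (read_stream, write_stream, _):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()              # MCP handshake
            tools = await session.list_tools()      # discover the exposed data sets
            print("Available tools:", [t.name for t in tools.tools])
            # Hypothetical tool exposing an enriched "order context" data set
            result = await session.call_tool("get_order_context", {"order_id": order_id})
            return result.content[0].text           # structured context, ready for the model

if __name__ == "__main__":
    print(asyncio.run(fetch_context("order-42")))
```

In practice an agent framework would issue these calls on the model’s behalf; the point is that the developer sees a tool call and a result, not Kafka topics or Flink jobs.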
Most of us have tried connecting raw data to ChatGPT or Claude to speed up basic tasks, but building an intelligent system that can adapt and act reliably in production, with only an acceptable level of human oversight, is a different challenge. A production AI system that automates real business functions needs enriched, contextual data about your business. An agent automating order fulfillment, for example, needs constantly updated information on products, shipments, and inventory: data that is continuously joined and enriched from multiple sources.
There are a few familiar ways we might try to engineer that context from operational and analytical systems, but each of them falls short.
New AI protocols such as MCP make it simple to connect AI systems to live data. They expose endpoints that let agents request information or take actions using natural language. It’s often where teams begin, connecting directly to source systems through MCP, pointing a model at them, and watching it respond in real time.
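To picture that starting point, here’s a sketch of a minimal MCP server that simply proxies a SQL query against an operational database, using the open-source MCP Python SDK with SQLite standing in for the real system. Every name in it is illustrative, and the list below explains why this pattern breaks down.

```python
# Sketch of the "point MCP straight at the source system" approach.
# The database file, table, and tool name are all stand-ins.
import sqlite3

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orders-demo")

@mcp.tool()
def query_orders(customer_id: str) -> list[dict]:
    """Return raw order rows for a customer, straight from the operational store."""
    conn = sqlite3.connect("orders.db")  # every AI query hits the source system directly
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT * FROM orders WHERE customer_id = ?", (customer_id,)
    ).fetchall()
    conn.close()
    return [dict(r) for r in rows]       # raw, app-specific codes and IDs go straight to the model

if __name__ == "__main__":
    mcp.run()                            # serves the tool over stdio by default
```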
The results look impressive in a demo but fall short in production for a number of reasons:
Raw, Unusable Data: Raw data from source systems is tightly coupled to the apps that created it. It’s littered with cryptic codes and IDs no human or AI can interpret without knowing the app’s internals. It needs to be enriched and transformed into meaningful context before it’s useful.
Custom Security and Access Control: To make live queries possible, teams either end up exposing entire APIs or databases or building complex access controls and filters around each endpoint.
Operational Load: The same production systems get hit repeatedly as AI queries pile on top of normal workloads, overloading databases and APIs that were never designed for it. On top of that, teams have to maintain fleets of MCP servers to keep data flowing.
Token Cost: Every redundant or unfiltered field sent to a model drives up token usage, making each query more expensive and slower to respond.
It quickly becomes clear why this approach doesn’t scale. You end up querying dozens of source systems in an insecure, operationally heavy way and overloading them just to retrieve raw data that remains unusable until it’s enriched.
What’s really needed isn’t more access to raw data, but a curated set of clean, derived data that carries real business meaning.
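The difference is easy to see side by side. The field names below are invented, but they capture the gap between what a source system emits and what an agent can actually reason about.

```python
# Invented fields, purely to contrast raw source data with derived context.

raw_event = {                  # what the source system actually emits
    "ord_st": "4",             # cryptic status code
    "wh_cd": "DE-07",          # internal warehouse code
    "sku": "883-A1",
    "ts": 1730419200,
}

derived_context = {            # what an agent can reason about and act on
    "order_status": "delayed",
    "warehouse": "Dortmund fulfillment center",
    "product_name": "Trail Shoe, size 42",
    "expected_delivery": "2025-11-03",
    "delay_reason": "carrier capacity",
}
```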
Batch systems seem like the natural fit to build these clean, derived data sets. After all, most of our data is in a data lakehouse or warehouse. In development, this shines: you can test, iterate, and prove your models are working on real enterprise data.
You can then load large batches of processed data into a serving layer such as a vector database for semantic search or an operational database for lookups. But once in production, the cracks appear. The data is already stale the moment it’s loaded, and every update depends on the next scheduled batch job.
Consider an airline operations example: an AI assistant managing flight schedules and gate assignments might depend on hourly or nightly refreshes of booking and crew data. But by the time that batch job runs, dozens of flights have changed status, crews have rotated, and gates have shifted. The model ends up reasoning about yesterday’s world while the real one moves in real time.
Across these approaches, each system optimizes for a single dimension—real-time access or contextual richness—but not both together. Streaming offers a fundamentally different foundation.
At its core, streaming with Apache Kafka is built on a commit log, a continuous record of every event that lets you process data as it happens and also replay the past exactly as it occurred. In theory, that means one system could evaluate history, process the present, and serve results in real time. Yet most people don’t think of using streaming this way because in practice, it’s been hard to do. Many Kafka deployments only retain a few days of data, and reprocessing or querying historical events on the stream has been slow and cumbersome.
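Replaying history itself is straightforward: start a fresh consumer group at the earliest retained offset, and the same code that handles live events reads the past. The sketch below uses the confluent-kafka Python client with placeholder connection settings and a hypothetical topic.

```python
# Sketch: replaying a Kafka topic from the start of the retained log.
# Broker address and topic name are placeholders.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "<broker>",   # placeholder
    "group.id": "replay-demo",
    "auto.offset.reset": "earliest",   # a new group starts at the beginning of the log
    "enable.auto.commit": False,
})
consumer.subscribe(["orders"])         # hypothetical topic

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue                   # no new records yet; keep polling
        if msg.error():
            raise RuntimeError(msg.error())
        # Each record is an immutable event in the commit log; the same loop
        # handles historical replay and new events as they arrive.
        print(msg.offset(), msg.value())
finally:
    consumer.close()
```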
To make streaming actually usable, it needs to be a generalization of batch processing: something that can scale up to process all of history quickly for development but then take that same code and run it continually as new data arrives in production. You could view what we’re doing at Confluent with Tableflow, Flink, and more as an extension of this idea.
Tableflow extends Kafka’s commit log into durable, structured storage in open table formats such as Apache Iceberg™ or Delta Lake. These open tables are stored in object storage to reduce cost, and can feed any data lakehouse or analytical engine with real-time data. You can keep your data cost-effectively for as long as you need, query it like a table, and bridge the gap between real-time streams and analytical systems.
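As a sketch of what that looks like from the analytics side, a Tableflow-materialized Iceberg table can be read like any other Iceberg table, here with PyIceberg. The catalog URI, credentials, and table identifier are placeholders; Confluent’s Tableflow documentation covers the actual REST catalog settings.

```python
# Sketch: reading a Tableflow-materialized Iceberg table with PyIceberg.
# The catalog URI, credentials, and table identifier below are placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "tableflow",
    **{
        "uri": "https://<tableflow-rest-catalog-endpoint>",  # placeholder REST catalog
        "credential": "<api-key>:<api-secret>",              # placeholder credentials
    },
)

# The same events that flowed through Kafka, now queryable as a table.
table = catalog.load_table("<database>.orders")              # placeholder identifier
df = table.scan(limit=10).to_pandas()
print(df.head())
```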
Flink unifies stream and batch semantics under one framework.
You can use the same SQL, Java, or Python with Snapshot Queries to process historical data stored in Tableflow up to 50-100x faster than reprocessing the stream itself. You can also react to new events as they arrive on Kafka streams and continuously refine your logic without switching programs. That stream-batch duality lets you process and reprocess data in real time, at massive scale.
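As a rough open-source analogue of that duality (not Confluent’s Snapshot Query syntax itself), the PyFlink sketch below runs the same enrichment SQL once as a bounded batch job and once as a continuous streaming job; only the environment mode changes, and every table and column name is made up for illustration.

```python
# Stream/batch duality in miniature: identical SQL, two execution modes.
# Table and column names are invented; real jobs would read Kafka/Tableflow tables.
from pyflink.table import EnvironmentSettings, TableEnvironment

ENRICH = """
    SELECT o.order_id, i.product_name, i.units_available
    FROM orders AS o
    JOIN inventory AS i ON o.product_id = i.product_id
"""

def run(settings):
    t_env = TableEnvironment.create(settings)
    # Tiny bounded stand-ins for Kafka/Tableflow-backed tables.
    t_env.execute_sql("""
        CREATE TEMPORARY VIEW orders AS
        SELECT * FROM (VALUES ('o-1', 'p-9'), ('o-2', 'p-4')) AS t(order_id, product_id)
    """)
    t_env.execute_sql("""
        CREATE TEMPORARY VIEW inventory AS
        SELECT * FROM (VALUES ('p-9', 'Trail Shoe', 120), ('p-4', 'Day Pack', 35))
            AS t(product_id, product_name, units_available)
    """)
    t_env.execute_sql(ENRICH).print()

run(EnvironmentSettings.in_batch_mode())      # development: evaluate history in one pass
run(EnvironmentSettings.in_streaming_mode())  # production: the same statement runs continuously
```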
Together, these innovations make streaming the foundation for a world where you can continuously evaluate, process, and serve context in real time. The final step is serving that context to AI systems, which still comes with its own complications. Teams often have to materialize streams into vector databases or wire up custom MCP servers, each with bespoke security, governance, and monitoring layers. That’s the part we’re excited to unify for you today.
Real-Time Context Engine is available today in Early Access and unifies the serving layer for streaming data into a single managed service. It turns streaming data that’s continuously refreshed and processed in real time into structured, trustworthy context that any AI app or agent can consume instantly.
Similar to how we extended our processed, governed streams into easily queryable open tables in object storage, we’ve materialized your enriched streaming data into an in-memory, low-latency cache for fast access by production apps. We serve this through our fully managed MCP interface, completely abstracting the complexity of Kafka and Flink from developers, who securely request just the data they need to power their production AI applications.
Because it’s built directly on Confluent’s data streaming platform (DSP), Real-Time Context Engine unifies historical replay, continuous processing, and real-time serving in one place. It abstracts away Kafka and Flink complexity from the consumer behind a secure, cloud-native service with authentication, role-based access control (RBAC), and audit logging built in so developers can focus on building intelligent systems instead of operating infrastructure. When upstream definitions change, the data streaming platform automatically reprocesses affected data, ensuring downstream AI systems stay consistent without manual rebuilds or drift.
The result is an always-on source of context that’s proven on history, continuously enriched, and served live. It’s the missing link between experiments and production AI systems.
Real-Time Context Engine is part of our broader vision with Confluent Intelligence: to bring real-time data directly to production AI systems through the power of Kafka and Flink. It builds on the same principles we’ve discussed—ingesting enterprise data as streams, enriching continuously, and serving live—but extends them beyond external AI apps and agents to intelligent systems built directly on Kafka and Flink as well. You can build AI/ML pipelines and applications with Streaming Agents directly on Flink with real-time data and serve real-time context to any AI app or agent through open interfaces like MCP.
What makes this powerful is that it all runs on our DSP. The same foundation that powers your operational systems, analytics, and data flows also powers your AI. You can evaluate history with replayable data, process continuously through Flink’s unified stream and batch engine, and serve live context through Real-Time Context Engine. That means the same real-time, contextualized, trustworthy data powers every intelligent system you build—everything is fully managed, deeply integrated, and powered by the same streaming backbone. And with role-based access, encryption, and governance built in from the start, every interaction is secure, auditable, and compliant by design.
Real-Time Context Engine is now available in Early Access. If you’re interested in joining the EA program, sign up here.
Streaming Agents is also available in Open Preview. You can read more and explore our Quick Start guide to try it out yourself!
If you haven’t done so already, sign up for a free trial of Confluent Cloud and start exploring how to bring real-time data into your production AI systems. New sign-ups receive $400 to spend within Confluent Cloud during their first 30 days. Use the code CCBLOG60 for an additional $60 of free usage.*
Apache®, Apache Kafka®, Kafka®, Apache Flink®, Flink®, Apache Iceberg™, Iceberg™, and the Kafka and Flink logos are either registered trademarks or trademarks of the Apache Software Foundation.