
Bridging the Data Divide: How Confluent and Databricks Are Unlocking Real-Time AI

Written by Jay Kreps and Ali Ghodsi

We’re excited to announce an expanded partnership between Confluent and Databricks to dramatically simplify the integration between analytical and operational systems. This is particularly important as enterprises want to shorten the deployment time of AI and real-time data applications. This partnership enables those enterprises to spend less time fussing over siloed data and governance and more time creating value for their customers.

The first capability we are offering together is a Delta Lake-first integration between Confluent and Databricks that will enable businesses to bridge the divide between real-time applications and their analytics and AI platform. Now, customers will have streamlined, bidirectional data flow between Confluent’s Tableflow – which converts Kafka logs into Delta Lake tables – and Databricks’ Unity Catalog. This integration unlocks real-time, governed data products from any source to power intelligent applications. Together, Confluent and Databricks will better enable enterprises to build and deploy AI that allows proactive, automated decision-making on operational systems with production-level speed and accuracy.
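To make the flow concrete, here is a minimal sketch of what the operational-to-analytical direction could look like, assuming a Confluent Cloud cluster with Tableflow enabled on an "orders" topic and a Databricks notebook (where spark is predefined); the broker address, credentials, and table names below are placeholders, not product defaults:

    # Operational side: publish order events to a Kafka topic using
    # Confluent's Python client. Tableflow (enabled on the topic in
    # Confluent Cloud) materializes the stream as a Delta Lake table.
    import json
    from confluent_kafka import Producer

    producer = Producer({
        "bootstrap.servers": "<confluent-cloud-bootstrap>",
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": "<api-key>",
        "sasl.password": "<api-secret>",
    })

    event = {"order_id": "o-1001", "customer_id": "c-42", "amount": 129.95}
    producer.produce("orders", key=event["order_id"], value=json.dumps(event))
    producer.flush()

    # Analytical side (Databricks notebook): the Tableflow-materialized
    # table appears as a governed Unity Catalog asset and is queryable
    # like any other Delta table. The three-level name is illustrative.
    orders = spark.read.table("main.operational.orders")
    orders.groupBy("customer_id").sum("amount").show()

The point of the sketch is that no bespoke batch export job sits between the two sides: the event stream is exposed directly as a governed table.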

AI Needs Trustworthy Real-Time Data

Today, enterprises understand that AI tuned on their proprietary data is critical to their competitive position in the market. As investment in AI projects grows, a hard truth becomes apparent: AI is only as good as the data that fuels it, and much of that proprietary data is siloed, fragmented, and difficult to use in the service of AI. Many enterprises struggle to make AI operational because their data is locked in systems that weren't built to work together.

Most organizations have two critical data silos:

  • Operational systems that power applications, transactions, and real-time events.

  • Analytical systems that drive data intelligence and AI for better decision-making.

AI models require fresh, real-time operational data to make accurate predictions, while real-time applications need AI-generated insights from analytical systems to improve accuracy and automate decisions. Today, data often moves between these silos in batch jobs that are slow, brittle, and manual, with governance and lineage getting lost along the way. This was tolerable for offline model training in traditional machine learning, but it is a serious problem for LLMs and agentic AI, where wrong or outdated data means poor reasoning and incorrect decisions.

Confluent and Databricks are working together to change this, ensuring that data moves seamlessly between operational and analytical systems, while preserving its trust, context, and usability for AI.  

Defining the Solution: AI-Ready, Real-Time Data Products

The key to making AI truly operational is real-time, trusted data products – not just raw data, but governed, reusable assets designed to power AI and analytics, regardless of where the data originates.

Through deep integration between Confluent and Databricks, we are making data products an enterprise-wide reality in two key phases, with an underlying foundation of integrated governance:

  • Transforming operational data into AI-ready assets: Confluent’s Tableflow seamlessly structures Kafka event streams into Delta Lake, where Databricks provides large-scale data transformation, feature engineering, and ML model training. When combined with Databricks’ Mosaic AI, users can train and serve models directly on streaming data, ensuring AI continuously learns from live, trusted event streams to improve predictions and decision-making.

  • Bringing AI-driven intelligence into real-time applications: Once AI models in Databricks’ Data Intelligence Platform generate insights, those predictions can flow back into Kafka by updating Delta tables integrated with Tableflow. With that, businesses can automate decisions instantly rather than reacting hours or days later, and continuously improve the efficacy of their applications.

  • Unifying governance and access across real-time and AI-driven data: The integration between Databricks Unity Catalog and Confluent Stream Governance ensures that data moving in either direction remains governed, traceable, and compliant. AI models can be trained on well-understood, lineage-tracked data, while operational teams can trust that AI-generated insights maintain governance and explainability as they flow back into real-time applications. This eliminates the data silos and inconsistencies that slow AI adoption and ensures that AI can be deployed responsibly at scale.

With this foundation in place, application developers, data engineers, data analysts, and AI engineers can finally work from a single, real-time source of truth, reducing manual effort and accelerating AI adoption.
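As a concrete, hypothetical illustration of the first two phases, the PySpark sketch below reads a Tableflow-materialized table as a stream, derives features, scores each event with a registered MLflow model, and lands the predictions in a Delta table that Tableflow can sync back toward Kafka; every table, model, and path name is invented for the example:

    import mlflow.pyfunc
    from pyspark.sql import functions as F

    # Phase 1: consume the Tableflow-materialized event stream as a
    # Structured Streaming source governed by Unity Catalog.
    payments = spark.readStream.table("main.operational.payments")

    # Lightweight feature engineering on the live stream.
    features = (payments
                .withColumn("hour", F.hour("event_time"))
                .withColumn("is_high_value", (F.col("amount") > 1000).cast("int")))

    # Score each event with a model registered in MLflow (for example,
    # one trained with Mosaic AI); the model URI is a placeholder.
    fraud_score = mlflow.pyfunc.spark_udf(
        spark, "models:/fraud_model/1", result_type="double")
    scored = features.withColumn(
        "fraud_score", fraud_score(F.struct("amount", "hour", "is_high_value")))

    # Phase 2: land the predictions in a Delta table integrated with
    # Tableflow, so they can flow back into Kafka and the operational
    # applications that consume it.
    (scored.writeStream
           .option("checkpointLocation", "/Volumes/main/operational/_chk/fraud")
           .toTable("main.operational.fraud_scores"))

Because both the input and the output are governed tables, the lineage from raw events to predictions remains visible across Unity Catalog and Stream Governance.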

Where This Unlocks AI Innovation

By ensuring AI operates on real-time, trusted data, enterprises can move from reactive insights to proactive, automated decision-making with unprecedented speed and accuracy.

Traditionally, AI models have been trained on stale, batch-processed data, making them slow to adapt to changing conditions. AI-driven insights have often required manual intervention to be deployed back into production systems, creating delays and inefficiencies. This integration removes those barriers by ensuring:

  • Faster decision-making: AI models receive continuous real-time data rather than waiting for batch processing, reducing time-to-action from hours or days to milliseconds.

  • Higher accuracy: AI models can adjust dynamically based on the latest information, leading to better insights from both AI and traditional analytics or data science.

  • Seamless automation: AI-driven insights flow back into operational systems without manual work, allowing businesses to automate responses instantly instead of relying on human-driven processes.

Some key AI-powered capabilities enabled by this integration include:

  • Anomaly detection: Spot fraud, cybersecurity threats, or equipment failures the moment they occur, instead of after batch processing (a minimal sketch follows this list).

  • Predictive analytics: Forecast supply chain risks, customer demand, and operational bottlenecks with continuously updated data instead of static historical trends.

  • Hyper-personalization: AI-driven recommendations and customer interactions adapt dynamically in real time, instead of being based on outdated preferences.
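For the anomaly-detection pattern, reusing the hypothetical fraud_scores table from the earlier example, a small streaming job can react the moment a score crosses a threshold rather than after a batch window closes; the threshold, table name, and checkpoint path are illustrative:

    from pyspark.sql import functions as F

    # Watch the scored stream and surface anomalies as they arrive.
    alerts = (spark.readStream.table("main.operational.fraud_scores")
                   .filter(F.col("fraud_score") > 0.9))

    def raise_alerts(batch_df, batch_id):
        # A real system might page on-call staff or block a transaction;
        # this sketch just prints the offending events.
        for row in batch_df.collect():
            print(f"ALERT batch={batch_id} order={row['order_id']} "
                  f"score={row['fraud_score']:.2f}")

    (alerts.writeStream
           .foreachBatch(raise_alerts)
           .option("checkpointLocation", "/Volumes/main/operational/_chk/alerts")
           .start())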

This isn’t just about moving data faster. It’s about ensuring AI operates in real time, continuously improving its accuracy and impact, and directly influencing business outcomes without human bottlenecks.

What’s Next?

We’ve built deep product integrations that bring this vision to life, and we’ll be rolling them out in the months ahead:

  • Flow Direction 1: Tableflow to Unity Catalog integration, enabling real-time operational data to flow directly into Delta Lake for AI and analytics, with governance, lineage, and security linked from Confluent Tableflow to Unity Catalog.

  • Flow Direction 2: Unity Catalog to Tableflow integration, enabling AI and analytics insights to flow back into applications via Tableflow, with the same governance integration applied in this direction.

To dive deeper, our product and engineering teams will publish a series of blogs in the coming weeks that will cover:

  • How Tableflow and Delta Lake work together for real-time analytics.

  • How metadata governance across Unity Catalog and Stream Governance ensures trust and security.

  • Real-world implementation examples for enterprise AI.

For enterprises looking to fully operationalize AI, this partnership provides the missing link: ensuring AI is always working with real-time, trusted data.

We look forward to seeing what you build.

  • Jay Kreps is the CEO and co-founder of Confluent, the foundational platform for data in motion built on Apache Kafka. As a pioneer in a new category of data infrastructure, Confluent has seen significant growth that underscores the importance of data in motion across all industries. Prior to Confluent, Jay was the lead architect for data and infrastructure at LinkedIn. He is the initial developer of several open source projects, including Apache Kafka.

  • Ali Ghodsi is the CEO and co-founder of Databricks, responsible for the growth and international expansion of the company. He previously served as the VP of Engineering and Product Management before taking the role of CEO in January 2016. In addition to his work at Databricks, Ali serves as an adjunct professor at UC Berkeley and is on the board at UC Berkeley’s RiseLab. Ali was one of the creators of the open source project Apache Spark, and ideas from his academic research in resource management, scheduling, and data caching have been applied to Apache Mesos and Apache Hadoop.
