
Introducing Real-Time Embeddings: Any Model, Any Vector Database—No Code Needed


We’re excited to announce the Create Embeddings Action, the newest member of the Confluent Cloud for Apache Flink® Actions suite and the first step in our journey toward no-code AI workflows.

This no-code feature enables you to generate vector embeddings in real time, from any model, to any vector database, across any cloud platform. It’s the easiest way to create a live semantic layer for your AI workflows—ensuring fresher, more accurate data for better-performing systems. Under the hood, it leverages industry-leading data streaming technologies like Apache Kafka® and Apache Flink—battle-tested tools renowned for powering and scaling real-time systems across a wide range of industries. Let’s explore why this matters.

Why a live semantic layer matters

In modern AI systems like retrieval-augmented generation (RAG), embeddings are the glue that binds raw data to intelligent outcomes. These vectorized representations capture the meaning of data, enabling semantic search and providing relevant context to large language models (LLMs). Two phrases with similar meaning, such as “my package is late” and “the shipment is delayed,” map to nearby vectors, so a semantic search for one surfaces the other.

The challenge is that embedding creation is often a manual, batch process, which leads to static, outdated data representations. But data doesn’t stand still, and neither should your AI systems. Real-time embeddings let you build a live semantic layer, so your AI workflows always operate on the freshest, most relevant information.

By transitioning from batch to real-time embeddings, you unlock:

  • Improved AI precision: Fresh semantic data keeps your outputs precise and grounded.

  • Faster decision-making: Real-time embeddings eliminate the lag between data changes and actionable insights.

  • Future-proof systems: Stay agile as data volumes and use cases grow.

  • Efficient and predictable resource utilization: Streaming workloads spread GPU usage evenly, avoiding the spikes of batch processing.

Any cloud, any model: Ultimate flexibility 

We know AI systems aren’t one-size-fits-all. Rapid innovation in model development means you should have the flexibility to evaluate multiple models and choose the best one for your specific use case. That’s why the Create Embeddings Action is designed for:

  • Any cloud: Whether you’re on AWS, Google Cloud, Azure, or hybrid, Confluent ensures seamless integration with your infrastructure.

  • Any model: Use any embedding model that suits your needs—whether it’s from OpenAI, Amazon, or Google Gemini. You can even use your own fine-tuned model hosted on Amazon SageMaker, Azure ML, or Vertex AI.

This flexibility allows you to build AI systems that adapt to your unique business challenges, all while taking advantage of Confluent’s multi-model support and robust real-time capabilities.

How the Create Embeddings Action works

Like other Actions, the Create Embeddings Action is accessed from the data portal by clicking the topic that contains the data you want to convert into embeddings.

This is the first Action to leverage the AI Model Inference feature of Confluent Cloud for Apache Flink, enabling seamless integration of remote endpoint inference into your data processing jobs.
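To give a sense of what happens under the hood, here is a minimal Flink SQL sketch of a registered embedding model, assuming a hypothetical OpenAI connection named openai-connection and a model named text_embedding. The Action generates the equivalent statement for you, and the exact options vary by provider:

    -- Hypothetical sketch: register a remote embedding model in the Flink catalog.
    -- The connection name, model name, and options are illustrative, not the
    -- exact statement the Action generates.
    CREATE MODEL text_embedding
    INPUT (text STRING)
    OUTPUT (embeddings ARRAY<FLOAT>)
    WITH (
      'provider' = 'openai',
      'task' = 'embedding',
      'openai.connection' = 'openai-connection'
    );

Once registered, the model appears in the Flink catalog and can be selected in step 2 of the user flow below.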

User flow

Using the Create Embeddings Action is straightforward:

  1. Select the Topic and Column. Choose the topic and the text column that contains the data you want to convert into embeddings.

  2. Select or Create a Model. Pick an existing model from your catalog or register a new one directly within the interface.

  3. Optional: Configure Chunking. Set up chunking if needed to break text into smaller, more manageable pieces. This step is optional but can improve the efficiency and accuracy of embeddings.

  4. Confirm and Run. Hit "Confirm and Run," and the system takes care of the rest.

When you run the Create Embeddings Action, the following steps take place:

  1. Model registration. If you haven’t selected an existing model, the system registers a new model object in the Flink catalog.

  2. Destination topic creation. A new destination topic is created with an embeddings column of type ARRAY<FLOAT>. This will store the vectorized data.

  3. Flink SQL generation. A Flink SQL statement is generated that calls the embedding model endpoint, processes the source data, and stores the resulting embeddings in the destination topic (see the sketch after this list).

  4. Optional chunking. If chunking is configured, input text is broken into smaller pieces before vectorization. A SQL system function is added to handle chunking, optimizing the embedding process.
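For illustration, here is a minimal sketch of the kind of statement the Action generates, using hypothetical source and destination topics (product_descriptions and product_descriptions_embeddings) and the model sketched above; the actual generated SQL may differ:

    -- Hypothetical sketch of the generated statement: read each source row,
    -- call the remote embedding model, and write the vector to the destination.
    -- If chunking is configured, a text-splitting system function is applied
    -- to the text column before ML_PREDICT is called.
    INSERT INTO product_descriptions_embeddings
    SELECT id, description, embeddings
    FROM product_descriptions,
      LATERAL TABLE(ML_PREDICT('text_embedding', description));

The embeddings column written here lands in the ARRAY<FLOAT> column created in the destination topic in step 2.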


Connecting to a vector database

Once embeddings are stored in the destination topic, they can easily integrate with an external vector database of your choice. Confluent’s sink connectors support all major vector databases, including MongoDB, Pinecone, Elasticsearch, SingleStore, Weaviate, Milvus, Couchbase, Neo4j, Qdrant, and Azure Cognitive Search.

These integrations allow you to build a live semantic data layer for your AI systems. By combining the Confluent Data Streaming Platform with a vector database, you can ensure your AI models are always working with up-to-date, relevant embeddings.

Get started

For RAG use cases, this new Action ensures that vector databases are continuously updated. A chatbot can pull current inventory and shipping timelines to provide accurate product availability to customers, or an agent can access live flight schedules and weather updates to generate travel itineraries. In the case of GEP Worldwide, their chatbot uses streaming data to generate insights and flag risks in procurement and supply chain operations.

Creating embeddings is an important step in building RAG applications. To learn more, visit the docs page. Try out this feature by signing in or getting started free with Confluent Cloud.


Apache®, Apache Kafka®, Kafka®, Apache Flink®, Flink®, and the Kafka and Flink logos are trademarks of the Apache Software Foundation.

  • Mayank is a Product Manager for Stream Processing at Confluent. He has extensive experience building and launching enterprise software products, with stints at VMware, Amazon, and growth-stage startups Livspace and Bidgely.

    Mayank holds an MBA with a specialization in Artificial Intelligence from Northwestern University, and a Computer Science degree from BITS Pilani, India.
