
Connect Data From Legacy Databases Anywhere to MongoDB in Real Time With Confluent and Apache Kafka®

Businesses are generating more data than ever on a daily basis. As a result, many enterprises are undergoing a digital transformation that centers on their ability to contextualize and harness the value of their data in real time.

Traditionally, data has been stored in legacy databases that were designed for static, on-prem installations rather than for the cloud. Those databases are slow and rigid, and they often carry high upfront and ongoing maintenance costs. In addition, traditional data pipelines that coordinate the movement and transformation of data were built around a batch-based approach, making them poorly suited to real-time use cases. As a result, it’s very difficult to access real-time data assets generated by different users and applications across the enterprise, which keeps organizations from transforming and modernizing their business.

Today’s organizations need scalable, cloud-native databases like MongoDB Atlas for improved agility, elasticity, and cost-efficiency. With MongoDB, developers can focus on building new, real-time applications with ease instead of spending time and resources patching, maintaining, and managing their database.

Apache Kafka has become the de facto standard for data streaming, which includes streaming data to legacy and cloud-based databases. But getting there on your own with open-source Kafka poses significant challenges, including excessive operational burden and resource investment, limited security and governance, complex configuration for real-time connectivity and processing, and difficulty scaling to reach global availability. Many organizations are turning to Confluent Cloud, which enables them to connect, stream, and process events in the cloud, across multiple clouds, and in hybrid/on-prem environments.

With Confluent Cloud’s fully managed service, organizations can completely offload Kafka infrastructure management, operational burdens, and risks. As a result, development teams can redirect their focus to building mission-critical applications that differentiate their businesses. Confluent delivers a modern approach to breaking down data silos, enabling organizations to fully govern their data flows and transform them in real time with stream processing. This allows different teams and applications (including those built on MongoDB) to use real-time data to power modern applications. With Confluent, enterprises now have a way to connect legacy databases to MongoDB running on AWS, Google Cloud, or Microsoft Azure to accelerate database modernization initiatives.

Streaming pipelines, serverless Kafka, database modernization: how does it all come together?

All applications rely on and revolve around data. More often than not, applications require access to multiple sources of data, which can range from legacy databases to a wide variety of modern databases.

As an example, consider a financial services use case. Imagine a fintech company, Acela Loans, that recently acquired a small bank. The bank uses Oracle Database to store sensitive customer data such as names, emails, and home addresses, and it uses RabbitMQ as the message broker that carries credit card transaction events. Acela Loans wants to perform real-time credit card fraud detection to protect its newly acquired customers. This use case involves pushing suspicious activity flags to MongoDB Atlas, Acela Loans’ modern, cloud-native database that powers in-app notifications for its mobile application.

Acela Loans faces two challenges. First, their customer and transaction data is siloed between OracleDB and RabbitMQ, so they’ll need to access and merge that data in order to get a unified view of their customers’ credit card activities. Second, they’ll need stream processing to build a real-time application for fraud detection.

By using Confluent Cloud, Acela Loans is able to build and scale real-time hybrid and multicloud data pipelines that can move data from any source environment to the cloud database of their choice. This is accomplished through the use of Confluent’s extensive portfolio of fully managed source and sink connectors. In this scenario, Acela Loans would use the Oracle CDC Source Premium Connector and the RabbitMQ Source Connector to stream data into Confluent Cloud in real time.

Once all the data has been transferred to Confluent Cloud, Acela Loans is able to merge the data sources and build fraud detection using aggregates and windowing via ksqlDB stream processing. This allows them to create a list of customers whose accounts and credit cards may have been compromised. Finally, Acela Loans is able to stream the results into MongoDB Atlas using the MongoDB Kafka Connector. From here, they can alert customers of potentially fraudulent transactions, provide real-time charts via Atlas Charts, and store fraud history using MongoDB Time Series Collections.
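To make the "aggregates and windowing" step concrete, here is a minimal sketch in plain Python, not ksqlDB and not the pipeline used by the demo. It joins transaction events to customer records and counts transactions per card in fixed (tumbling) time windows. The record shapes, the 60-second window, the threshold, and the names `Transaction` and `flag_suspicious` are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed values for illustration only; real fraud rules would be tuned.
WINDOW_SECONDS = 60
MAX_TXNS_PER_WINDOW = 3


@dataclass(frozen=True)
class Transaction:
    card_id: str
    amount: float
    timestamp: int  # epoch seconds


def flag_suspicious(customers, transactions):
    """Return one flag per (card, window) that exceeds the assumed threshold.

    customers: dict mapping card_id -> customer record (hypothetical shape).
    transactions: iterable of Transaction events.
    """
    # Aggregate count and total amount per card per tumbling window.
    windows = defaultdict(lambda: {"count": 0, "total": 0.0})
    for txn in transactions:
        window_start = txn.timestamp - (txn.timestamp % WINDOW_SECONDS)
        agg = windows[(txn.card_id, window_start)]
        agg["count"] += 1
        agg["total"] += txn.amount

    flags = []
    for (card_id, window_start), agg in sorted(windows.items()):
        if agg["count"] > MAX_TXNS_PER_WINDOW:
            # Join step: enrich the aggregate with the customer record.
            customer = customers.get(card_id, {})
            flags.append({
                "card_id": card_id,
                "customer_name": customer.get("name"),
                "window_start": window_start,
                "txn_count": agg["count"],
                "total_amount": agg["total"],
            })
    return flags
```

A stream processor would compute the same kind of aggregate continuously as events arrive, rather than over a finished list as this sketch does.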

Confluent offers 120+ pre-built connectors, which allow businesses to integrate Apache Kafka with other apps and data systems without writing new code. With this extensive suite of connectors, organizations can speed up production cycles and modernize their entire data architecture at any scale.

Benefits of modernizing your database with Confluent and MongoDB

Simplify and accelerate migration

Stop creating tech debt and bespoke point-to-point solutions by integrating data and applications across different environments. With Confluent, you can link on-prem and cloud environments, streamline data movement between them, and process data in flight with ksqlDB stream processing.

Stay synchronized in real time

Don’t rely on out-of-sync, siloed legacy systems that limit your app development. Move from batch to real-time streaming and access change data capture technology using Confluent and our CDC connectors to ensure your legacy database is in sync with your cloud database of choice.
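The idea behind change data capture can be sketched in a few lines of Python: each change in the source database becomes an event, and replaying those events in order keeps a target copy in sync. This is only an illustration. The event shape (`op`, `key`, `after`) and the function `apply_change_event` are hypothetical and do not reflect the record format of any Confluent connector.

```python
def apply_change_event(target, event):
    """Apply one change event to an in-memory target keyed by primary key.

    event is a dict with a hypothetical shape:
      {"op": "insert" | "update" | "delete", "key": ..., "after": {...}}
    """
    op = event["op"]
    key = event["key"]
    if op in ("insert", "update"):
        # Upsert: the target row takes the post-change state of the source row.
        target[key] = event["after"]
    elif op == "delete":
        # Deleting a missing key is tolerated so that replays stay safe.
        target.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op!r}")
    return target


# Replaying events in order brings the target to the source's current state.
replica = {}
for change in [
    {"op": "insert", "key": 1, "after": {"name": "Ada", "city": "Austin"}},
    {"op": "update", "key": 1, "after": {"name": "Ada", "city": "Boston"}},
    {"op": "insert", "key": 2, "after": {"name": "Lin", "city": "Denver"}},
    {"op": "delete", "key": 2},
]:
    apply_change_event(replica, change)
```

Treating inserts and updates as the same upsert keeps the apply step idempotent, which matters when events can be delivered more than once.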

Easily scale to meet your needs

Many streaming solutions are limited in the throughput they can handle. This can lead to frequent outages and too little persistence to process data streams in real time. From its inception, Confluent was designed to be horizontally scalable to meet massive throughput requirements while maintaining ultra-low latency.

Reduce total cost of ownership

It’s very time-consuming to provision and configure open-source Kafka clusters. Add in cumbersome capacity planning and time wasted waiting on operators to deploy new platform tooling, and you end up with delayed time-to-value and a high TCO for self-managed Kafka. The end result of this approach is often a lower ROI on data-in-motion projects.

Self-managing a Kafka environment can be costly. However, according to the Forrester Total Economic Impact (TEI) report, using Confluent Cloud allowed organizations to avoid infrastructure management and Kafka operational burdens and risks. This resulted in development and operations cost savings of over $1.4M, and scalability and infrastructure cost savings of over $1.1M. Overall, these savings led to a 257% return on investment with a payback period of less than six months.

Start your modernization journey today with Confluent and MongoDB

A new generation of technologies is needed to consume and extract value from today’s real-time, fast-moving data sources. By moving to real-time streaming with Confluent—and connecting data from any environment across cloud and on-prem to MongoDB Atlas—organizations can provide application teams with the data they need to efficiently deliver new features and capabilities to customers at scale.

Get hands-on with this demo to learn how you can migrate legacy data to MongoDB Atlas using Confluent, our ecosystem of pre-built connectors, and ksqlDB for real-time stream processing.

Lydia Huang is a Sr. Product Marketing Manager at Confluent, working with RSI partners to create joint GTM strategies. Prior to Confluent, she worked in product marketing at IBM, managing cloud security.
