
Project Metamorphosis Month 8: Complete Apache Kafka in Confluent Cloud

Written by

This is the eighth and final month of Project Metamorphosis: an initiative that brings the best characteristics of modern cloud-native data systems to the Apache Kafka® ecosystem, served from Confluent Cloud. So far, we’ve covered elastic, cost effective, infinite, everywhere, global, secure, and reliable—which describe a set of features that makes event streaming easier to use, deploy, scale, and operate. This month, we’re excited to unveil the final launch for 2020: Complete. Together with our previous announcements, Project Metamorphosis culminates in a complete event streaming platform that enables enterprises to implement mission-critical business use cases from start to finish, and with ease.

From a powerful engine to a complete product

Apache Kafka has become the digital central nervous system for the always-on world, where businesses are increasingly software-defined and automated, and where the user of software is other software. We only need to look back at the past few years to see that major trends like cloud computing, artificial intelligence, ubiquitous mobile devices, and the Internet of Things have caused an explosion of data. Companies have been struggling to keep pace: to collect, process, and act on all this information as quickly as possible, whether to serve their customers faster or to gain an edge on the competition. The result? Whether you shop online, make payments, order a meal, book a hotel, use electricity, or drive a car, it’s very likely that, in one form or another, Kafka is powering the experience.

While the open source Apache Kafka project provides the technical foundation for collecting and processing real-time data streams as they occur, it doesn’t offer a simple solution for implementing business use cases fully end to end and within the context of a company’s existing infrastructure—similar to how even a very powerful engine is not sufficient to build a fully functional car. Instead, enterprises attempt to fill the gap by spending precious development resources on writing custom code or piecing together tactical toolsets to create some semblance of an event streaming architecture. This results in slowed innovation, delayed projects, and unnecessary complexity across the enterprise. Without a team of Kafka experts, the bar for getting to a central nervous system is simply too high for many companies. Let’s walk through how this month’s theme, Complete, addresses these concerns.

Complete means your entire business is interconnected

Imagine a newly built interstate highway. What’s the first thing you need to drive on it? On-ramps and off-ramps! Otherwise, the highway is of limited use. If Kafka is the data highway, then you need the building blocks to connect existing systems to Kafka, so that relevant data from any single part of the business is available everywhere else in the organization in real time, too. This is crucial in a world driven by software and data. Without such connectivity, you’d be left in a precarious situation where the left hand doesn’t know what the right hand is doing. In retail, for example, this could mean a customer has moved to a new address, but the fulfillment center still ships orders to the old one. Or an online customer purchases the last available evening dress, even though that dress was already sold in the physical store a few minutes earlier.

To quickly integrate your entire IT ecosystem and avoid situations like these, you can leverage Confluent Hub. Confluent Hub provides a single marketplace with 120+ pre-built connectors for Kafka—effectively “real-timing” your existing databases and systems through streams of data. Example connectors that Confluent supports together with its partners include Amazon S3, MongoDB, Elasticsearch, MQTT, Salesforce, and Azure Data Lake. Here is one example of how easy it is to install one of these connectors using the Confluent Hub client:

$ confluent-hub install mongodb/kafka-connect-mongodb:1.3.0

Many of these connectors are also available as fully managed services in Confluent Cloud, with usage-based billing and marketplace integrations on AWS, Azure, and Google Cloud. All operational aspects are taken care of by Confluent, as shown in the demo below:

Below are some of the recently released connectors that are now generally available:

On top of that, Cluster Linking lets you build truly global architectures in which data streams flow in real time between datacenters in different regions of the world, between on-premises environments and the cloud (i.e., hybrid cloud setups), or between different clouds (i.e., multi-cloud setups). In summary, this comprehensive connectivity jumpstarts you on your event streaming journey.

Complete means you can turn your data into action

At this point, the central nervous system has lots of data flowing through it from all across the organization. The next step is to put to work all the data that you now have at your fingertips. Imagine how the body’s nervous system works: If something important happens, such as your hand touching a hot surface, then the brain needs to react to this event immediately and command the hand to pull away.

This is exactly what customers expect today: If a customer interacts with one part of the business, then the information about this event must also be available to all the other parts immediately. They must be ready to respond to the task at hand—whether this means reacting to the current interaction with the customer or remembering the information for additional context in any future interactions. A rich customer experience means that a single customer interaction will trigger not just one but many reactions in the business. This cascading response results in even more data that needs to be generated, shared, and consumed in a timely and consistent manner across the organization. How can you put all this information to good use without having to spend weeks or months on development efforts?

While Apache Kafka ships with the Kafka Streams library, which allows you to implement applications that process streams of events, Kafka Streams requires coding in Java or Scala. A much more compelling solution for many teams is ksqlDB, the distributed event streaming database purpose-built for stream processing applications.

Before we dive into the details, let’s kick this off with a ksqlDB demo that shows how easily you can build end-to-end event streaming applications with zero infrastructure using Confluent Cloud:

With ksqlDB, you can build applications that respond immediately to new events; filter, transform, and aggregate data, including windowed aggregations; create materialized views over streams that are continuously updated with the latest information as soon as it arrives; and enrich and correlate streams and tables with data from various lines of business. Just like Kafka, ksqlDB is fault tolerant and supports exactly-once guarantees for all its processing capabilities, making it trustworthy for mission-critical use cases.
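
To make this a bit more concrete, here is a minimal sketch of such a continuously updated materialized view. The stream name, topic, and columns (payments, card_id, amount, merchant) are hypothetical and chosen purely for illustration:

-- Hypothetical stream of payment events backed by a Kafka topic
CREATE STREAM payments (card_id VARCHAR KEY, amount DOUBLE, merchant VARCHAR)
  WITH (KAFKA_TOPIC = 'payments', VALUE_FORMAT = 'JSON', PARTITIONS = 1);

-- Materialized view: per-card spend over 5-minute tumbling windows,
-- updated continuously as new payment events arrive
CREATE TABLE card_spend_5m AS
  SELECT card_id,
         COUNT(*) AS num_payments,
         SUM(amount) AS total_amount
  FROM payments
  WINDOW TUMBLING (SIZE 5 MINUTES)
  GROUP BY card_id
  EMIT CHANGES;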

In banking, for example, you can improve real-time fraud detection by adding context to a new transaction. If a customer’s continuously updated profile tells you that the associated credit card has always been used to purchase from the Amazon U.S. store only, then a new transaction with a seller in the Amazon France marketplace is out of the ordinary. If a simultaneous surge of suspicious transactions also occurs elsewhere in the customer base (via data correlation), then the alarm bells will begin ringing. The ability to enrich data in real time by joining streams and tables, such as a join between the aforementioned stream of transactions and a table of user profiles, is at the heart of ksqlDB. Of course, you can also perform stream-stream or table-table joins, including multi-joins.
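
As a sketch of this enrichment pattern, continuing with the hypothetical payments stream from the example above and assuming a customer_profiles table keyed by card with a home_marketplace column (both names are made up for illustration), a stream-table join in ksqlDB could look like this:

-- Hypothetical table of customer profiles, keyed by card
CREATE TABLE customer_profiles (card_id VARCHAR PRIMARY KEY, home_marketplace VARCHAR)
  WITH (KAFKA_TOPIC = 'customer_profiles', VALUE_FORMAT = 'JSON', PARTITIONS = 1);

-- Enrich every transaction with the customer's profile and keep only
-- purchases made outside the marketplace this card normally uses
CREATE STREAM unusual_transactions AS
  SELECT p.card_id, p.amount, p.merchant, c.home_marketplace
  FROM payments p
  JOIN customer_profiles c ON p.card_id = c.card_id
  WHERE p.merchant <> c.home_marketplace
  EMIT CHANGES;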

Applications are flexible in how they interact with the available data in ksqlDB. They can adopt the event streaming paradigm by issuing push queries to subscribe to real-time data updates (e.g., “traffic control just sent a notification that the customer’s flight is delayed”). Additionally, applications can use a request-response approach by issuing pull queries on demand, which function like a SELECT query in a traditional database: A pull query performs point-in-time lookups against the current state of materialized views (e.g., “what’s the current phone number in the customer profile so we can send a text message about the delayed flight?”).
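
Sticking with the hypothetical card_spend_5m view from the earlier sketch (the key value below is made up), the two query styles look like this:

-- Push query: subscribe to a never-ending feed of updates to the view
SELECT card_id, total_amount
  FROM card_spend_5m
  EMIT CHANGES;

-- Pull query: point-in-time lookup of the current state for a single key
SELECT card_id, total_amount
  FROM card_spend_5m
  WHERE card_id = 'card-1234';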

In the previous section, we talked about the tremendous benefits of interconnecting various systems in an enterprise using ready-to-use Kafka connectors. What is more, ksqlDB can manage these connectors for you. It can either run these connectors itself as embedded connectors, or it can control an external Kafka Connect cluster. Imagine you have a PostgreSQL database for customer data that you want to connect to Kafka. Here’s how ksqlDB’s SQL syntax lets you define and configure an (embedded) JDBC source connector:

CREATE SOURCE CONNECTOR jdbc_source WITH (
  'connector.class' = 'io.confluent.connect.jdbc.JdbcSourceConnector',
  'connection.url' = 'jdbc:postgresql://postgres:5432/postgres',
  'connection.user' = 'postgres',
  'connection.password' = 'mySecret',
  'topic.prefix' = 'jdbc_',
  'table.whitelist' = 'customer_profiles', ...);

With this single SQL statement, the table customer_profiles and any subsequent changes to it are ingested into Kafka in real time as a stream of events. This is possibly the easiest way to set up change data capture (CDC) for an existing database.
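
From there, the data is immediately usable in ksqlDB. As a minimal follow-up sketch (the value format, assumed here to be Avro with schemas in Schema Registry, depends on your connector’s converter settings), you could expose the ingested topic as a stream and watch the changes arrive:

-- The JDBC connector above writes to the topic 'jdbc_customer_profiles'
-- ('topic.prefix' + table name); expose it as a ksqlDB stream
CREATE STREAM customer_profile_changes
  WITH (KAFKA_TOPIC = 'jdbc_customer_profiles', VALUE_FORMAT = 'AVRO');

-- Watch profile changes arrive in real time
SELECT * FROM customer_profile_changes EMIT CHANGES;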

ksqlDB is included in Confluent Platform, but an even better experience is provided in Confluent Cloud, where ksqlDB is available as a fully managed service. Spinning up a new ksqlDB environment takes only a few clicks, and then you can use all of the aforementioned functionality—plus more, such as Confluent Schema Registry integration for data governance or Role-Based Access Control for granular security settings—without having to worry about the operational side of running this infrastructure.

Event streaming, complete and ready to use

Event streaming is on the rise. It has already been adopted by more than 80% of the Fortune 100 and by thousands of companies, non-profits, and public administrations around the globe. Our goal at Confluent is to make event streaming easy, ubiquitous, and future-proof. And it often takes more than just a powerful distributed system to put a central nervous system at the heart of an enterprise, even if that system is Apache Kafka.

As we have demonstrated with Project Metamorphosis, an event streaming platform in practice needs a wide range of capabilities: It requires security controls, elastic scaling, usage-based pricing with scale to zero, infinite storage with tiering to reduce storage costs, reliable operations with best-in-class support, global Kafka deployments for resilience and disaster recovery, the freedom to deploy everywhere you need, and—what we shared this month—a truly complete offering that brings together all the pieces. Confluent Cloud, with Apache Kafka at its core, provides all these capabilities as a fully managed service, available to you instantly at the click of a button. Event streaming has never been easier!

To get started, use the promo code CL60BLOG for an additional $60 of free Confluent Cloud usage.*

  • Michael is a former principal technologist in the Office of the CTO at Confluent, the company founded by the original creators of Apache Kafka®, where he focused on longer-term product and technology strategy. Previously, Michael was the lead product manager for stream processing at Confluent, where his team created Kafka Streams and the streaming database ksqlDB. He is a well-known technology blogger in the big data community (www.michael-noll.com) and a committer/contributor to open source projects such as Apache Storm and Apache Kafka.

  • Diby Malakar has more than 25 years of experience in the data management space and is currently the senior director of product management for Kafka Connect and data integration technologies at Confluent. He was previously the VP of product management at Oracle and has also worked at companies like SnapLogic and Informatica.
