
Optimizing Serverless Stream Processing with Confluent Freight Clusters and AWS Lambda


Confluent has been instrumental in enabling customers across industries to build real-time stream processing solutions with Apache Kafka®. While many of these use cases demand low-latency, real-time processing, stream processing is also increasingly used to ingest logging and telemetry data. This type of data typically arrives at a high ingest rate but tolerates longer end-to-end processing times.

The introduction of Freight clusters addresses customers’ needs for exactly this type of Kafka cluster. In this blog, we will show you how to build cost-effective, high-throughput, serverless streaming applications, using Confluent Cloud Freight clusters for data ingestion and AWS Lambda functions for database lookups and customer notifications, by constructing an example e-commerce web application. AWS Lambda is a serverless compute service that abstracts away the need to provision, operate, and scale the underlying infrastructure, and can be used with Confluent to build scalable, real-time, event-driven Kafka applications. Freight is also the ideal choice for this use case because it prioritizes maximum throughput at the lowest cost, with minimal operational overhead.

Note that although we are focusing on Freight clusters in this blog, the use of AWS Lambda for serverless and selective message processing can work with other cluster types too.

Introducing Freight Clusters—High-Throughput, Low-Cost Streaming With Confluent

Freight is a new type of Confluent Cloud cluster designed for high-throughput, relaxed-latency workloads, at up to 90% lower cost than self-managed, open-source Apache Kafka®. You can find more details on the features of a Freight cluster in the Confluent documentation. A key aspect of a Freight cluster is that it is fully serverless, requiring no maintenance or provisioning parameters from the user. Like a Freight cluster, AWS Lambda is a serverless compute service that allows you to run code without provisioning or managing infrastructure, and offers built-in auto-scaling and throughput controls to optimize performance.

To integrate Lambda functions with Freight clusters running on Confluent, customers can use different connector patterns: Confluent’s fully managed or self-managed AWS Lambda sink connector, AWS Lambda’s fully managed event source mapping (ESM), or Amazon EventBridge Pipes. Using a serverless broker like a Freight cluster with a serverless compute client like AWS Lambda inherently delivers a highly performant and cost-effective processing profile for high-throughput Kafka applications.

How To Build a Streaming Data Pipeline for E-commerce

For an e-commerce retailer, understanding user behavior on an app or website is crucial for optimizing and personalizing the customer experience. Customers navigate a site and engage in various actions before deciding to add an item to a cart or make a purchase. These actions can be unpredictable, ranging from clicking on product links and checking reviews to conducting searches. All such actions generate vast streams of events as customers browse, search, and add items to their carts.

To seek customer feedback about experiences on an app or website, retailers can use the event streams generated by customer actions to identify customers, and send survey links via email addresses or mobile numbers. In this blog, we will focus on a survey system that captures event streams from customer actions on a retail website, and sends out survey links to customers via email and mobile.

While there are various ways to ingest such a stream of events, Confluent Cloud Freight clusters offer an easy way to ingest and process high-volume data cost-effectively. In our specific case, the app owner wants to ingest the data into a highly elastic Kafka cluster that can scale with seasonal variations as well as end customers’ timezones, which can lead to spiky and unpredictable traffic patterns. In addition, the app owner wants a fully managed compute client that auto-scales in a highly responsive manner, to avoid maintenance overhead.

Freight clusters with AWS Lambda offer a great combination to address both of these requirements for ingesting and processing event data. Using VPC-connected Lambda functions, you can also access private database resources and perform queries to retrieve relevant metadata. The combination allows the Lambda service to connect using the private Elastic Network Interfaces (ENIs) used by the Freight clusters. Finally, Provisioned Mode for Lambda’s Kafka ESM allows you to additionally fine-tune the throughput of the poller layer that reads from the Kafka topic and invokes the Lambda function.

Solution Overview

The diagram below shows a high-level overview of the proposed solution. It utilizes a modern web framework, such as AngularJS or React, to capture web clicks and transmit events to a web API integrated within an e-commerce application. The e-commerce application can run in any compute service, based on customer requirements. Typically, this application would store the events in a database, such as PostgreSQL. For the purposes of this blog, assume that a dedicated service within the e-commerce application polls the database and publishes the events to a Kafka topic within the Freight cluster. The producer app can leverage private ENI-based connectivity provided by the Freight cluster, which maintains low cost, while ensuring that traffic does not traverse the public internet.
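As a sketch of the publishing service described above, the snippet below serializes a click event and produces it to the web_clicks topic using the confluent-kafka Python client. The event field names, bootstrap endpoint, and credentials are illustrative assumptions, not values from the solution itself.

```python
import json
import time


def build_click_event(customer_id: str, action_type: str) -> bytes:
    """Serialize a web click event as JSON bytes for the Kafka topic.

    The event shape (a "data" envelope with customerId/actionType fields)
    is a hypothetical example chosen to match the ESM filter shown later.
    """
    event = {
        "data": {
            "customerId": customer_id,
            "actionType": action_type,
            "timestamp": int(time.time() * 1000),
        }
    }
    return json.dumps(event).encode("utf-8")


def publish_click(producer, topic: str, customer_id: str, action_type: str) -> None:
    """Produce the event keyed by customer id so one customer's clicks stay ordered."""
    producer.produce(topic, key=customer_id, value=build_click_event(customer_id, action_type))
    producer.poll(0)  # serve any pending delivery callbacks


def main() -> None:
    # Not invoked here: requires `pip install confluent-kafka` and real
    # credentials. Endpoint and API key values below are placeholders.
    from confluent_kafka import Producer

    producer = Producer({
        "bootstrap.servers": "<freight-bootstrap-endpoint>:9092",
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": "<api-key>",
        "sasl.password": "<api-secret>",
    })
    publish_click(producer, "web_clicks", "cust-123", "SURVEY_POPUP_CLICK")
    producer.flush()
```

In practice the producer would also configure delivery callbacks and retries, but the shape above is enough to show how events flow from the application into the Freight cluster.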

As the events are published to the topic, there can be multiple subscribers, such as a customer loyalty system, a fraud detection system, a recommendation engine, and a customer survey generation system. We will specifically focus on a survey system, which needs to process events related to its surveys. 

To achieve this, we use the event filtering capability in Lambda ESM, which allows you to capture events related to survey interactions. The filtered events are then delivered to the Lambda function, which performs a lookup on a customer database to retrieve the email or mobile number of the customer. After performing the lookup, the function code generates the survey link and sends it to the customer using SNS mobile push or a text message. 
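A minimal sketch of such a function might look like the following, assuming the Python runtime. Lambda ESM delivers Kafka record values base64-encoded, grouped by topic-partition; the survey URL format and the lookup_phone_number helper are illustrative assumptions (the helper stands in for the RDS query described above), while the SNS publish call is the standard boto3 API for sending a text message.

```python
import base64
import json


def decode_records(event: dict) -> list:
    """Decode the base64-encoded Kafka record values in a Lambda ESM event.

    The payload groups records under "records" keyed by "topic-partition".
    """
    decoded = []
    for records in event.get("records", {}).values():
        for record in records:
            decoded.append(json.loads(base64.b64decode(record["value"])))
    return decoded


def lambda_handler(event, context):
    """Look up each customer's contact details and send a survey link.

    lookup_phone_number is a hypothetical helper that would query the
    private RDS instance over the VPC; the survey URL is a placeholder.
    """
    import boto3  # available in the Lambda Python runtime

    sns = boto3.client("sns")
    processed = 0
    for click in decode_records(event):
        customer_id = click["data"]["customerId"]
        phone = lookup_phone_number(customer_id)  # hypothetical RDS lookup
        survey_url = f"https://example.com/survey?customer={customer_id}"
        sns.publish(PhoneNumber=phone, Message=f"Tell us about your visit: {survey_url}")
        processed += 1
    return {"processed": processed}
```

Because the ESM filter has already dropped non-survey events, the function body stays small and every invocation does useful work.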

Configuring a Lambda Function With a Confluent Freight Cluster

To configure the inbound Kafka topic as an event source for Lambda, we leverage Lambda’s Kafka event source mapping (ESM), a fully managed AWS resource that detects an increase in offset lag (indicating a message backlog) across all partitions of your Kafka topic in Confluent, and auto-scales event pollers in a responsive manner. Provisioned Mode for ESM gives you more precise control over throughput: you specify the minimum and maximum number of event pollers, and the service automatically scales between these limits. This is particularly useful for Kafka workloads with unpredictable traffic spikes. Your ESM will automatically scale down to the minimum number of event pollers during idle periods.
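As a configuration sketch, an ESM with Provisioned Mode might be created with the AWS CLI as below; the function name, bootstrap endpoint, Secrets Manager ARN, and poller limits are placeholder assumptions you would replace with your own values.

```shell
# Create a Kafka event source mapping for the Freight cluster with
# Provisioned Mode. All names, endpoints, and ARNs are placeholders.
aws lambda create-event-source-mapping \
  --function-name survey-processor \
  --topics web_clicks \
  --starting-position LATEST \
  --self-managed-event-source '{"Endpoints":{"KAFKA_BOOTSTRAP_SERVERS":["<freight-bootstrap-endpoint>:9092"]}}' \
  --source-access-configurations '[{"Type":"BASIC_AUTH","URI":"<secrets-manager-arn-for-cluster-api-key>"}]' \
  --provisioned-poller-config 'MinimumPollers=1,MaximumPollers=10'
```

With these limits, the ESM keeps at least one poller warm during idle periods and scales up to ten pollers when offset lag grows.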

The Lambda function is attached to a VPC so that it can connect to the Freight cluster through ENIs that are only accessible within the VPC. This also allows the function to access a private relational database. Lambda ESM polls the events and invokes the target function with the event payload. The function connects to a private RDS instance and performs a lookup to get a customer’s email address or phone number. It then invokes an API to generate the survey link, and sends it to the user via SNS text message or push notification.

Walkthrough of the Implementation

Let's briefly review some of the key steps of the implementation:

  1. Provision a Freight cluster as per the instructions on the console. 

  2. Create a sample topic and call it web_clicks.

  3. Follow the steps outlined in AWS documentation to configure a Confluent Cloud cluster as an event source for a Lambda function:

    1. Configure the bootstrap URL from the Freight cluster network settings.

    2. Configure the subnets based on the ENIs that were used for provisioning the Freight cluster. Follow the instructions from the private network setup.

    3. Configure a security group to allow the traffic from the Lambda ESM and the private ENIs. Allowing all traffic originating from all instances and ENIs attached to the same security group will ensure access to the broker from the Lambda service.

    4. Attach the Lambda function to the same VPC.

    5. Configure the filtering condition to select specific events from the topic. Here is a sample filtering condition to achieve this.


      {
        "value": {
          "data": {
            "actionType": ["SURVEY_POPUP_CLICK"]
          }
        }
      }

  4. Provision an RDS instance in a private subnet within the same VPC as the Freight cluster.

  5. Create a sample table to store customer ids and phone numbers.

  6. For each event, use the customer id to get the phone number from the RDS database.
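Steps 5 and 6 above can be illustrated with a self-contained sketch. Here SQLite stands in for the private RDS instance, and the table and column names are assumptions; in the deployed function you would connect with a PostgreSQL driver such as psycopg2, using credentials retrieved from a secrets store rather than hardcoded values.

```python
import sqlite3

# SQLite stands in for the private RDS instance in this sketch; the
# customers table schema below is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id TEXT PRIMARY KEY, phone_number TEXT)"
)
conn.execute("INSERT INTO customers VALUES ('cust-123', '+15555550100')")


def lookup_phone_number(conn, customer_id: str):
    """Return the phone number for a customer id, or None if not found."""
    row = conn.execute(
        "SELECT phone_number FROM customers WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None
```

Using a parameterized query, as above, also protects the lookup from injection if the customer id ever originates from untrusted input.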

Best Practices

Freight clusters are designed for high throughput and relaxed latency workloads. For information on region availability and limits, please refer to the Freight clusters documentation. For best practices and considerations for using Provisioned Mode for Lambda’s Kafka ESM, refer to this blog.

Conclusion

In this blog, we walked you through a use case for an e-commerce retailer with a variable data ingestion pattern. We looked at how Freight clusters, with their auto-scaling capabilities, can be an ideal solution for this type of data. The solution provides better security using a private network interface, and is up to 90% cheaper than self-managed, open-source Apache Kafka. The streaming data is processed using AWS Lambda, which complements the cluster’s elasticity with its own auto-scaling and requires no infrastructure provisioning. Lambda also lets you access private data sources and filter the stream; the filtering capabilities of ESM help lower Lambda invocation costs. Overall, using Lambda ESM simplifies the processing of streaming data in a private, secure environment.

Next Steps

We hope this guide has helped you get started building scalable, cost-effective, serverless stream processing applications using Confluent Freight clusters and AWS Lambda. If you have any questions, or want to learn more, be sure to explore our documentation, or reach out to our team for support.

Confluent, with its suite of services including Confluent Cloud and Confluent Schema Registry with REST API, utilizes the de facto streaming standard Apache Kafka, along with the next-generation Kora engine, to provide a solution that is cloud native, complete, and everywhere. Sign up for a free trial of Confluent Cloud in AWS Marketplace.


Apache®, Apache Kafka®, and Kafka® are trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by using these marks. All other trademarks are the property of their respective owners.

  • Mithun is a Senior Product Manager at Confluent. He focuses on the integration and optimization of native cloud data services into the Confluent ecosystem. He has a deep background in serverless stream processing technologies such as AWS Lambda and in building event-driven architectures using Apache Kafka. He has over twenty years of experience in the industry with a focus on application integration and data streaming technologies.

  • Tarun Rai Madan is a Principal Product Manager at Amazon Web Services (AWS). He specializes in serverless technologies and leads product strategy to help customers achieve accelerated business outcomes with event-driven applications, using services like AWS Lambda, AWS Step Functions, Apache Kafka, and Amazon SQS/SNS. Prior to AWS, he was an engineering leader in the semiconductor industry, and led development of high-performance processors for wireless, automotive, and data center applications.
