[Webinar] Don’t Get Left Behind: Unlock the Secrets of Shifting Left | Register Now

Building High Throughput Apache Kafka Applications with Confluent and Provisioned Mode for AWS Lambda Event Source Mapping (ESM)

作成者 :

Overview

Confluent and AWS Lambda can be used to build scalable and real-time event-driven architectures (EDAs) that respond to specific business events. Confluent provides a streaming SaaS solution based on Apache Kafka® and built on Kora: The Cloud-Native Engine for Apache Kafka, allowing you to focus on building event-driven applications without operating the underlying infrastructure. AWS Lambda is a serverless compute service that provides a code execution framework that abstracts the need to provision, operate, and scale any of the underlying infrastructure.

To integrate Lambda functions with Kafka clusters running on Confluent, customers can use the connector patterns as described in this blog—Confluent’s fully managed AWS Lambda sink connector or AWS Lambda’s fully managed event source mapping (ESM). Customers can also use Amazon EventBridge Pipes, which allows customers to build point to point integrations between sources and targets, with support for advanced transformations and enrichment. In this blog, we will discuss how to use the recently launched Provisioned Mode for Lambda’s Kafka ESM to build high throughput Kafka applications. Examples of such applications include generative AI data pipelines, real-time fraud detection, real-time call center data analysis, etc. We will also exhibit a sample scenario to activate and test the Provisioned Mode for ESM, and outline best practices to build Kafka applications using Confluent and Lambda.

What’s new

By default, Lambda’s Kafka ESM runs in an On-Demand (OD) mode, where Lambda ESM allocates one event poller to start polling for messages in a Kafka topic. The ESM then evaluates the message backlog—using the Offset Lag—for all partitions in the topic, and auto-scales event pollers to process messages efficiently. Previously, the Kafka ESM did not offer controls to optimize the throughput for performance-sensitive Kafka workloads, which could lead to noticeable delays for real-time applications that see sudden spikes in traffic.

Provisioned Mode for Lambda ESM is a new feature that helps you control and optimize the throughput of your ESM, to achieve an enhanced throughput profile for performance-sensitive Kafka applications, particularly ones that see sudden spikes in traffic. This feature offers:

  • Controls to optimize throughput: You can now fine-tune the throughput of your Confluent-Lambda Kafka applications by configuring a minimum and maximum number of resources called “event pollers” for your ESM. An event poller (or a “poller”) represents a compute resource that underpins an ESM in the Provisioned Mode, and allocates up to 5 MB/s throughput.

  • Responsive auto-scaling: With Provisioned Mode, your Kafka ESM detects the increase in OffsetLag metric (that indicates the message backlog) for all partitions in your Confluent Kafka topic, and auto-scales event pollers in a responsive manner. During idle periods, your ESM automatically scales down to the minimum event pollers set by you.

  • Simplified networking experience and charges: Previously, you were required to configure AWS PrivateLink or NAT Gateway to enable Lambda to poll messages from Kafka clusters in your VPC and invoke Lambda functions. With Provisioned Mode, you are no longer required to configure PrivateLink or NAT Gateway, and hence do not incur the corresponding costs you incurred previously for these networking components. This approach also reduces overhead and improves the developer experience, allowing you to focus on building your Kafka applications rather than managing networking setup.

Building a high throughput generative AI data pipeline

To demonstrate how the Provisioned Mode for ESM improves throughput for your Kafka applications, the following example uses an embedding pipeline for a generative AI application. This application uses a Lambda function to publish a message to a Confluent topic, which is then enriched by Confluent’s offering of managed Apache Flink®. The enriched topic triggers a downstream Lambda function via the Provisioned Mode for Kafka ESM; the Lambda function then calls Amazon Bedrock’s embedding model using the enriched data. For reference on creating a Lambda function that reads an event and invokes an Amazon Bedrock embedding model, see this Serverless Land pattern.

Figure 1: Reference Generative AI application for testing performance profile

Activate Provisioned Mode for Kafka ESM

To activate Provisioned Mode for a new or existing Kafka ESM, you can configure the minimum event pollers, the maximum event pollers, or both for your ESM. The allowed values range from 1 to 200 for minimum event pollers, and from 1 to 2000 for maximum event pollers.

Note that you must configure at least one of the minimum or maximum event pollers to activate Provisioned Mode. When you configure only the minimum number of event pollers (Min-only), your ESM allocates this minimum quantity and can dynamically scale up to a maximum. This maximum is determined by the Offset Lag and is limited by either the number of partitions or the default maximum event pollers, whichever is lower. When you configure only the maximum number of event pollers (Max-only), your ESM starts with one minimum poller by default, and can scale up to the maximum event pollers or number of partitions, whichever is lower. When you configure both the minimum and maximum number of event pollers (Min and Max), your ESM can auto-scale between this range of minimum and maximum event pollers configured.

Activate using AWS CLI

You can activate Provisioned Mode for ESM during creation of a new ESM, or by updating an existing ESM in AWS CLI. Specify the –provisioned-poller-config parameter.

aws lambda create-event-source-mapping\
    --region <region-name>\
    --function-name <function-name>\
    --event-source-arn <event-source-arn>\
    --provisioned-poller-config '{"MinimumPollers":<number>,"MaximumPollers":<number>}'

Activate using AWS console

Figure 2: Activating Provisioned Mode for ESM in console

Performance comparison

To see the improvement in performance profile for our reference application—generative AI data pipeline—select a configuration where the producer (Confluent’s offering for managed Flink) writes 2 million messages, each with 1 KB payload size to the enriched Kafka topic—distributed evenly across 100 partitions. For this pipeline, use a batch size of 100, with function duration of 100ms, and set the starting position to TRIM_HORIZON to process from the beginning of the stream.

Note the baseline performance profile observed with the default On-Demand mode for Kafka ESM. Then observe the performance with these scenarios:

  • Scenario 1: Provisioned Mode enabled with default minimum event pollers (equal to 1)

  • Scenario 2: Provisioned Mode enabled with minimum event pollers set to 10

  • Scenario 3: Provisioned Mode enabled with minimum event pollers set to 100 (equal to number of partitions)

Baseline performance with Kafka ESM On-Demand mode

Without Provisioned Mode for ESM, Lambda takes 7 minutes to drain the message backlog of 2 million messages. This is the baseline performance to measure the improvement against.

Figure 3: Baseline performance without Provisioned Mode for ESM

Scenario 1: Provisioned Mode enabled with default minimum event pollers, and auto-scaling

With Provisioned Mode for ESM activated with default minimum event pollers (equal to 1), Lambda takes 3.5 minutes to drain the backlog of 2 million messages. This is 50% faster than the baseline performance, and demonstrates an improved auto-scaling behavior on the Provisioned Mode, relative to the On-Demand mode.

Figure 4: Performance profile with minimum event pollers set to 1

Scenario 2: Provisioned Mode enabled with minimum event pollers set to 10

Dial up the minimum event pollers to 10. With minimum event pollers set to 10, Lambda takes 2.5 minutes to drain the backlog of 2 million messages. This represents a 64.3% improvement relative to the baseline performance with On-Demand mode.

Figure 5: Performance profile with minimum event pollers set to 10

Scenario 3: Provisioned Mode enabled with minimum event pollers set to number of partitions (100)

To optimize the throughput further, dial the minimum event pollers to the number of partitions, which is 100 for this application. In this scenario, Lambda takes 1.5 minutes to drain the backlog of 2 million messages, a 78.6% improvement over the baseline.

Figure 6: Performance profile with minimum event pollers set to 100

The following table summarizes the performance improvement for the reference Generative AI application using Provisioned Mode for ESM.

ESM Mode

Time to process message backlog

Percentage improvement

On-Demand mode

7 minutes

Baseline

Scenario 1: default minimum pollers

3.5 minutes

50%

Scenario 2: minimum pollers = 10

2.5 minutes

64.29%

Scenario 3: minimum pollers =100

1.5 minutes

78.57%

Observability and pricing

You can observe the usage of event pollers by monitoring the ProvisionedPollers Amazon CloudWatch metric, which measures the number of event pollers that actively processed at least one event in the last 5-minute window.

The pricing for Provisioned Mode for ESM is based on the provisioned minimum event pollers and the number of event pollers consumed during automatic scaling. Provisioned Mode introduces a billing unit called Event Poller Unit (EPU). Each EPU supports up to 20 MB/s of throughput for event polling. The number of event pollers allocated on an EPU depends on the throughput consumed by each event poller. You pay for the number of EPUs used and the duration they run for, measured in Event Poller Unit hours. For details, refer to AWS Lambda pricing.

Considerations and best practices

The optimal configuration of minimum and maximum event pollers for your Kafka event source mapping (ESM) depends on your application’s performance requirements. We recommend that you start with the default minimum event pollers (i.e., 1) to baseline the performance profile, and adjust event pollers based on observed message processing patterns and your application’s performance requirements. For workloads with spiky traffic and strict performance needs, increase the minimum event pollers to handle sudden surges. You can fine-tune the minimum required event pollers by evaluating your desired throughput, your observed throughput—which depends on factors like the ingested messages per second and average payload size, and using the throughput capacity of one event poller (up to 5 MB/s) as reference. Note that to maintain ordered processing within a partition, Lambda caps the allocated maximum event pollers at the number of partitions in the topic.

Network configuration

For existing Kafka ESMs, update your network settings to remove PrivateLink VPC endpoints and associated permissions when you activate Provisioned Mode. For new Kafka ESMs that you create with Provisioned Mode activated, you do not need to create PrivateLink VPC endpoints.

Partitions

Messages belonging to a given partition are processed by one and only one event poller, to maintain ordered processing. As such, maximum Lambda function concurrency is equal to the number of Kafka partitions. This also means that setting the minimum event pollers to more than the total number of partitions will not yield any further improvements.

Minimum event pollers

As seen in the table and graphs above, increasing the minimum event pollers beyond a certain threshold may yield diminishing marginal returns. Experimentation will be required to find the right balance between processing/cost and how long a Lambda has an Offset Lag greater than 0. In the example above, setting the minimum event pollers to default value of 1 yields a 50% improvement in processing time, which improves to 64.3% when minimum event pollers are increased to 10, and 78.6% when minimum event pollers are increased to 100. Since increasing minimum event pollers may yield only incremental performance returns after a certain point, we recommend fine-tuning the minimum event pollers to achieve your desired optimal cost and performance profile.

Consumer group ID

When specifying an existing consumer group ID, keep in mind that (1) AWS Lambda will ignore the starting position parameter for your ESM as it will use the committed offset of the consumer group and (2) if there are other active event pollers within that consumer group, Lambda doesn't receive all the messages in the Kafka topic. In instances where you specify a consumer group ID and Confluent cannot find an existing consumer group, AWS Lambda configures your event source mapping with the specified starting position.

Errors

During function times out or after a returned error, Lambda retries the whole batch until processing succeeds or messages expire. As such, one must keep in mind to set appropriate timeouts per Lambda function. If a function encounters an unrecoverable error, however, the ESM is paused and the consumer stops processing messages.

Offset Lag metric

Offset Lag is the difference between the latest offset available in a Kafka topic partition and the offset that a consumer group has consumed. This value indicates how far behind the consumer group is from the latest available data. You can monitor your Lambda function’s Offset Lag metric in Confluent using this guide. Alternatively, you can monitor the Lambda functions OffsetLag metric in AWS CloudWatch.

Conclusion

Provisioned Mode for Lambda ESM allows you to build high throughput serverless applications using Confluent Kafka clusters and AWS Lambda functions, by allowing you to optimize the throughput for your Kafka ESMs. This provides a responsive auto-scaling behavior for Kafka applications that have stringent performance requirements and see unpredictable and spiky traffic. You can fine-tune your configured event pollers based on your requirements and monitor usage via CloudWatch metrics. Provisioned Mode also simplifies network configuration by removing the requirement to configure PrivateLink.

Confluent, with its suite of services including Confluent Cloud, Confluent Schema Registry, and REST API, utilizes the de facto streaming standard Apache Kafka along with the next-generation Kora engine to provide a solution that is cloud native, complete, and everywhere. Sign up for a free trial of Confluent Cloud in AWS Marketplace.

Apache®, Apache Kafka®, and Kafka® are registered trademarks of the Apache Software Foundation.

  • Braeden Quirante began his career as a software consultant where he worked on a wide array of technical solutions including web development, cloud architecture, microservices, automation, and data warehousing. Following these experiences, he joined Amazon Web Services as a partner solutions architect working with AWS partners in scaled motions such as go-to-market activities and partner differentiation programs. Braeden currently serves as a partner solutions engineer for Confluent and an AWS evangelist.

  • Tarun Rai Madan is a Principal Product Manager at Amazon Web Services (AWS). He specializes in serverless technologies and leads product strategy to help customers achieve accelerated business outcomes with event-driven applications, using services like AWS Lambda, AWS Step Functions, Apache Kafka, and Amazon SQS/SNS. Prior to AWS, he was an engineering leader in the semiconductor industry, and led development of high-performance processors for wireless, automotive, and data center applications.

このブログ記事は気に入りましたか?今すぐ共有