Verifying Apache Kafka-Based Data Pipelines

Current 2022

Trading activity in the financial markets is at an all-time high. To handle the increasing volume and velocity of data in real time, and to meet clients' expectations of capturing market moves with minimal latency, we recently migrated our financial data generation pipelines from batch to stream processing. Integration tests are important here: they verify that the system behaves as intended and, because they can simulate issues that could arise in production, they give us greater confidence in maintaining the system over time. A single software or network glitch in processing critical financial information can lead to erroneous data and outages. Testing stream-based applications to avoid these glitches is challenging because data can arrive from an array of sources and in a variety of formats, such as JSON or Avro. In this talk, we aim to highlight the importance of integration testing, a critical verification method for stable and reliable large-scale distributed streaming applications. We will also provide a high-level overview of our system, describe the challenges we faced in moving to a streaming infrastructure, explore alternatives to our proposed solution, and talk through the lessons learned while using the Testcontainers library to test our Kafka Streams application.

Presenter

Subhangi Agarwala

Bloomberg
