Change Data Capture (CDC) is a popular technique for extracting data from databases in real time. However, many CDC deployments are static: for example, a single connector is configured to ingest data from a fixed set of one or several tables.
At Goldsky, we needed a way to configure CDC for a large Postgres database dynamically: the list of tables to ingest is driven by customer-facing features and is constantly changing.
We started with Flink CDC connectors, which are built on top of the Debezium project, but we immediately ran into challenges, caused mainly by the lack of incremental snapshotting.
But even after implementing incremental snapshotting ourselves, we still faced an issue with Postgres replication slots: each logical replication connector holds a dedicated slot open, and the total number of slots is capped by the max_replication_slots setting. We couldn't use a single connector to ingest all tables (it's simply too much data), and we couldn't create a new connector for every new set of tables (we'd quickly run out of replication slots). So we needed a way to maintain a fixed number of replication slots for a dynamic list of tables.
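For context, here's a minimal sketch of how to see how close a database is to that cap, assuming the standard PostgreSQL JDBC driver; the connection details are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/** Prints the replication-slot cap and the slots currently in use. */
public class SlotCheck {
    public static void main(String[] args) throws Exception {
        // Connection URL and credentials are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/postgres", "postgres", "secret");
             Statement stmt = conn.createStatement()) {
            // max_replication_slots caps the number of slots (default: 10).
            try (ResultSet rs = stmt.executeQuery("SHOW max_replication_slots")) {
                rs.next();
                System.out.println("max_replication_slots = " + rs.getString(1));
            }
            // Each logical replication connector holds one of these slots open.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT slot_name, active FROM pg_replication_slots")) {
                while (rs.next()) {
                    System.out.println(rs.getString("slot_name")
                            + " active=" + rs.getBoolean("active"));
                }
            }
        }
    }
}
```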
In the end, we chose a consistent hashing algorithm to distribute the list of tables across multiple Flink jobs. The jobs also required some customizations to support the incremental snapshotting semantics from Flink CDC.
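To make that concrete, here is a minimal sketch of a consistent-hash ring that assigns tables to a fixed set of jobs; the job names, virtual-node count, and CRC32 hash are illustrative assumptions, not our production implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

/**
 * Maps a dynamic list of tables onto a fixed set of CDC jobs
 * (one replication slot each) using a consistent-hash ring.
 */
public class TableRing {
    private static final int VIRTUAL_NODES = 100; // smooths the distribution

    private final SortedMap<Long, String> ring = new TreeMap<>();

    public TableRing(List<String> jobIds) {
        // Place each job on the ring multiple times via virtual nodes.
        for (String jobId : jobIds) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(jobId + "#" + i), jobId);
            }
        }
    }

    /** Returns the job (and therefore the replication slot) that owns a table. */
    public String jobFor(String tableName) {
        long h = hash(tableName);
        // Walk clockwise to the first virtual node at or after the table's hash.
        SortedMap<Long, String> tail = ring.tailMap(h);
        Long key = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(key);
    }

    private static long hash(String key) {
        CRC32 crc = new CRC32(); // any stable hash works; CRC32 keeps this self-contained
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        TableRing ring = new TableRing(List.of("cdc-job-0", "cdc-job-1", "cdc-job-2"));
        System.out.println(ring.jobFor("public.transfers")); // e.g. cdc-job-1
        System.out.println(ring.jobFor("public.blocks"));
    }
}
```

The appeal of consistent hashing here is that adding or removing a job reassigns only a small fraction of the tables, so most connectors keep their replication slots and snapshot state untouched.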
We learned a lot about Debezium, Flink CDC and Postgres replication, and we're excited to share our learnings with the community!