Streaming SQL for Data Engineers: The Next Big Thing?

Current 2022

SQL is the lingua franca of data analysis, but should we use it more as data engineers?

Modern tools like dbt make it easier to express transformations in SQL, but streaming is more complicated than batch. Streaming pipelines usually come with stricter SLAs and demand solid CI/CD and observability practices, so data engineers prefer familiar languages like Python, Java, and Scala, along with their many useful frameworks and libraries. Can SQL replace that?

A few years ago, when I first heard the idea of using SQL to write somewhat complex stream-processing data applications, I was very skeptical. How do you unit test it? How do you version it?

Over the years, Spark SQL streaming, Flink SQL, ksqlDB, and similar tools have matured, and they now support complex stateful transformations. However, the developer experience is still questionable: it’s easy to write a SQL statement, but how do you maintain it over the years as a long-running application?

In this presentation, I hope to share the discoveries I made over the years in this area, as well as working practices and patterns I’ve seen.

Presenter

Yaroslav Tkachenko

Goldsky

Yaroslav Tkachenko is a software engineer interested in distributed systems, microservices, data-intensive applications, modern cloud infrastructure, and DevOps practices.

Currently, Yaroslav is a Principal Software Engineer at Goldsky, focused on building a read layer for blockchain data that leverages the power of stream processing.

Before that, Yaroslav was a Staff Data Engineer at Shopify, building and supporting libraries, tools, and services for Shopify's stream-processing use cases. Previously, he was a Senior Software Engineer and later Software Architect at Activision, where he redesigned and rebuilt the data pipeline for Activision games like the Call of Duty franchise. Before joining Activision, Yaroslav held various leadership roles in multiple startups.
