Presentation

Streaming SQL for Data Engineers: The Next Big Thing?

« Current 2022

SQL is the lingua franca of data analysis, but should we use it more as data engineers?

Modern tools like dbt make it easier to express transformations in SQL, but streaming is more complicated than batch. Streaming pipelines usually come with stricter SLAs and depend on solid CI/CD and observability practices, so data engineers prefer familiar languages like Python, Java and Scala, along with their many useful frameworks and libraries. Can SQL replace that?

I was very skeptical when I first heard, a few years ago, the idea of using SQL to write somewhat complex stream-processing data applications. How do you unit test it? How do you version it?

Over the years, Spark SQL streaming, Flink SQL, ksqlDB and similar tools have matured, and they now support complex stateful transformations with ease. However, the developer experience is still questionable: it’s easy to write a SQL statement, but how do you maintain it over the years as a long-running application?

In this presentation, I hope to share the discoveries I’ve made in this area over the years, as well as the working practices and patterns I’ve seen.