We covered so much at Current 2024, from the 138 breakout sessions, lightning talks, and meetups on the expo floor to what happened on the main stage. If you heard any snippets or saw quotes from the Day 2 keynote, then you already know what I told the room: We are all data streaming engineers now.
Why? Because of a historic paradigm shift that’s been in progress for more than a decade. The way we build systems and manage data has dramatically changed in that time, thanks in large part to the overlapping and widespread adoption of microservices, cloud computing, and Apache Kafka®. As we heard from Confluent CEO Jay Kreps on Day 1, that change continues now as companies are becoming not just software, but software-powered by artificial intelligence—and they’re doing it faster than their data infrastructure can match.
Building on the recent explosion of generative AI, many businesses are ambitiously working toward creating AI agents capable of acting and communicating on behalf of human users by iteratively interrogating domain-specific language models. The more these complex use cases emerge and prove themselves useful, the more we're going to see responsibility for AI/ML engineering bleed across roles.
So it won’t just be AI/ML researchers and Ph.D. data scientists being tapped—software engineers will feel even more pressure to build AI-powered applications and services that are capable of making trustworthy, domain-informed decisions in real time. What started as a rarefied academic pursuit is now a commonplace task of the enterprise software developer.
The practical enterprise incarnation of AI is not just a large language model trained over weeks and months on an internet's worth of text—it's also a large set of model-encoded contextual data about what is currently true in the enterprise itself. It's all but impossible to imagine keeping this context up to date without a data streaming platform at its foundation. And at the foundation of that platform is Apache Kafka. As Staff Developer Advocate Danica Fine put it in the Day 2 keynote (speaking of AI), "The backbone of any real-time system has to be Kafka."
Check out sessions from Current 2024 on demand.
Paradigm shifts at the scale we're seeing with streaming are rare. In fact, I'd argue that a shift of this magnitude in digital computing has happened only once before, when mainframes gave way to the client-server model roughly 40 years ago.
To take advantage of this new era, practitioners and leaders alike need to understand what a data streaming engineer does. We need to know which skills, tools, and organizational capabilities will advance our careers and meet the rapidly evolving needs of a business hungry to deploy value-creating new tools.
The words get a little complicated here, because we don’t all agree on whether we want to be called software architects, software developers, software engineers, data engineers, or just plain programmers. We all have different specialties in the front end, the back end, application development, infrastructure, data, architecture, and more. But when you realize our work is based increasingly on the data streams that find their form in Kafka topics, and on the layers of the stack that are emerging on top of that substrate, you can see why the term “data streaming engineer” applies so broadly.
At Current, I got to converse with people in various stages of their journey as data streaming engineers. These interactions confirmed the obvious—that the ecosystem has coalesced around core technologies like Kafka and, more recently, Apache Flink®—but offered new insights into the problems we are focused on solving next.
While the Day 2 keynote may have been titled, “The Rise of the Data Streaming Engineer,” in reality, the dawn of this era is several years behind us already. We as a community are already working out the form of the emerging streaming platform. This is something Adi Polak ably demonstrated with her live demo that combined Kafka, Flink, Kafka Connect, Confluent Schema Registry, Apache Iceberg®, and Confluent Tableflow to illustrate how the shift-left pattern delivers higher-quality analytics at lower overall cost from streaming sources. You really have to check out the video—this pattern is going to be critical going forward.
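The core idea behind shift-left is moving validation and normalization upstream to the streaming layer, so every downstream consumer and analytics table sees clean data, instead of each batch job re-cleaning it later. Here is a minimal, library-free Python sketch of that idea; the event schema, the `validate_order` rules, and the list standing in for a Kafka topic are all illustrative assumptions, not part of the demo described above:

```python
# Sketch of the shift-left pattern: validate and normalize events at
# produce time so downstream consumers never see malformed data.
# The schema and rules below are illustrative assumptions.

def validate_order(event: dict) -> dict:
    """Reject malformed events and normalize fields before publishing."""
    required = {"order_id", "amount_cents", "currency"}
    missing = required - event.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if event["amount_cents"] < 0:
        raise ValueError("amount_cents must be non-negative")
    # Normalize once here, instead of in every downstream batch job.
    return {**event, "currency": event["currency"].upper()}

def produce(topic: list, event: dict) -> None:
    """Stand-in for a Kafka producer: only validated events reach the topic."""
    topic.append(validate_order(event))

orders = []
produce(orders, {"order_id": "o-1", "amount_cents": 1299, "currency": "usd"})
print(orders[0]["currency"])  # normalized to "USD" before any consumer reads it
```

In a real deployment this gate would live in a Flink job or a Schema Registry-enforced serializer rather than in application code, but the economics are the same: one validation pass at the stream beats N validation passes in the warehouse.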
And while you’re watching videos, let me highlight a few sessions for your review, if you didn’t catch them in person. Here are some beginner-level links to explore:
Introducing Apache Flink: 5 Things You Need to Know to Get Started
So You Want to Write a User-Defined Function (UDF) for Flink?
Timing Is Everything: Understanding Event-Time Processing in Flink SQL
If you’re further along in your journey, try these on for size:
Optimizing Apache Kafka With GraalVM: Faster, Leaner, and Performant
Change Data Capture & Kafka: How Slack Transitioned to CDC With Debezium & Kafka Connect
Using Kafka Streams to Build a Data Pipeline for the Hospitality Industry
Unlocking Real-Time Insights: Uber Freight's Evolution From Batch to Streaming Analytics
CDC Pipelines to the Data Lakehouse Using Apache Hudi and Apache Flink
Whether or not you were able to make Current this year, there's plenty of opportunity to further your progress as a data streaming engineer. Whatever works for you, just start learning and get involved with in-person meetups, online groups, and conferences as much as possible.
Don’t forget to visit Confluent Developer for tutorials, language-specific guides, and video courses—all for free. And I’m happy to inform you that the recordings for all Current 2024 sessions are available now. Nothing is stopping you from moving forward in data streaming except your own investments in yourself. Now is the time to make them!