Stepping into the world of Apache Kafka® can feel a bit daunting at first. I know this firsthand—while I have a background in real-time messaging systems, shifting into Kafka’s terminology and concepts seemed dense and complex. There’s a wealth of information out there, and it’s sometimes difficult to find the best (and, ideally, free) resources. Luckily, I work at Confluent, where we’ve built a huge library of educational content authored by some of the most well-known names in Kafka. There are loads of great Kafka resources out there—for full transparency, I’ve picked the top beginner resources from our library because these were the most helpful to me.
After completing the resources below, you should be equipped to build your first Kafka application and do some real-time stream processing!
Tutorial: How do I get started building my first Kafka producer application?
Tutorial: How do I get started building my first Kafka consumer application?
Tutorial: How can I count the number of messages in a Kafka topic?
Tim Berglund breaks down the fundamentals of Kafka in a really digestible way. This set of YouTube videos is a great way to get started with basic Kafka concepts, hear common use case examples, and learn how to integrate Kafka into your environment with a concise combination of instructional videos and hands-on exercises. You can find the full playlist on YouTube.
Watching all of the modules is a great way to broaden your Kafka understanding, but I’ve found that modules three and four are great places to start:
Module three, Apache Kafka Fundamentals, gets right into the fundamentals with the basic concepts of Kafka. As Tim puts it in the course: “This is the Fundamentals of Apache Kafka within the course called Fundamentals of Apache Kafka. So we’re really going to get into the fundamentals here.”
Module four, How Kafka Works, dives a little deeper. It includes a code overview for a basic producer and consumer, as well as diagrams and overviews for concepts like partition leadership and replication, producer guarantees, delivery guarantees, idempotent producers, and more.
Now that you’ve developed a basic understanding of Kafka, it’s time to try it out. Module three of the Apache Kafka 101 course, Your First Kafka Application in 10 Minutes or Less, shows you how to create a basic Kafka application in only a few minutes.
After you’ve built your first application, be sure to explore the remaining modules of the course on Confluent Developer. They explain important concepts, including topics, partitioning, brokers, the idea behind replication, Kafka producers and consumers, Kafka Connect, Schema Registry, Kafka Streams, and ksqlDB. While the Apache Kafka Fundamentals playlist above already gave you high-level brush strokes, this course brings it all together nicely.
Kafka Streams is a Java library for processing streams of data stored in Kafka. It provides powerful, high-level abstractions that make it easier to implement relatively complex operations, such as joins and aggregations, and to deal with the challenges of exactly-once processing and out-of-order data.
The Kafka Streams 101 course introduces you to the basics and then builds up to the more complex ideas gradually, which is important when you’re constructing a more involved application. It also covers how joins work, and how they help to manage the application’s state. I had no experience handling out-of-order data, so learning about time windows and how they deal with it was especially helpful.
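To make the time-window idea a little more concrete, here is a minimal sketch in plain Python. It is not the Kafka Streams API, only a simplified model of the concept: each record is assigned to a fixed-size (tumbling) window based on its own timestamp rather than its arrival order, and a record is discarded only if its window closed too long ago. The function names, the 10-second window, and the 5-second grace period are all assumptions made up for this illustration.

```python
from collections import defaultdict


def window_start(timestamp_ms, size_ms):
    """Return the start of the tumbling window that contains timestamp_ms."""
    return timestamp_ms - (timestamp_ms % size_ms)


def count_per_window(events, size_ms, grace_ms):
    """Count events per (key, window), using event time instead of arrival order.

    events: iterable of (key, timestamp_ms) in the order they arrive.
    In this simplified model, a record is dropped when its window end plus
    the grace period is already at or behind the highest timestamp seen so far.
    """
    counts = defaultdict(int)
    dropped = []
    stream_time = 0  # highest event timestamp observed so far
    for key, ts in events:
        stream_time = max(stream_time, ts)
        start = window_start(ts, size_ms)
        if start + size_ms + grace_ms <= stream_time:
            dropped.append((key, ts))  # too late: its window is closed
            continue
        counts[(key, start)] += 1
    return dict(counts), dropped


# Hypothetical arrival order: the third and fifth records are out of order.
events = [("a", 1_000), ("a", 12_000), ("a", 9_000), ("a", 30_000), ("a", 3_000)]
counts, dropped = count_per_window(events, size_ms=10_000, grace_ms=5_000)
print(counts)
print(dropped)
```

The record with timestamp 9,000 arrives after the one with timestamp 12,000 but is still counted in the first window, because that window is within its grace period. The record with timestamp 3,000 arrives after the stream has advanced to 30,000, so this model drops it.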
I found the first hands-on exercise of the course really helpful for getting started with building a streaming application. You can hit the ground running just by cloning the course GitHub repo and logging in to the cloud-hosted environment. At the end of the exercise, you’ll have a running app with fully functioning basic stream processing operations, dynamically filtering the messages that meet specific criteria (e.g., numbers that are larger than 1,000) using the filter and mapValues operations.
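If you want a feel for what these two operations do before opening the exercise, the sketch below models them in plain Python on a list of key-value pairs. This is not the course code and not the Kafka Streams API: the helper names, the sample records, and the "large:" label are hypothetical, and only the "larger than 1,000" criterion comes from the exercise description above.

```python
def filter_records(records, predicate):
    """Keep only the (key, value) records for which predicate(key, value) is true."""
    return [(k, v) for k, v in records if predicate(k, v)]


def map_values(records, mapper):
    """Apply mapper to each value; keys are left unchanged."""
    return [(k, mapper(v)) for k, v in records]


# Hypothetical input records: (key, numeric value).
orders = [("order-1", 250), ("order-2", 1500), ("order-3", 1000), ("order-4", 4200)]

# Keep values strictly larger than 1,000, then transform only the values.
large = filter_records(orders, lambda key, value: value > 1000)
labeled = map_values(large, lambda value: f"large:{value}")
print(labeled)
```

The point to notice is that the filter step sees both key and value and decides whether a record passes through, while the value-mapping step changes the value but never the key.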
When learning a new technology, I find that hands-on experience and repetitive practice work best, and following step-by-step tutorials is an effective way to achieve this. This was the first tutorial that guided me in creating a Kafka-based producer application that publishes messages to Kafka.
The tutorial offers two methods for running this application: hosting it in a cloud environment or building it using a basic Kafka setup. When starting out, I recommend opting for the cloud-based approach because it allows you to concentrate solely on the application rather than the underlying infrastructure. See a snippet from the tutorial in the image below.
Once I learned how to produce messages in a simple app, it was time to learn how to consume them. This tutorial helped me to do just that. In simple steps, it shows you how to build a small app that uses a KafkaConsumer to read records from Kafka.
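The core pattern a consumer follows is a loop: ask for the next batch of records, process them, and remember how far you have read. The sketch below models that loop in plain Python with an in-memory list standing in for a single partition. It does not use the real KafkaConsumer class or any Kafka client library; PartitionLog, SimpleConsumer, and the batch size are hypothetical names and values chosen for illustration.

```python
class PartitionLog:
    """An append-only list standing in for one topic partition."""

    def __init__(self):
        self._records = []

    def append(self, key, value):
        """Append a record and return the offset it was assigned."""
        self._records.append((key, value))
        return len(self._records) - 1

    def read(self, offset, max_records):
        """Return up to max_records (offset, key, value) tuples starting at offset."""
        batch = self._records[offset:offset + max_records]
        return [(offset + i, k, v) for i, (k, v) in enumerate(batch)]


class SimpleConsumer:
    """Tracks a read position in one partition, similar to how a consumer tracks offsets."""

    def __init__(self, log, start_offset=0):
        self._log = log
        self.position = start_offset

    def poll(self, max_records=2):
        """Fetch the next batch and advance the position past it."""
        batch = self._log.read(self.position, max_records)
        self.position += len(batch)
        return batch


log = PartitionLog()
for i in range(5):
    log.append(f"key-{i}", f"value-{i}")

consumer = SimpleConsumer(log)
seen = []
while True:
    batch = consumer.poll(max_records=2)
    if not batch:  # a real consumer would keep polling; this demo stops once caught up
        break
    seen.extend(batch)

print(len(seen), consumer.position)
```

Because the position is just a number, a second consumer created with start_offset=3 would pick up from the fourth record, which is the intuition behind resuming from a stored offset.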
Like the producer app tutorial, this tutorial has steps for both cloud-based and standalone Kafka-based deployments. If you’re new to Kafka and want to focus on your app, I recommend using the cloud-based option. Check out a section of the cloud-based tutorial below.
One of the beauties of Kafka is its versatility—it works well with a variety of platforms and programming languages. I have a background in Java and JavaScript, but I often encounter Kafka users with experience in Python or .NET/C#. Regardless of your background, chances are you can apply your current skill set to data streaming.
These helpful language-focused tutorials walk you through how to build client applications that produce and consume messages from a Kafka cluster. Try out the tutorial for your preferred programming language(s):
For a final learning exercise, I highly recommend this message-counting tutorial. This is a simple but interesting example, where you take a topic of pageview data and learn how you can count all of the messages in the topic. This tutorial comes in three flavors: a cloud-based version, a ksqlDB-based version, and a standalone basic Kafka version. This was the tutorial that got me curious about ksqlDB. The opportunity to compare the different approaches is especially valuable for beginners. Below, see a step from the cloud-based tutorial where you learn to use kcat to count messages.
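Conceptually, counting the messages in a topic means reading every partition from its beginning to its current end and adding up what you find. The sketch below shows that idea in plain Python on made-up pageview records. It is not the tutorial's kcat command or a ksqlDB query, and the topic layout, field names, and function name are assumptions for illustration only.

```python
def count_messages(topic):
    """Return (per-partition counts, total) by reading each partition to its end.

    topic: dict mapping a partition number to the list of records it holds.
    """
    per_partition = {}
    for partition, records in topic.items():
        count = 0
        for _ in records:  # stands in for consuming until the end of the partition
            count += 1
        per_partition[partition] = count
    return per_partition, sum(per_partition.values())


# Hypothetical pageview topic with three partitions.
pageviews = {
    0: [{"user": "u1", "page": "/home"}, {"user": "u2", "page": "/docs"}],
    1: [{"user": "u3", "page": "/home"}],
    2: [],
}

per_partition, total = count_messages(pageviews)
print(per_partition, total)
```

The total is simply the sum over partitions, which is why a count has to visit every partition of the topic rather than just one.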
I hope these resources helped you hit the ground running with Kafka! Our community Slack and forum are great places to ask questions and get help if you run into any issues while learning and building.
As always, your feedback is encouraged. Did you find these resources helpful? Are there other topics you’d like to see included? Let us know with the blue “feedback” button (or, the blue thumbs-up button for mobile readers).
✍️ Acknowledgment: I would like to extend a special thank you to my colleague Vanessa Wang for helping me with this post. Her insightful feedback and diligent edits have greatly improved this piece.