Secure Stream Processing with the Streams API in Kafka

This blog post is the second in a series about the Streams API of Apache Kafka, the new stream processing library of the Apache Kafka project, which was introduced in Kafka v0.10.

Current blog posts in the Streams API in Kafka series:

  1. Elastic Scaling in the Streams API in Kafka
  2. Secure Stream Processing with the Streams API in Kafka (this post)
  3. Data Reprocessing with the Streams API in Kafka: Resetting a Streams Application

In this post we describe the security features of the Streams API in Kafka. Many use cases and applications, whether in stream processing or elsewhere, have tight internal and/or external security requirements. Such requirements are common in private-sector industries such as finance and healthcare as well as in governmental services. Legal compliance, for example, may require you to implement security measures such as encrypting data-in-transit when you work with sensitive data such as personally identifiable information. You may also be required to enforce authentication and authorization so that only a subset of employees or personnel can access the data, in line with information security policies such as need-to-know.

The question we want to answer in this blog post is:

  • Question (secure stream processing): If you are working in sensitive and regulated environments, what are your security options with regard to stream processing, notably when you are using a. Apache Kafka as the foundation of your data infrastructure and b. the Streams API as the means to process the data in Apache Kafka?

Let’s start with a quick answer to this question:

  • Answer (secure stream processing): Apache Kafka ships with a range of security features including but not limited to client authentication, client authorization, and data encryption.  These security features help you to implement your security policies and to protect your valuable data against internal and external threats.  On the side of processing the data in Kafka, your best option is to use the Streams API in Kafka to build your stream processing applications because it integrates natively with Kafka’s security features.

We can now walk through this answer in further detail.

First, which security features are available in Apache Kafka, and thus in the Streams API?  The Streams API in Kafka supports all the client-side security features in Apache Kafka.  In this short blog post we cannot cover these client-side security features in full detail, so I recommend reading the Kafka Security chapter in the Confluent Platform documentation and our previous blog post Apache Kafka Security 101 to familiarize yourself with the security features that are currently available in Apache Kafka.

That said, let me highlight a couple of important Kafka security features that are essential for implementing robust data infrastructures, whether these are used for building horizontal services at larger companies, for multi-tenant infrastructures (e.g. microservices), or for shared platforms such as in the Internet of Things.  Later on I will then demonstrate an example application where we use some of these security features in the Streams API in Kafka.

Kafka security features include:

  1. Encrypting data-in-transit between the servers of a Kafka cluster:  You can enable the encryption of broker-to-broker communication.  Brokers communicate with each other, for example, to replicate data for fault-tolerance.
  2. Encrypting data-in-transit between Kafka servers and Kafka clients: You can enable the encryption of the client-server communication between the Kafka servers/brokers and Kafka clients.  Kafka clients include stream processing applications built with the Streams API in Kafka.
    • Example:  You can configure your Streams API applications to always use encryption when reading data from Kafka and when writing data to Kafka; this is very important when reading/writing data across security domains (e.g. internal network vs. public Internet or partner network).
  3. Client authentication: You can enable client authentication for connections from Kafka clients (including the Streams API in Kafka) to Kafka brokers/servers.
    • Example:  You can define that only some specific Streams API applications are allowed to connect to your production Kafka cluster.
  4. Client authorization: You can enable client authorization of read/write operations by Kafka clients.
    • Example:  You can define that only some specific Streams API applications are allowed to read from a Kafka topic that stores sensitive data.  Similarly, you can restrict write access to certain Kafka topics to only a few stream processing applications to prevent e.g. data pollution or fraudulent activities.

It’s worth noting that the aforementioned security features in Apache Kafka are optional, and it is up to you to decide whether to enable or disable any of them.  And you can mix and match these security features as needed: both secured and non-secured Kafka clusters are supported, as well as a mix of authenticated, unauthenticated, encrypted and non-encrypted clients.  This flexibility allows you to model the security functionality in Kafka to match your specific needs, and to make effective cost vs. benefit (read: security vs. convenience/agility) tradeoffs: tighter security requirements in places where security matters (e.g. production), and relaxed requirements in other situations (e.g. development, testing).
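To illustrate this mix-and-match flexibility, here is a minimal sketch of how an application could pick its client security settings per environment. The configuration keys (`security.protocol`, `ssl.truststore.location`, `ssl.truststore.password`) are standard Kafka client settings; the class name, the environment names, the file path and the password handling are assumptions made for this example.

```java
import java.util.Properties;

/** Sketch only: selects client-side security settings per (hypothetical) environment. */
class EnvironmentSecuritySettings {

    /**
     * Returns security settings for the given environment name.
     * "production" is a hypothetical name; any other value is treated as relaxed.
     */
    static Properties forEnvironment(String environment, String truststorePassword) {
        Properties settings = new Properties();
        if ("production".equals(environment)) {
            // Tighter requirements: encrypt data-in-transit via SSL.
            settings.put("security.protocol", "SSL");
            // Placeholder path; the truststore lets the client verify the brokers.
            settings.put("ssl.truststore.location",
                    "/etc/security/tls/kafka.client.truststore.jks");
            settings.put("ssl.truststore.password", truststorePassword);
            // Keystore settings for client authentication would be added here as well.
        } else {
            // Relaxed requirements (e.g. development, testing): no encryption.
            settings.put("security.protocol", "PLAINTEXT");
        }
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(forEnvironment("development", null)
                .getProperty("security.protocol"));
        System.out.println(forEnvironment("production", "test1234")
                .getProperty("security.protocol"));
    }
}
```

Keeping the decision in one place makes it easier to see which environments run with relaxed settings.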

Second, how do you use these security features in the Streams API, i.e. when building your own stream processing applications?  The most important aspect to understand is that the Streams API leverages the standard Kafka producer and consumer clients behind the scenes.  Hence, to secure your stream processing applications, you configure the appropriate security settings of the corresponding Kafka producer/consumer clients.  Once you know which client-side security features you want to use, you simply need to include the corresponding settings in the configuration of your Streams API application.

Let’s show a simple example, based on our previous blog post Apache Kafka Security 101.  What we want to do is to configure our Streams API application to 1. encrypt data-in-transit when communicating with its target Kafka cluster and 2. enable client authentication.

Tip: A complete demo application including step-by-step instructions is available at SecureKafkaStreamsExample.java under https://github.com/confluentinc/examples.

For the sake of brevity, we assume that a. the security setup of the Kafka brokers in the cluster is already completed and b. the necessary SSL certificates are available to your Streams API in Kafka application in the filesystem locations specified below (the aforementioned blog post walks you through the steps to generate them).  For example, if you are using Docker to containerize your Streams API in Kafka applications, then you must also include these SSL certificates in the right locations within the Docker image.
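The second assumption can be checked before the application starts, which is handy when the files are baked into a Docker image. The following is a minimal sketch using only the Java standard library; the class name and the file locations are placeholder assumptions, not part of the Streams API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: verifies that certificate store files exist and are readable. */
class CertificateFileCheck {

    /** Returns the locations that are not readable regular files. */
    static List<String> missingFiles(String... locations) {
        List<String> missing = new ArrayList<>();
        for (String location : locations) {
            Path path = Paths.get(location);
            if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
                missing.add(location);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder locations; use the paths from your own configuration.
        List<String> missing = missingFiles(
                "/etc/security/tls/kafka.client.truststore.jks",
                "/etc/security/tls/kafka.client.keystore.jks");
        if (missing.isEmpty()) {
            System.out.println("All certificate store files are readable.");
        } else {
            System.err.println("Missing or unreadable files: " + missing);
        }
    }
}
```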

Once these two assumptions are met, you only need to configure the corresponding settings for the Kafka clients in your Streams API application.  The configuration snippet below shows the settings to enable client authentication and SSL encryption for data-in-transit between your Streams API application and the Kafka cluster it is reading from and writing to:
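A minimal sketch of such settings, written as properties-format text and parsed with `java.util.Properties` so that the example stays self-contained. The configuration keys are standard Kafka client settings; the host name, port, file paths and passwords are placeholder assumptions, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch only: security-related client settings in properties format. */
class SecureClientConfigSnippet {

    // All values below are placeholders and must match your own cluster setup.
    static final String CONFIG = String.join("\n",
            "# Where to find the secured Kafka brokers (assumed to listen on port 9093)",
            "bootstrap.servers=kafka.example.com:9093",
            "# Encrypt data-in-transit between the application and the brokers",
            "security.protocol=SSL",
            "# Truststore: used by the client to verify the brokers' certificates",
            "ssl.truststore.location=/etc/security/tls/kafka.client.truststore.jks",
            "ssl.truststore.password=test1234",
            "# Keystore: used by the client to authenticate itself to the brokers",
            "ssl.keystore.location=/etc/security/tls/kafka.client.keystore.jks",
            "ssl.keystore.password=test1234",
            "ssl.key.password=test1234");

    /** Parses the properties-format text above. */
    static Properties load() {
        Properties settings = new Properties();
        try {
            settings.load(new StringReader(CONFIG));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(load().getProperty("security.protocol"));
    }
}
```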

Within a Streams API application, you’d use code such as the following to configure these settings in your StreamsConfig instance:
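A minimal sketch of what this could look like, assuming placeholder values for the application id, broker address, file paths and passwords. To keep the example free of external dependencies it uses the plain configuration key strings; in an application that has the Kafka libraries on its classpath you would typically use the corresponding constants (for example `StreamsConfig.APPLICATION_ID_CONFIG` or `SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG`) and pass the resulting `Properties` to the `StreamsConfig` constructor. The class name is hypothetical.

```java
import java.util.Properties;

/** Sketch only: builds the settings for a secured Streams API application. */
class SecureStreamsSettings {

    static Properties build() {
        Properties settings = new Properties();
        // Hypothetical application id.
        settings.put("application.id", "secure-kafka-streams-app");
        // Where to find the secured Kafka brokers (placeholder host and port).
        settings.put("bootstrap.servers", "kafka.example.com:9093");
        //
        // ...further non-security related settings may follow here...
        //
        // Security settings.
        // 1. They must match the security settings of the secured Kafka cluster.
        // 2. The truststore and keystore files must be locally accessible
        //    to the application.
        settings.put("security.protocol", "SSL");
        settings.put("ssl.truststore.location",
                "/etc/security/tls/kafka.client.truststore.jks");
        settings.put("ssl.truststore.password", "test1234");
        settings.put("ssl.keystore.location",
                "/etc/security/tls/kafka.client.keystore.jks");
        settings.put("ssl.keystore.password", "test1234");
        settings.put("ssl.key.password", "test1234");
        return settings;
    }

    public static void main(String[] args) {
        Properties settings = build();
        // With the Kafka Streams library available, these settings would be
        // handed to StreamsConfig; here we only print one of them.
        System.out.println(settings.getProperty("security.protocol"));
    }
}
```

In practice you would load the passwords from a protected source rather than hard-coding them as done here for brevity.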

With these settings in place your Streams API in Kafka application will encrypt any data-in-transit that is being read from or written to Kafka, and it will also authenticate itself against the Kafka brokers that it is talking to.  (Note that this simple example does not cover client authorization.)

Now what would happen if you misconfigured the security settings in your Kafka Streams application?  In this case, the application would fail at runtime, right after you started it.  For example, if you entered an incorrect password for the ssl.keystore.password setting, then the following error messages would be logged, and after that the application would terminate:

Similar exceptions would be thrown if you misconfigured other security settings such as ssl.key.password:

Your Operations team can monitor the log files of your Streams API applications for such error messages to spot any misconfigured applications quickly, and to alert the corresponding teams.
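Some misconfigurations can also be caught before the application connects to Kafka at all. The following is a minimal sketch of a pre-flight check that tries to open a keystore with the configured password, using the standard `java.security.KeyStore` API. It is an illustration under assumptions, not something the Streams API does for you: the class name, keystore type, file location and password are placeholders, and the check covers only the keystore password, not the separate password of the private key entry (`ssl.key.password`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

/** Sketch only: checks whether a keystore can be opened with a given password. */
class KeystorePasswordCheck {

    /**
     * Returns true if the keystore data can be loaded with the password.
     * The password must be non-null, otherwise the integrity check is skipped.
     */
    static boolean canOpen(InputStream keystoreData, String type, char[] password) {
        try {
            KeyStore keyStore = KeyStore.getInstance(type);
            keyStore.load(keystoreData, password);
            return true;
        } catch (IOException | GeneralSecurityException e) {
            // Wrong password, corrupt data, or unsupported keystore type.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder location and password; use your configured values.
        Path location = Paths.get("/etc/security/tls/kafka.client.keystore.jks");
        try (InputStream in = Files.newInputStream(location)) {
            boolean ok = canOpen(in, "JKS", "test1234".toCharArray());
            System.out.println(ok
                    ? "Keystore opened successfully."
                    : "Keystore could not be opened with this password.");
        } catch (IOException e) {
            System.err.println("Cannot read keystore file: " + e.getMessage());
        }
    }
}
```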

In summary, the Streams API in Kafka lets you build secure stream processing applications, and it achieves this through its native integration with Apache Kafka’s security functionality.

If you have enjoyed this article, you might want to continue with the following resources to learn more about Apache Kafka’s Streams API:

And if you are interested in further information on Kafka’s security features, I’d recommend reading Apache Kafka Security 101.
  • Michael is a former principal technologist in the Office of the CTO at Confluent, the company founded by the original creators of Apache Kafka®. He focuses on longer-term product and technology strategy. Previously, Michael was the lead product manager for stream processing at Confluent, where his team created Kafka Streams and the streaming database ksqlDB. He is a well-known technology blogger in the big data community (www.michael-noll.com) and a committer/contributor to open source projects such as Apache Storm and Apache Kafka.
