The principle of least privilege dictates that each user and application should have only the minimal privileges required to do their job. When applied to Apache Kafka® and its Streams API, it usually means that each team and application will have read and write access only to a selected few relevant topics.
Organizations need to balance developer velocity and security, which means that each organization will likely have their own requirements and best practices for access control.
There are two simple patterns you can use to easily configure the right privileges for any Kafka Streams application—one provides tighter security, and the other is for more agile organizations. First, we’ll start with a bit of background on why configuring proper privileges for Kafka Streams applications was challenging in the past.
You are a developer, working on a new application for a large-ish company. You want this application to be real-time, event-driven and easy to deploy and scale on Kubernetes, so naturally you use Apache Kafka’s stream processing APIs.
In your company, the production Kafka brokers are managed by a central operations team. You don’t need their help to deploy the application, as your own team owns the application CI/CD pipeline, but they didn’t give your team privileges to create new topics on the cluster. So, you prepare a JIRA asking them to create the output topic for the application.
But wait, there are also internal topics. Every time you do a join or an aggregation, for example, a state store is created, and those state stores are backed by a changelog topic. These internal topics are created automatically by Kafka Streams, and they are named automatically, too. Does that mean we need to give admin privileges on the entire cluster to our Kafka Streams application? The answer is no.
Before Confluent Platform 5.0 and Apache Kafka 2.0, you’d basically have to figure out in advance what the topic names are going to be and then grant the relevant users privileges on each and every one of those topics. Discovering the topic names and granting privileges wasn’t rocket science, but it was definitely annoying, time consuming and error prone.
Being engineers, we looked for a way to automate and simplify this annoyance, and we found it!
If you are running Confluent Platform 5.0+ (or Apache Kafka 2.0+), you now have the flexibility to define security setups for Kafka and stream processing applications that range from very tightly controlled, for better security, to very open, for increased developer agility. In the next section, we’ll show you how to do it.
With Confluent Platform 5.0 and Apache Kafka 2.0, here’s what you can do:
Kafka Streams lets developers explicitly define the prefix for all internal topics their apps use: the application.id. The DevOps team can then use the new “wildcard ACL” feature (called prefixed ACLs in KIP-290) to grant the team or application the necessary read/write/create access on all topics that start with the chosen prefix.
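As a rough mental model (a sketch, not Kafka’s actual authorizer code), a prefixed ACL covers any resource whose name starts with the ACL’s name as a plain, case-sensitive string prefix, while a literal ACL covers only the exact name. The class and method names below are hypothetical.

```java
import java.util.List;

/** Hypothetical model of how literal and prefixed ACL resource patterns match names. */
final class AclPatternSketch {

    enum PatternType { LITERAL, PREFIXED }

    /** Returns true if a resource pattern (type + name) covers the given topic or group name. */
    static boolean covers(PatternType type, String patternName, String resourceName) {
        switch (type) {
            case LITERAL:
                // A literal pattern covers only the exact name (this sketch ignores the special "*" wildcard).
                return resourceName.equals(patternName);
            case PREFIXED:
                // A prefixed pattern covers every name that starts with the pattern's name.
                return resourceName.startsWith(patternName);
            default:
                throw new IllegalArgumentException("Unknown pattern type: " + type);
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "team.fraud.app1", "team.fraud.mobile.payments", "team.fraudsters", "team.billing.app1");
        for (String name : names) {
            System.out.println(name + " -> " + covers(PatternType.PREFIXED, "team.fraud.", name));
        }
    }
}
```

Note that the trailing dot matters: with the prefix team.fraud., a name such as team.fraudsters is not covered.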
There are two ways you can use this feature. One pattern provides more agility and development velocity, while the other provides tighter security and controls.
Pattern 1: The Kafka admin gives the entire team its own prefix and grants the team and its applications privileges to read, write and create new topics with this prefix.
This means that they don’t require additional privileges whenever they want to deploy a new application or create a new topic, and they can do most of their work independently.
For example, if you are on the team responsible for all fraud detection applications in the company, you can decide that all applications and topics will be prefixed with team.fraud.
Now, you can open the JIRA ticket to ask your DevOps team to give your team access to all topics and consumer groups starting with team.fraud. They would do the following to grant access:
kafka-acls --authorizer-properties zookeeper.connect=zkhost:2181 \
  --add \
  --allow-principal User:fraud-dev-team \
  --operation All \
  --topic 'team.fraud.' \
  --group 'team.fraud.' \
  --resource-pattern-type prefixed
After the DevOps team creates this ACL, the developers can successfully deploy any application and create any topic as long as the application.id and topic name start with team.fraud.
For example, one application might have the Kafka Streams application.id of team.fraud.app1, which you define in the app’s configuration properties for Kafka Streams:
Properties streamsConfiguration = new Properties();
// application.id is also the consumer group ID and the prefix of every internal topic name
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "team.fraud.app1");
...
KafkaStreams streams = new KafkaStreams(builder.build(), streamsConfiguration);
Another application might have the ID team.fraud.mobile.payments, and so on.
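To see why one prefixed ACL is enough for every application the team deploys, recall that Kafka Streams uses application.id as the consumer group ID and as the prefix of its internal topic names, which follow the patterns <application.id>-<storeName>-changelog and <application.id>-<name>-repartition. The sketch below builds those names for the two example application IDs. The store and operator names (payments-store, by-account) are hypothetical, and the helper class is illustrative rather than part of the Kafka Streams API.

```java
import java.util.List;

/** Illustrative helper (not a Kafka Streams API): derives an application's internal resource names. */
final class StreamsInternalNames {

    /** Changelog topics are named <application.id>-<storeName>-changelog. */
    static String changelogTopic(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    /** Repartition topics are named <application.id>-<name>-repartition. */
    static String repartitionTopic(String applicationId, String name) {
        return applicationId + "-" + name + "-repartition";
    }

    /** The consumer group ID is the application.id itself. */
    static String consumerGroup(String applicationId) {
        return applicationId;
    }

    public static void main(String[] args) {
        String aclPrefix = "team.fraud."; // prefix granted by the team-wide ACL
        for (String appId : List.of("team.fraud.app1", "team.fraud.mobile.payments")) {
            // Hypothetical store/operator names; real ones depend on the topology.
            List<String> resources = List.of(
                    changelogTopic(appId, "payments-store"),
                    repartitionTopic(appId, "by-account"),
                    consumerGroup(appId));
            for (String resource : resources) {
                System.out.println(resource + " covered: " + resource.startsWith(aclPrefix));
            }
        }
    }
}
```

Every derived name starts with the application.id, so as long as each application.id starts with team.fraud., the single ACL above covers its internal topics and its consumer group.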
As useful as it is to give a team their own namespace and let them own it, this pattern isn’t always possible.
Pattern 2: Sometimes organizational constraints require limiting access. In this case, you still want applications to be able to create their internal topics automatically, but you want to grant that privilege at the application level rather than to the whole team. The development team lets the operations team know when it is about to deploy a new application, and the two teams discuss what the application will do and the load it is expected to put on the system.
After they go through an operational onboarding process, the necessary principals and grants will be created.
Let’s say that one of the applications is responsible for analyzing mobile payments. Configure its Kafka Streams application.id as follows:
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "mobile.payments.fraud");
This guarantees that the application’s consumer group, as well as every internal topic the Kafka Streams application creates automatically, will be prefixed with mobile.payments.fraud.
Now you can open a JIRA ticket asking your DevOps team to give your application user read/write/create access to all topics that start with mobile.payments.fraud. They would do it like this:
kafka-acls --authorizer-properties zookeeper.connect=zkhost:2181 \
  --add \
  --allow-principal User:mobile-payments-fraud-app \
  --operation All \
  --topic 'mobile.payments.fraud' \
  --group 'mobile.payments.fraud' \
  --resource-pattern-type prefixed
This allows the application to create its internal topics while it runs in production.
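Because prefix matching is a plain string comparison, the exact prefix has a subtle consequence here. The consumer group ID is exactly mobile.payments.fraud, so a prefix ending in a delimiter, such as mobile.payments.fraud-, would cover the internal topics but not the group. The delimiter-free prefix covers both, but it would also cover any other application ID that merely starts with the same characters. The small check below illustrates this; the class name, the store name and the second application ID (mobile.payments.fraud2) are all hypothetical.

```java
import java.util.List;

/** Hypothetical check of which names a prefixed ACL covers for a single application. */
final class PrefixBoundaryCheck {

    static boolean covered(String aclPrefix, String name) {
        return name.startsWith(aclPrefix); // prefixed ACLs compare plain string prefixes
    }

    public static void main(String[] args) {
        String appId = "mobile.payments.fraud";
        List<String> names = List.of(
                appId,                                          // consumer group ID == application.id
                appId + "-scores-store-changelog",              // hypothetical internal topic
                "mobile.payments.fraud2-other-store-changelog"  // hypothetical topic of another app
        );
        for (String prefix : List.of(appId, appId + "-")) {
            for (String name : names) {
                System.out.printf("prefix=%-24s name=%-46s covered=%b%n", prefix, name, covered(prefix, name));
            }
        }
    }
}
```

If that overlap matters, one option is a literal ACL for the group (the kafka-acls default pattern type) plus a prefixed topic ACL on mobile.payments.fraud-, at the cost of a second command.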
Can you use the same patterns with KSQL? Absolutely.
Both patterns apply to KSQL applications as well, with one minor difference: in KSQL, you don’t configure application.id; you configure ksql.service.id.
Pattern 1: If your fraud detection team develops KSQL applications, you’ll want to prefix the ksql.service.id with team.fraud in the ksql-server.properties file on all the KSQL servers deployed by the team (for example, ksql.service.id=team.fraud.ksql-app1).
This means that the name of every internal topic created by those KSQL servers will be prefixed with _confluent-ksql-team.fraud, and the operations team can then create the corresponding prefixed ACLs for the team’s KSQL applications using:
kafka-acls --authorizer-properties zookeeper.connect=zkhost:2181 \
  --add \
  --allow-principal User:fraud-dev-team \
  --operation All \
  --topic '_confluent-ksql-team.fraud' \
  --group '_confluent-ksql-team.fraud' \
  --resource-pattern-type prefixed
Note that this ACL has different topic and group prefixes than the one used for Kafka Streams applications, so if your team has both KSQL and Kafka Streams applications, you’ll want to ask for both ACLs.
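The need for two ACLs becomes clear when you compare the prefixes directly. This is a rough sketch that assumes only what is stated above (KSQL prefixes its internal topic and consumer group names with _confluent-ksql- followed by ksql.service.id); the class name and the topic suffix are hypothetical placeholders.

```java
/** Hypothetical comparison of the Kafka Streams and KSQL ACL prefixes for one team. */
final class KsqlPrefixSketch {

    /** KSQL internal topics and consumer groups start with "_confluent-ksql-" plus ksql.service.id. */
    static String ksqlInternalPrefix(String serviceId) {
        return "_confluent-ksql-" + serviceId;
    }

    public static void main(String[] args) {
        String serviceId = "team.fraud.ksql-app1";
        // Hypothetical suffix standing in for a real query-specific internal topic name.
        String internalTopic = ksqlInternalPrefix(serviceId) + "hypothetical_suffix";

        String streamsAcl = "team.fraud.";                  // prefix granted for Kafka Streams apps
        String ksqlAcl = ksqlInternalPrefix("team.fraud");  // "_confluent-ksql-team.fraud"

        System.out.println("Streams ACL covers KSQL topic: " + internalTopic.startsWith(streamsAcl));
        System.out.println("KSQL ACL covers KSQL topic:    " + internalTopic.startsWith(ksqlAcl));
    }
}
```

The Kafka Streams prefix does not match because every KSQL internal name begins with _confluent-ksql-, which is why the team needs the second, KSQL-specific ACL.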
Pattern 2: Similarly, if your mobile payments analysis application is implemented using KSQL, you’ll configure ksql.service.id=mobile.payments.fraud in the ksql-server.properties file. The prefix for the internal topics and consumer groups will be _confluent-ksql-mobile.payments.fraud, and the CLI for the ACL configuration will be:
kafka-acls --authorizer-properties zookeeper.connect=zkhost:2181 \
  --add \
  --allow-principal User:mobile-payments-fraud-app \
  --operation All \
  --topic '_confluent-ksql-mobile.payments.fraud' \
  --group '_confluent-ksql-mobile.payments.fraud' \
  --resource-pattern-type prefixed
Output topics and intermediate topics (that is, the topics written by to() and through() in the DSL, respectively) are named explicitly by the developer, so you can call them whatever you choose, ask your DevOps team to create them, and have them grant your application the necessary read/write privileges on those topics via ACLs.
Now that you know how to easily configure privileges for Kafka Streams and KSQL applications, we hope you’ll find it easier and safer to take your code to production. If you do, let us know on the Confluent Community Slack; we are always curious how our APIs are used in production.
Of course, there is more to security than configuring ACLs; you can read about the rest in our Kafka Streams and KSQL documentation.
If you haven’t already, you can download the latest version of Confluent Platform to get started with the security features discussed in this blog post.