[Webinar] Build Your GenAI Stack with Confluent and AWS | Register Now

Securing the Infrastructure of Confluent with HashiCorp Vault

Written By

In order for a technology like Confluent Cloud to make it easy to set data in motion, many different software systems are required to interact with each other using API keys or other secrets. Once we have such secrets, we need to store them somewhere secure. The industry has several secure secrets management systems available, one of which is Vault by HashiCorp.

In our research on which secure secrets management system to use for storage, we settled on HashiCorp Vault. There were a few things we learned, and some of them we feel would be useful to the technology community at large.

Below, we’ll cover how we settled on Hashicorp Vault, set it up, tested it out, and solved one use case that wasn’t well-documented in the existing literature. As a bonus, we’ll give you some of the code we use to configure Vault using Golang/REST.

Setting up using Banzai Cloud’s Vault operator

At Confluent, we typically prefer the “buy” side of the “build/buy” spectrum for technologies that are not our core competency. Therefore, we strongly considered outsourcing this task, but we had some Vault expertise in-house and had set up Vault previously using Banzai Cloud’s Bank-Vaults K8s operator for a small proof-of-concept Vault cluster in an existing Kubernetes container to use for some minor services.

It turned out to be surprisingly easy to manage and run ourselves, so we have not yet needed a third-party service as we expand our usage of Vault within Confluent. The Vault clusters, production and non-production, run in high availability (HA) mode on a set of Kubernetes clusters that we had already built for other purposes. The backend is an HA Postgres instance in Google’s infrastructure, which has all the built-in HA and disaster recovery characteristics taken care of. Thus, much of what typically makes running such an HA service difficult has already been outsourced to Google via Google Kubernetes Engine (GKE) and Google Cloud SQL.

HA testing using a load-test framework

After the clusters were up and running and serving some of our lower-volume applications, we ran some load tests against our non-production cluster and deleted nodes in the Kubernetes cluster to ensure that the failover was quick and impactless. For this, we used Slapper.

We wanted Slapper to run on the same Kubernetes cluster as the Vault clusters for maximum slapping potential.

As a proof of concept, we generated containers on a local Mac workstation using Docker containers. Once some minor load tests were run in that configuration, we regenerated the load test containers within pods on the Kubernetes cluster that housed our Vault cluster.

Below is the shell script for setting up the container for use on Kubernetes. Please note that it is very easy to modify this to run within Docker on your workstation, if that’s your preference.

The purpose of the load tests wasn’t so much to test load, as we could allocate more CPU or RAM to the Vault HA frontends to change the load characteristics of the cluster, but more to facilitate chaos testing, ensuring the load tests do not slow down. At some point we may rewrite (or just reuse) these to perform actual load testing when we feel we have hit a performance bottleneck within Vault itself.

Note that we assume in ~/Downloads that you have downloaded the Slapper master branch as a ZIP file before running this script.

# Script to set up Slapper Load Testing on k8s
# Copyright(c)2021 by Confluent, Inc
# This software is made available by Confluent, Inc., under the terms of the
# Confluent Community License Agreement, Version 1.0 located at
# http://www.confluent.io/confluent-community-license. BY INSTALLING,
# DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY OF THE
# SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE
# AGREEMENT.
#
# on Workstation……………………….
kubectl -n vaultunstable run timeless-loadtest --rm -i --tty --image ubuntu -- bash
# on Workstation in another shell...
kubectl -n vaultunstable get pods \
	| grep loadtest \
	| awk '{print "kubectl -n vaultunstable cp ~/Downloads/slapper-master.zip "$1":m.zip"}' \
	| bash
# on Pod as root.............................
apt update
apt install curl netcat-traditional unzip golang
unzip m.zip
mkdir -p /go/src
mv slapper-master go/src/slapper
export GOPATH=/go
cd /go/src/slapper
go build
# SET THESE FOR YOUR ENVIRONMENT! Use a currently-valid Vault token
export VAULT_ADDR=https://vaultunstable
export VAULT_TOKEN=...
# _____________________________________________________________
echo "GET https://vaultunstable/v1/v1/prod/kv/metadata?list=true
GET https://vaultunstable/v1/v1/prod/kv/data/timeless_test" > get-list-targets.txt
./slapper \
	-rate 300  \
	-minY 1ms -maxY 40ms  \
	-H "X-Vault-Token: ${VAULT_TOKEN}" \
	-H "X-Vault-Request: true"  \
   	-targets get-list-targets.txt
# _____________________________________________________________
# has access to do Vault commands
#role-id               b0bby-tabls-caddy-8999-deadbeef0beaf-fead
#secret_id             abcdef-1234-45789-8990-abcdef1234678
echo 'POST https://vaultunstable/v1/auth/approle/login
$ { "role_id": "b0bby-tabls-caddy-8999-deadbeef0beaf-fead", "secret_id": "abcdef-1234-45789-8990-abcdef1234678" }
' > auth-login-targets.txt
go run ./slapper.go \
	-rate 160  \
	-minY 2ms -maxY 90ms  \
	-H "X-Vault-Token: ${VAULT_TOKEN}" \
	-H "X-Vault-Request: true"   \
	-targets auth-login-targets.txt
# _____________________________________________________________

echo 'POST https://vaultunstable/v1/v1/prod/kv/data/timeless_test $ {"data": { "PROD_test_key_1": "PROD_test_value_1", "PROD_test_key_2": "PROD_test_value_2" }} POST https://vaultunstable/v1/v1/stag/kv/data/timeless_test $ {"data": { "STAG_test_key_1": "STAG_test_value_1", "STAG_test_key_2": "STAG_test_value_2" }} POST https://vaultunstable/v1/v1/devel/kv/data/timeless_test $ {"data": { "DEVEL_test_key_1": "DEVEL_test_value_1", "DEVEL_test_key_2": "DEVEL_test_value_2" }} POST https://vaultunstable/v1/v1/ci/kv/data/timeless_test $ {"data": { "CI_test_key_1": "CI_test_value_1", "CI_test_key_2": "CI_test_value_2" }} POST https://vaultunstable/v1/v1/common/kv/data/timeless_test $ {"data": { "COMMON_test_key_1": "COMMON_test_value_1", "COMMON_test_key_2": "COMMON_test_value_2" }} GET https://vaultunstable/v1/v1/prod/kv/data/timeless_test GET https://vaultunstable/v1/v1/stag/kv/data/timeless_test GET https://vaultunstable/v1/v1/devel/kv/data/timeless_test GET https://vaultunstable/v1/v1/ci/kv/data/timeless_test GET https://vaultunstable/v1/v1/common/kv/data/timeless_test DELETE https://vaultunstable/v1/v1/prod/kv/data/timeless_test DELETE https://vaultunstable/v1/v1/stag/kv/data/timeless_test DELETE https://vaultunstable/v1/v1/devel/kv/data/timeless_test DELETE https://vaultunstable/v1/v1/ci/kv/data/timeless_test DELETE https://vaultunstable/v1/v1/common/kv/data/timeless_test' > store-retrieve-delete-targets.txt go run ./slapper.go
-rate 500
-minY 2ms
-maxY 90ms
-H "X-Vault-Token: ${VAULT_TOKEN}"
-H "X-Vault-Request: true"
-targets store-retrieve-delete-targets.txt
#_____________________________________________________________
# store, retrieve, delete kv secrets - single engine
# v1/prod/kv/ kv kv_24a4a5d1. n/a
echo 'POST https://vaultunstable/v1/v1/prod/kv/data/timeless_test $ {"data": { "PROD_test_key_1": "PROD_test_value_1", "PROD_test_key_2": "PROD_test_value_2" }} GET https://vaultunstable/v1/v1/prod/kv/data/timeless_test DELETE https://vaultunstable/v1/v1/prod/kv/data/timeless_test' > store-retrieve-delete-targets.txt go run ./slapper.go
-rate 500
-minY 2ms
-maxY 90ms
-H "X-Vault-Token: ${VAULT_TOKEN}"
-H "X-Vault-Request: true"
-targets store-retrieve-delete-targets.txt

Configuring Vault after setup

Once our Vault was created, there were a lot of projects that we wanted to configure programmatically. At first, we used HashiCorp’s Terraform (TF), given that we have a lot of TF expertise on our team. However, not every team at Confluent has TF experience, and all the documentation for Vault is either HTTP REST calls or Vault CLI commands.

In order to make it quicker/simpler for people to use, we decided to employ a simple set of Golang programs for new post-setup configurations, like creating AppRoles.

Maybe you’re wondering why we don’t simply use the Vault Golang libraries to do the work. Although this approach is certainly an option, the answer is simply that none of our engineers have done so yet. The performance or productivity gains are unknown—all the documentation is based around HTTP REST calls, and such calls aren’t inefficient.

Also, because TF doesn’t modify a configuration that it’s unfamiliar with, the teams that prefer to use our existing TF for configuration will do so, while others will use Golang programs. We are fairly early in this usage pattern and are not yet certain of the inconsistencies between Golang and TF that may manifest in productivity or stability problems.So far, we haven’t run into any issues.

Below is an example skeleton Vault post-setup configuration program that achieves several goals. First, it is a nontrivial example, so you can see how to set up a whole service in Vault. It also deals with production clusters as well as non-production clusters and outputs some helpful Vault CLI commands to show what the demo script achieved. Overall, it serves as a good start for copying/pasting the code for your own config into Vault in a repeatable manner, without having to research how to build such functionality.

/*
 * Sample Vault Service
 * Copyright (c) 2021 Confluent, Inc
 * Authors: Tim Ellis, Elizabeth Bennett, Je Hyung Lee
 * This software is made available by Confluent, Inc., under the terms of the
 * Confluent Community License Agreement, Version 1.0 located at
 * http://www.confluent.io/confluent-community-license. BY INSTALLING,
 * DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY OF THE
 * SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE
 * AGREEMENT.
 */
package main
import ( "../util" "fmt" "log" )
const ( projectName = "timelessExampleProject" roPolicyTemplateName = "%s-%s-ro" // Project/ENV rwPolicyTemplateName = "%s-%s-rw" // Project/ENV roAppRoleTemplateName = "%s-%s-ro-role" // Project/ENV rwAppRoleTemplateName = "%s-%s-rw-role" // Project/ENV keySetConfigPathTemplate = "v1/%s/kv/data/%s/configs" // ENV/Project
// AppRole cannot create nor delete the configs // so Vault must be manually initialized with the path roPolicyTemplate = `{"policy":"path \"%s\" { capabilities = [\"read\"] } "}` rwPolicyTemplate = `{"policy":"path \"%s\" { capabilities = [\"read\", \"update\"] } "}`
integConfigPath = "v1/ci/kv/data/%s/vault-integ-test/*" integPolicyName = "integ-auth-service-read-write" integPolicyTemplate = `{"policy":"path \"%s\" { capabilities = [\"create\", \"read\", \"update\", \"delete\"] } "}` )
var ( venv *util.VaultEnv )
func main() { // Sets up Policies, AppRoles, and KeysetPath for integration venv = util.LoadVaultEnv() for _, ccloudEnv := range []string{"devel", "stag", "prod", "integ"} { log.Println(fmt.Sprintf("Creating Keysets, Policies, and AppRoles for CCloud ENV=%s", ccloudEnv)) log.Println("-----------------------------------------------------") if ccloudEnv == "integ" { if venv.VaultAddr == "https://vaultnonprod" && ccloudEnv == "integ" { // only create integ roles/policies in nonprod vault createPolices(ccloudEnv) createAppRoles(ccloudEnv) } else { log.Println("Not creating objects for Integ path in Prod Vault") } } else { createKeySetPath(ccloudEnv) createPolices(ccloudEnv) createAppRoles(ccloudEnv) } }
// XXX helpful commands for post-exploration of Vault, delete all this for // your own service. fmt.Println("\n\nTo see the results, and play with them, you can run Vault CLI commands like the following:") fmt.Printf(" vault kv list v1/prod/kv/%s/ #prod, but also try stag or devel\n", projectName) fmt.Println(" vault list /sys/policies/acl | grep -i Example") fmt.Printf(" vault read /sys/policies/acl/%s-prod-ro\n", projectName) fmt.Printf(" vault read /sys/policies/acl/%s-prod-rw\n", projectName) fmt.Println(" vault list auth/approle/role | grep -i Example") fmt.Printf(" vault read auth/approle/role/%s-prod-rw-role/role-id\n", projectName) fmt.Printf(" vault write -f auth/approle/role/%s-prod-rw-role/secret-id #generate secret_id\n", projectName) }
// initialize the key set path in vault if it does not already exist // the auth-service AppRole does not have permissions to create paths thus the // path should be manually initialized here func createKeySetPath(ccloudEnv string) { util.IssueVaultRequest(venv, "PUT", "/v1/"+fmt.Sprintf(keySetConfigPathTemplate, ccloudEnv, projectName), // `cas` a.k.a. check-and-set=0 to avoid overwriting the path if it exists already `{"data":{"key_sets":[], "active_kid":-1}, "options":{"cas":0}}`) }
func createPolices(ccloudEnv string) { path := fmt.Sprintf(keySetConfigPathTemplate, ccloudEnv, projectName) rPolicy := fmt.Sprintf(roPolicyTemplate, path) rwPolicy := fmt.Sprintf(rwPolicyTemplate, path) roPolicyName := fmt.Sprintf(roPolicyTemplateName, projectName, ccloudEnv) rwPolicyName := fmt.Sprintf(rwPolicyTemplateName, projectName, ccloudEnv)
// create policies for read-only role util.IssueVaultRequest(venv, "PUT", "/v1/sys/policies/acl/"+roPolicyName, rPolicy) // create policies for read/write role util.IssueVaultRequest(venv, "PUT", "/v1/sys/policies/acl/"+rwPolicyName, rwPolicy)
// create additional policies for integ env // these policies are used in the auth-service vault client integ tests if ccloudEnv == "integ" { integPolicy := fmt.Sprintf(integPolicyTemplate, fmt.Sprintf(integConfigPath, projectName)) util.IssueVaultRequest(venv, "PUT", "/v1/sys/policies/acl/"+integPolicyName, integPolicy) } }

func createAppRoles(ccloudEnv string) { roPolicyName := fmt.Sprintf(roPolicyTemplateName, projectName, ccloudEnv) rwPolicyName := fmt.Sprintf(rwPolicyTemplateName, projectName, ccloudEnv) roAppRoleName := fmt.Sprintf(roAppRoleTemplateName, projectName, ccloudEnv) rwAppRoleName := fmt.Sprintf(rwAppRoleTemplateName, projectName, ccloudEnv)
// create read-only auth-service approle with read-only policies util.IssueVaultRequest(venv, "POST", "/v1/auth/approle/role/"+roAppRoleName, fmt.Sprintf({"policies": "%s"}, roPolicyName)) // create read/write auth-service approle with read/write policies util.IssueVaultRequest(venv, "POST", "/v1/auth/approle/role/"+rwAppRoleName, fmt.Sprintf({"policies": "%s"}, rwPolicyName)) }

Here is the ../util library that it imports:

/*
 * Vault Utilities for Golang Configuration Programs
 * Copyright (c) 2021 Confluent, Inc
 * Authors: Elizabeth Bennett, Je Hyung Lee
 * This software is made available by Confluent, Inc., under the terms of the
 * Confluent Community License Agreement, Version 1.0 located at
 * http://www.confluent.io/confluent-community-license. BY INSTALLING,
 * DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY OF THE
 * SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE
 * AGREEMENT.
 */
package util
import ( "fmt" "io/ioutil" "log" "net/http" "os" "strings" "time" )
type VaultEnv struct { VaultAddr string VaultToken string QuietMode string }
func LoadVaultEnv() *VaultEnv { vaultAddr := os.Getenv("VAULT_ADDR") vaultToken := os.Getenv("VAULT_TOKEN") quietMode := os.Getenv("VAULT_UTIL_QUIET") if vaultAddr == "" || vaultToken == "" { fmt.Println("Please set VAULT_ADDR and VAULT_TOKEN, example,") fmt.Println(" export VAULT_ADDR=https://vaultnonprod") fmt.Println(" export VAULT_TOKEN=$(vault print token)") fmt.Println(" # optionally, to make the program be quieter...") fmt.Println(" export VAULT_UTIL_QUIET=1") log.Fatal("Environment Variables Unset") }
return &VaultEnv{VaultAddr: vaultAddr, VaultToken: vaultToken, QuietMode: quietMode} }
func IssueVaultRequest(venv *VaultEnv, method string, path string, reqBody string) { log.Println(fmt.Sprintf("Issuing REST call [[vaultAddr=%s: method=%s path=%s reqBody=%s]]", venv.VaultAddr, method, path, reqBody)) req, err := http.NewRequest(method, venv.VaultAddr+path, strings.NewReader(reqBody)) if err != nil { log.Fatal("Error setting up request: ", err) }
req.Header.Set("X-Vault-Token", venv.VaultToken)
vaultClient := &http.Client{Timeout: time.Second * 10} resp, err := vaultClient.Do(req) if err != nil { log.Fatal("Error reading response: ", err) } defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body) if err != nil { log.Fatal("Error reading body: ", err) }
if venv.QuietMode == "" { log.Println(fmt.Sprintf("Vault Response: %s", body)) } }

AppRole Secret ID revocation or rotation

It is important to note that Confluent has not suffered any leak or compromise of its secure data that led to the creation of this section. We are at work on several security and compliance initiatives, and as part of that, we need to be sure that all components are functional, including under abnormal or adverse conditions.

We couldn’t find any good or simple writeups about how to implement this particular revocation procedure. There are a plethora of great articles on how to set things up but not about what to do when things go wrong. We also have plans to begin rotating out AppRole credentials within a short timeframe, which will require this procedure to be performed often.

So, if you have an AppRole whose Secret ID has become outdated, compromised, lost, or stolen, the following details the process for rotating it out. For the purpose of this example, the AppRole’s name is timelessExampleProject-prod-ro-role.

Create a new AppRole Secret ID

$ vault write -f auth/approle/role/timelessExampleProject-prod-ro-role/secret-id
Key                   Value
---                   -----
secret_id             b7789047-c0nflu3n7-1349-2fa9de887fb0
secret_id_accessor    c4dc05a8-c0nflu3n7-a2ec-59f6499eb9f1

Now you can reconfigure your service with the new secret ID that you received.

Get the old and compromised AppRole Secret ID accessor

$ vault list auth/approle/role/timelessexampleproject-prod-ro-role/secret-id
Keys
----
67bda42b-c0nflu3n7-acae-f0cd089f9b11   <--old accessor
c4dc05a8-c0nflu3n7-a2ec-59f6499eb9f1   <--accessor of secret_id we just made

Despite the deceptive API call’s path, this is not a list of secret IDs. It’s a list of secret-id-accessors. You can see the one that you just created, starting with c4dc. The old one starts with 67bd and is the one that you want to revoke. In general, you want to revoke all but the one that you just created in case there are more than two secret IDs that are simultaneously active.

Revoke the old and compromised AppRole Secret ID

$ vault write \
  auth/approle/role/timelessexampleproject-prod-ro-role/secret-id-accessor/destroy \
  secret_id_accessor=67bda42b-c0nflu3n7-acae-f0cd089f9b11
Success! Data written to: auth/approle....secret-id-accessor/destroy

At this point, the old Secret ID has been “revoked.” If you attempt to login with it, you will see a failure. The code below tests the success of the proper AppRole and the failure of the revoked one:

$ vault read auth/approle/role/timelessexampleproject-prod-ro-role/role-id
Key        Value
---        -----
role_id    80a1fea2-c0nflu3n7-48ac-081d5e06cea5
$ vault write /auth/approle/login \ role_id=80a1fea2-c0nflu3n7-48ac-081d5e06cea5 \ secret_id=b7789047-c0nflu3n7-1349-2fa9de887fb0 Key Value --- ----- token s.KYEaP6c0nflu3n7AgVnBSTE <--SUCCESS!
$ vault write /auth/approle/login \ role_id=80a1fea2-c0nflu3n7-48ac-081d5e06cea5 \ secret_id=d1e6abfa-c0nflu3n7-3f1a-fadcdee049e0 Error writing data to auth/approle/login: Error making API request. ... * invalid secret id <--as expected and promised, failure!

Next steps

This blog post detailed a handy procedure to have in your security operations playbooks, just in case. If you’d like to get started with a fully managed platform for setting your data in motion, you can sign up for fully managed Apache Kafka as a service and receive $400 of free usage during your first 60 days. You can also use the promo code CL60BLOG to receive an additional $60 of free usage.*

Start Free

  • Tim Ellis is a long-time industry veteran, bringing his expertise to Confluent on its security and compliance initiatives.

Did you like this blog post? Share it now