Querying databases comes with costs—wall clock time, CPU usage, memory consumption, and potentially actual dollars. As your application scales, optimizing these costs becomes crucial. Materialized views offer a powerful solution by creating a pre-computed, optimized data representation. Imagine a retail scenario with separate customer and product tables. Typically, retrieving product details for a customer's purchase requires cross-referencing both tables. A materialized view simplifies this by combining customer names and associated product details into a single table, enhanced with indexing for faster read performance. This approach minimizes the database's workload, reducing query processing time and associated costs.
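To make the retail scenario concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no materialized views, so the sketch emulates one by storing the join result as an indexed table. The table names, columns, and rows (including the purchases table that links customers to products) are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical retail schema; all table names, columns, and rows are
# illustrative and do not come from any real dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, title TEXT, price REAL);
    CREATE TABLE purchases (customer_id INTEGER, product_id INTEGER);

    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO products  VALUES (10, 'Keyboard', 49.0), (11, 'Monitor', 199.0);
    INSERT INTO purchases VALUES (1, 10), (1, 11), (2, 11);
""")

# Without a materialized view, every read pays for these joins.
JOIN_QUERY = """
    SELECT c.name, p.title, p.price
    FROM purchases AS pu
    JOIN customers AS c ON c.customer_id = pu.customer_id
    JOIN products  AS p ON p.product_id  = pu.product_id
"""

# Emulated materialized view: compute the join once, store the result as a
# table, and index it on the column used for lookups.
conn.execute(f"CREATE TABLE customer_purchases AS {JOIN_QUERY}")
conn.execute("CREATE INDEX idx_customer_purchases_name ON customer_purchases (name)")


def purchases_for(name):
    """Read from the pre-computed table; no join happens at query time."""
    rows = conn.execute(
        "SELECT title, price FROM customer_purchases WHERE name = ? ORDER BY title",
        (name,),
    )
    return rows.fetchall()


print(purchases_for("Ada"))  # [('Keyboard', 49.0), ('Monitor', 199.0)]
```

Unlike this one-time snapshot, a real materialized view is refreshed by the database as the underlying data changes.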
In the context of data streaming with Apache Kafka®, materialized views become even more valuable. They act as a read-optimized cache of your Kafka data, allowing queries to target the materialized view instead of the raw Kafka data. This significantly boosts performance, especially for complex or frequent queries. The view automatically refreshes as new events stream into the Kafka topic, ensuring data freshness. Thanks to the recent release of mutual TLS on Confluent Cloud and Amazon Redshift, this pattern is now possible between the two services.
Confluent Cloud and Amazon Redshift recently released mutual TLS (mTLS) authentication support for their respective platforms.
For Confluent Cloud, support for mTLS authentication is driven by customers who are migrating workloads from on-premises or other self-managed Kafka solutions to Confluent Cloud, and who have existing infrastructure built around mTLS authentication. By bringing their own certificate authority (CA) to Confluent Cloud, customers can configure Kafka client authentication to Confluent clusters using customer-owned certificates. Customers using other authentication types can also benefit by adding mTLS to their existing dedicated cluster. As with API keys and OAuth/OIDC authentication, Confluent Cloud supports configuring role-based access control (RBAC) or access control lists (ACLs) on different client certificates for granular access control.
Amazon Redshift can seamlessly connect to Confluent Cloud using mTLS authentication with AWS Private Certificate Authority or with a self-managed certificate authority stored in AWS Secrets Manager. This blog walks through the use cases for Confluent Cloud and Amazon Redshift, and provides step-by-step instructions for configuration on both sides.
Below is the architecture diagram for this setup. Both the Amazon Redshift cluster and the Confluent cluster will be deployed in the same region to save on inter-region data transfer costs. Public networking will be used; however, the setup is similar with other networking options such as VPC peering, AWS Transit Gateway, or AWS PrivateLink. This architecture assumes you can create your own custom CA. If you need to use an existing CA, you can use AWS Secrets Manager instead of AWS Private Certificate Authority.
Navigate to Amazon Redshift and click “Create cluster.”
Select any node type.
Select how you would like to set your admin password in the “Database configurations” section.
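In the “Cluster permissions” section, click “Associate IAM role” and attach the IAM role that allows Redshift to export the certificate (its creation is covered in the IAM steps of this guide). If the role does not exist yet, you can associate it with the cluster after the cluster is created.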
All other fields can be left as default.
Create the cluster.
Note: Redshift Serverless workgroups are also supported.
The following steps assume that you’re already signed up with Confluent Cloud and have OrganizationAdmin permissions with your user account.
Navigate to Confluent Cloud and create a dedicated cluster. You can leave it sized to 1 Confluent Unit for Kafka (CKU).
Navigate to the “Topics” tab and click “Create topic.”
Provide the topic name orders and leave the rest as defaults. Skip the data contract pop-up that comes afterwards.
Navigate to the “Connectors” tab and click “Add connector.”
Select “Sample data.”
Select “Additional configuration.”
Select the orders topic you just created.
In the “Configurations” section, select the “output record value format” as “JSON.”
Note: Streaming ingestion with Amazon Redshift does not support Schema Registry at this time. Selecting AVRO, JSON_SR, or PROTOBUF will cause records to be sent to Amazon Redshift unserialized.
Select “Orders” for the schema. In this case, the schema defines what the generated messages look like, not the serialization used.
Leave the rest as defaults and click “create the connector.”
You can navigate back to the orders topic and see data flowing into it.
Last, navigate to the “Cluster Settings” tab of your cluster and find the bootstrap server. It will have a format similar to the following: pkc-xxxxx.us-east-2.aws.confluent.cloud:9092. Keep this value handy to use in later steps.
Be sure you are in the same region as your Amazon Redshift cluster and Confluent Cloud cluster (us-east-2 if you’ve been following this guide).
Navigate to AWS Private Certificate Authority and click “Create a Private CA.”
Leave the mode option as “General-purpose” and the CA type option as “Root.”
Fill out the “Subject distinguished name options” accordingly.
Check the checkbox for Pricing and click “Create CA.”
Once the CA is created, be sure to install the CA certificate.
Once you see the Status field as active for the CA, click into your newly created certificate authority.
Find and click the “CA certificate” tab.
Within that tab, you will see an “Additional information” section containing the certificate body. Click “Export certificate body to a file.” This will download a .pem file to use later.
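Before uploading the exported file anywhere, it can help to confirm that it really is a PEM-encoded certificate. The following is a minimal sketch using only Python's standard library. It checks the PEM framing and that the body decodes, but it does not validate the X.509 contents. The function names and the file name in the usage comment are hypothetical.

```python
import ssl


def looks_like_pem_certificate(pem_text):
    """Return True if the text has PEM certificate framing and a decodable body."""
    try:
        # Raises ValueError when the BEGIN/END CERTIFICATE markers are missing
        # or the base64 body cannot be decoded.
        der_bytes = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    except ValueError:
        return False
    return len(der_bytes) > 0


def check_pem_file(path):
    """Read a file from disk and run the framing check on its contents."""
    with open(path, "r", encoding="ascii") as handle:
        return looks_like_pem_certificate(handle.read())


# Usage (hypothetical file name; use the path of the file you downloaded):
# check_pem_file("Certificate.pem")
```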
Navigate to the Workload identities page in Confluent Cloud.
Click “Add provider.”
Select “Certificate authority.”
Select “Add PEM file” and upload the .pem file you downloaded earlier from the Certificate Authority you created.
Finish the setup and create the identity provider.
Within the identity provider, click “Add pool.”
Provide a name and leave the Certificate identifier set to “CN.”
Set up the filter for matching client certificates according to your requirements. See the Confluent Cloud CEL filter documentation for accepted filter expressions. For testing purposes, you can set it to true.
Attach the CloudClusterAdmin role for the dedicated cluster to the identity pool, or a more granular RBAC role if you wish to limit the access of your Redshift client. Make sure you click “Add” and see the role appear on the right panel before moving to the next step.
Click “Validate and save.”
Navigate to the AWS Certificate Manager.
Click “Request.”
Select “Request a private certificate.”
In the “Certificate authority” dropdown, you’ll see the CA you created in the previous section.
For the “Fully qualified domain name,” you can provide any name for the purposes of this exercise.
Leave the rest as defaults and click “Request.”
Once the certificate is issued, copy the certificate ARN and set it aside for future use.
The following IAM policy and role allow Amazon Redshift to retrieve the certificate from AWS Certificate Manager.
Navigate to IAM and create a new policy.
Define the policy in JSON. It needs to grant Redshift the acm:ExportCertificate permission so it can use the previously created certificate.
Give the policy a name, such as ExportCertificatePolicy, and create it.
Navigate to the IAM Role tab and create a new role.
For the trusted entity, select “custom trust policy.”
Paste your trust policy into the “Custom trust policy” box, editing its values for your account, and click next. This trust policy allows Redshift to assume this role on your behalf.
Add the policy you just created and click next.
Provide the role a name.
Click Create Role.
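The two JSON documents used in the steps above can be sketched as follows. This is a minimal example under stated assumptions, not necessarily the exact policies from the original walkthrough: the certificate ARN is a placeholder, the permissions policy grants only acm:ExportCertificate on that one certificate, and the trust policy names the standard redshift.amazonaws.com service principal. The Python function names are hypothetical; the script only prints JSON that you can paste into the IAM console.

```python
import json

# Placeholder (assumption): replace with the certificate ARN you copied earlier.
CERTIFICATE_ARN = "arn:aws:acm:us-east-2:111122223333:certificate/EXAMPLE-CERTIFICATE-ID"


def export_certificate_policy(certificate_arn):
    """Permissions policy: lets the role export one ACM certificate."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["acm:ExportCertificate"],
                "Resource": [certificate_arn],
            }
        ],
    }


def redshift_trust_policy():
    """Trust policy: lets the Amazon Redshift service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Print both documents so they can be pasted into the IAM console.
print(json.dumps(export_certificate_policy(CERTIFICATE_ARN), indent=2))
print(json.dumps(redshift_trust_policy(), indent=2))
```

If you use a Redshift Serverless workgroup, the trust policy may also need the redshift-serverless.amazonaws.com service principal; check the current AWS documentation for your setup.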
Navigate to Amazon Redshift and open the query editor for your cluster.
Run a CREATE EXTERNAL SCHEMA command to create an external schema in Redshift that specifies which Confluent Cloud cluster to connect to, which authentication method to use, and which certificate to use for mTLS.
Create the materialized view. This materialized view will be used to link Redshift with a topic in the Confluent Cloud cluster, and this is also where the data will be stored during ingestion.
With your materialized view created, you can now query the data. Note: It may take a few seconds before the data starts to get ingested into the Redshift cluster.
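The three statements from the steps above can be sketched as follows. This small Python helper only renders SQL text for you to paste into the query editor; it does not connect to anything. The statement shapes follow the Amazon Redshift streaming ingestion syntax as I understand it, so verify them against the current Redshift documentation. The schema name, view name, ARNs, and function names are placeholders or hypothetical, and values are interpolated as-is without escaping.

```python
# Placeholder values (assumptions): substitute the values you collected earlier.
SCHEMA_NAME = "confluent_schema"  # hypothetical external schema name
VIEW_NAME = "orders_mv"           # hypothetical materialized view name
TOPIC_NAME = "orders"             # topic created earlier in this guide
IAM_ROLE_ARN = "arn:aws:iam::111122223333:role/EXAMPLE-ROLE-NAME"
CERTIFICATE_ARN = "arn:aws:acm:us-east-2:111122223333:certificate/EXAMPLE-CERTIFICATE-ID"
BOOTSTRAP_SERVER = "pkc-xxxxx.us-east-2.aws.confluent.cloud:9092"


def external_schema_sql(schema, iam_role_arn, bootstrap_server, certificate_arn):
    """Render the statement that points Redshift at the Kafka cluster over mTLS."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        "FROM KAFKA\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "AUTHENTICATION mtls\n"
        f"URI '{bootstrap_server}'\n"
        f"AUTHENTICATION_ARN '{certificate_arn}';"
    )


def materialized_view_sql(view, schema, topic):
    """Render an auto-refreshing view that parses each record value as JSON."""
    return (
        f"CREATE MATERIALIZED VIEW {view} AUTO REFRESH YES AS\n"
        "SELECT kafka_partition, kafka_offset, refresh_time,\n"
        "       JSON_PARSE(kafka_value) AS data\n"
        f'FROM {schema}."{topic}";'
    )


def sample_query_sql(view, limit=10):
    """Render a simple query to confirm that records are arriving."""
    return f"SELECT * FROM {view} LIMIT {int(limit)};"


statements = (
    external_schema_sql(SCHEMA_NAME, IAM_ROLE_ARN, BOOTSTRAP_SERVER, CERTIFICATE_ARN),
    materialized_view_sql(VIEW_NAME, SCHEMA_NAME, TOPIC_NAME),
    sample_query_sql(VIEW_NAME),
)
for statement in statements:
    print(statement, end="\n\n")
```

Parsing kafka_value with JSON_PARSE fits the JSON output format selected for the sample data connector earlier; other formats would need different handling.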
With this setup, integrating Confluent Cloud with Amazon Redshift materialized views offers a powerful solution for real-time data ingestion and analysis. Materialized views act as a pre-computed, read-optimized cache of your Kafka data, enabling significantly faster query performance compared to querying raw Kafka data. This is particularly beneficial for complex or frequent queries. The view automatically refreshes as new data arrives in your Kafka topic, ensuring data freshness.
Ready to get started with Confluent Cloud on AWS Marketplace? New sign-ups receive $1,000 in free credits for their first 30 days! Subscribe through AWS Marketplace and your credits will be instantly applied to your Confluent account.
You can explore more in the documentation included below:
Use Mutual TLS (mTLS) to Authenticate to Confluent Cloud Resources
Configure Mutual TLS (mTLS) Authentication on Confluent Cloud
If you are not using a dedicated cluster, need Schema Registry support, or need to load data into a Redshift table (as opposed to just a materialized view), consider using Confluent Cloud’s fully managed connector for Amazon Redshift.
Amazon and all related marks are trademarks of Amazon.com, Inc. or its affiliates.
Apache and Apache Kafka® are trademarks of the Apache Software Foundation.