With real-time data becoming increasingly important to modern businesses, companies are using distributed streaming platforms to process and analyze data streams in real time. Many companies are also moving to the cloud, usually gradually, over several years and in incremental stages. During this transition, many adopt hybrid cloud architectures, either temporarily or permanently, in which some applications run on-premises while others run in the cloud. Confluent Platform lets users connect, process, and react to all their data in real time with a comprehensive, self-managed platform for Apache Kafka®.
Over the last two years, we have periodically announced Confluent Platform releases, each building on top of the innovative feature set from the previous release. In this blog post, we will talk about some of these core features that make hybrid and on-premises data streaming simple, secure, and resilient. You can always find additional details about the features in the release notes.
Apache Kafka Raft (KRaft) metadata mode is a modern, high-performance consensus protocol that was introduced to remove Kafka’s dependency on ZooKeeper for metadata management. Replacing external metadata management with KRaft greatly simplifies Kafka’s architecture by consolidating responsibility for metadata into Kafka itself, rather than splitting it between two different systems: ZooKeeper and Kafka. This improves stability, simplifies the software, and makes it easier to monitor, administer, and support Kafka. The key benefits include:
Simplified architecture: Improve cluster stability and reduce management burden by consolidating metadata management into Kafka itself
Enhanced scalability: Scale right-sized clusters to millions of partitions and achieve up to a 10x improvement in recovery time after controlled and uncontrolled shutdowns
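To make "quorum-based consensus" more concrete, here is a toy Python sketch of the majority rule at the heart of Raft-style protocols: a metadata record counts as committed once a majority of voters have replicated it. This is an illustration under simplifying assumptions, not Kafka's KRaft code; it ignores terms, leader election, and fetch mechanics, and the voter IDs and offsets are made up.

```python
from typing import Mapping


def committed_offset(log_end_offsets: Mapping[int, int]) -> int:
    """Return the exclusive end of the committed prefix of a replicated log.

    log_end_offsets maps a (hypothetical) voter ID to that voter's log end
    offset, i.e. the voter holds records [0, end). A record counts as
    committed once a strict majority of voters hold it.
    """
    if not log_end_offsets:
        raise ValueError("the quorum has no voters")
    ends = sorted(log_end_offsets.values(), reverse=True)
    majority = len(ends) // 2 + 1
    # The majority-th largest end offset is reached by at least `majority`
    # voters, so every record below it is on a majority of the quorum.
    return ends[majority - 1]


# Three voters: voter 3 lagging behind does not block commits.
print(committed_offset({1: 10, 2: 8, 3: 5}))  # 8
```

Because commitment needs only a majority, a three-voter controller quorum keeps making progress with one voter down, and a five-voter quorum tolerates two.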
KRaft was first previewed in Confluent Platform 7.0 and became production-ready for new clusters in the 7.4 release. With our latest release, Confluent Platform 7.6, we are excited to announce seamless upgrades from ZooKeeper to KRaft for existing clusters. Learn more about why ZooKeeper is being replaced with KRaft in Guozhang Wang’s blog post.
Confluent for Kubernetes (CFK) provides a comprehensive, declarative API to deploy and operate Confluent as a cloud-native system on Kubernetes. Over the past few releases, we have continued to add capabilities that strengthen cloud-native security and reliability and reduce operational burden. In the 7.4 release, we introduced Confluent for Kubernetes Blueprints, which further accelerates time to market by giving application teams self-service access to streaming platform resources from a single control plane.
Organizations that want infrastructure-as-code, cloud-native automation for managing their streaming infrastructure and applications should look no further than Confluent for Kubernetes.
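Declarative management means you describe the state you want and a controller keeps converging the actual state toward it. The toy Python sketch below shows that reconcile pattern for topics. The `TopicSpec` type, the `reconcile` function, and its action strings are hypothetical and are not CFK's API; CFK implements the pattern as a Kubernetes operator acting on custom resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicSpec:
    partitions: int
    replicas: int


def reconcile(desired: dict[str, TopicSpec], actual: dict[str, TopicSpec]) -> list[str]:
    """Return the actions needed to move `actual` toward `desired`.

    Toy rules: missing topics are created, partition counts are only ever
    increased (Kafka cannot shrink a topic's partition count), and topics
    that exist but are no longer declared are deleted. Replication-factor
    changes are out of scope for this sketch.
    """
    actions: list[str] = []
    for name, spec in sorted(desired.items()):
        current = actual.get(name)
        if current is None:
            actions.append(f"create {name} partitions={spec.partitions} replicas={spec.replicas}")
        elif spec.partitions > current.partitions:
            actions.append(f"expand {name} {current.partitions}->{spec.partitions}")
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(f"delete {name}")
    return actions


desired = {"orders": TopicSpec(6, 3), "payments": TopicSpec(3, 3)}
actual = {"orders": TopicSpec(3, 3), "legacy": TopicSpec(1, 1)}
print(reconcile(desired, actual))
```

A controller would apply these actions and run the loop again. Once the actual state matches the declaration, `reconcile` returns an empty list, which is what makes re-applying the same manifest safe.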
In Confluent Platform 7.6, we’re excited to announce compaction support for Tiered Storage. This is a major addition to our storage engine that lets users offload even more data to object storage and be even more elastic. Tier compaction has been battle-tested in production in Confluent Cloud on over 30,000 clusters for the last 12 months and we’re ready to bring it to our self-managed users.
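As a refresher on what compaction preserves: a compacted topic keeps at least the latest record for each key, and records with a null value (tombstones) are eventually removed. The toy Python function below reproduces that end result on an in-memory log. It says nothing about how Confluent Server compacts tiered segments in object storage; the `Record` layout and the `compact` function are assumptions made for illustration.

```python
from typing import Optional

# (offset, key, value); a value of None is a tombstone (a delete marker).
Record = tuple[int, str, Optional[str]]


def compact(log: list[Record], drop_tombstones: bool = True) -> list[Record]:
    """Keep only the latest record per key, preserving original offsets."""
    latest: dict[str, int] = {}
    for offset, key, _ in log:
        latest[key] = offset  # later offsets overwrite earlier ones
    kept = [rec for rec in log if latest[rec[1]] == rec[0]]
    if drop_tombstones:
        # Kafka removes tombstones only after delete.retention.ms has passed;
        # this toy drops them immediately when asked to.
        kept = [rec for rec in kept if rec[2] is not None]
    return kept


log = [(0, "a", "1"), (1, "b", "1"), (2, "a", "2"), (3, "b", None), (4, "c", "1")]
print(compact(log))  # [(2, 'a', '2'), (4, 'c', '1')]
```

Note that surviving records keep their original offsets: compaction leaves gaps in the offset sequence rather than renumbering, so consumers' committed offsets stay valid.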
Confluent Platform Tiered Storage has been validated for production use with multiple object storage vendors, and we recently introduced support for Azure Tiered Storage as well.
As businesses strive to deliver real-time data, engineering teams must ensure that their data in motion is consistent, reliable, and of high quality. This necessitates the establishment of clear Data Quality Rules, which serve as a formal agreement between upstream and downstream components regarding the structure, semantics, and quality of data in motion. The upstream component is responsible for enforcing these rules, while the downstream component can safely assume that the data it receives adheres to the Data Quality Rules.
Introduced in our 7.4 release, Data Quality Rules support domain validation and schema migration, allowing developers and architects to easily validate important, sensitive data and to move from old data formats to new ones.
Domain Validation Rules: These rules validate the values of individual fields within a message based on a boolean predicate. Domain Validation Rules can be defined using Google Common Expression Language (CEL), which implements common and simple semantics for expression evaluation.
Schema Migration Rules: These rules simplify schema changes by transforming topic data from one format to another upon consumption. This enables consumer applications to continue reading from the same topic even as schemas change, removing the need to switch over to a new topic with a compatible schema.
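To illustrate how such rules are expressed, the Python sketch below builds a domain validation rule in the JSON shape used by Schema Registry data contracts (a `ruleSet` containing `domainRules`, each with `name`, `kind`, `type`, `mode`, and `expr`), modeled on Confluent's documented `checkSsnLen` example; check the documentation for your version before relying on the exact fields. Since the Python standard library can't evaluate CEL, a hypothetical `check_ssn_len` predicate stands in for the expression, and a hypothetical `upgrade_v1_to_v2` function stands in for a migration rule that renames a field when older records are read.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]

# A domain validation rule in the data-contract shape. Field names follow
# Confluent's documented examples; verify them for your platform version.
rule_set: dict[str, Any] = {
    "ruleSet": {
        "domainRules": [
            {
                "name": "checkSsnLen",
                "kind": "CONDITION",
                "type": "CEL",
                "mode": "WRITE",
                "expr": "size(message.ssn) == 9",
            }
        ]
    }
}


def check_ssn_len(message: Message) -> bool:
    """Hypothetical Python stand-in for the CEL expression above."""
    return len(message["ssn"]) == 9


def validate_on_write(message: Message, rules: dict[str, Callable[[Message], bool]]) -> Message:
    """Upstream enforcement: refuse to produce a message that breaks any rule."""
    for name, predicate in rules.items():
        if not predicate(message):
            raise ValueError(f"data quality rule {name!r} failed")
    return message


def upgrade_v1_to_v2(message: Message) -> Message:
    """Hypothetical migration applied on read: rename "ssn" to "socialSecurityNumber"."""
    upgraded = {k: v for k, v in message.items() if k != "ssn"}
    upgraded["socialSecurityNumber"] = message["ssn"]
    return upgraded


print(json.dumps(rule_set, indent=2))
record = validate_on_write({"name": "Ada", "ssn": "123456789"}, {"checkSsnLen": check_ssn_len})
print(upgrade_v1_to_v2(record))
```

The split mirrors the description above: the producer side enforces the condition before data is written, while the migration runs on the consumer side so that readers on the new schema can keep consuming the same topic.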
The ecosystem of connectors built by Confluent, the Kafka community, or partners is often used by developers to save time and engineering resources that would have been spent building and testing each connector themselves. Confluent Platform offers over 120 pre-built connectors that enable customers to integrate different ecosystems with Kafka and Confluent Platform more quickly and reliably. These connectors are designed and tested by experts to increase developer productivity and cost-effectiveness, speeding up time to value and reducing risk.
Over the past year, Confluent has introduced and improved several premium connectors, including IBM MQ source and sink connectors, Oracle CDC connectors, and others. These premium connectors offer advanced capabilities that help customers modernize their data architecture and streamline data integration.
IBM MQ source and sink premium connectors: Introduced in Confluent Platform 7.3, these connectors enable the use of the Kafka Connect framework on z/OS through a certified Connect worker and certified IBM MQ source and sink premium connectors. This allows you to deliver and transform complex mainframe data and make it usable in modern cloud data platforms, such as Snowflake, Elastic, and MongoDB.
Oracle CDC premium connector: Oracle is one of the relational databases most commonly connected to Kafka. The Oracle CDC connector allows customers to cost-effectively and reliably offload data from Oracle Database, enabling developers to capture highly valuable change events from each Oracle table into a separate Kafka topic to build modern applications. Since its initial launch, we have made several improvements to the connector to make the experience more robust and seamless for customers.
Self-managed connectors with CFK: As mentioned, Confluent for Kubernetes (CFK) is a cloud-native control plane that enables a comprehensive declarative experience to deploy and self-manage Confluent Platform in a private Kubernetes-based infrastructure.
Customers often need many connectors, across infrastructures, to get data in and out of Kafka. CFK enables you to manage these connectors in a consistent, automated, infrastructure-as-code manner, so you can build a self-service connector platform for your application teams. To learn more, check out Declarative Connectors with Confluent for Kubernetes.
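However connectors are deployed, each one is ultimately a JSON configuration submitted to Kafka Connect. The Python sketch below builds, but does not send, a `PUT /connectors/{name}/config` request to the Kafka Connect REST API. That endpoint creates the connector if it doesn't exist and updates it otherwise, so re-applying the same configuration is idempotent, which is the property infrastructure-as-code tooling relies on. The worker URL, connector name, file path, and topic are hypothetical, and the example uses Apache Kafka's sample `FileStreamSourceConnector` rather than guessing at premium connector settings.

```python
import json
import urllib.parse
import urllib.request

CONNECT_URL = "http://localhost:8083"  # hypothetical Connect worker address


def build_upsert_request(name: str, config: dict[str, str]) -> urllib.request.Request:
    """Build a create-or-update request for a connector's configuration."""
    url = f"{CONNECT_URL}/connectors/{urllib.parse.quote(name, safe='')}/config"
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


connector_config = {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",  # hypothetical input file
    "topic": "file-lines",     # hypothetical target topic
}

req = build_upsert_request("file-source", connector_config)
print(req.get_method(), req.full_url)
# Against a running worker you would send it with:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Keeping the configuration as data (here a dict, in CFK a custom resource) and applying it with an idempotent call is what lets the same definition be reviewed, versioned, and re-applied safely.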
In Confluent Platform 7.0, we announced Cluster Linking, which enables you to easily link clusters together to form a highly available, consistent, and real-time bridge between on-prem and cloud environments and provide teams with self-service access to data wherever it resides. Cluster Linking delivers three primary benefits for moving your data in real time to wherever it suits your business:
Accelerate the enterprise journey to the cloud by securely, reliably, and effortlessly creating a bridge between the cloud and on-prem environments
Enable self-service access to data in real-time across the business with globally connected clusters that perfectly and reliably mirror data across all of your environments
Reduce total cost of ownership (TCO) and operational burdens with seamless and cost-effective data geo-replication across Kafka clusters everywhere they reside
Since its initial launch, we have continued to add features to Cluster Linking. These include Flexible Topic Naming, which further simplifies the setup of common hybrid and multicloud use cases such as organizational data sharing, data aggregation, and multi-region active-active deployments. More recently, as part of our 7.5 release, we introduced Bidirectional Cluster Linking, which solves the problem of managing consumer offset migration between clusters in active-active scenarios or when failing back in active-passive scenarios. It makes Cluster Linking more efficient for disaster recovery (DR), providing seamless data replication and metadata synchronization between clusters. Other additions let new mirror topics begin replicating from the latest offset or from a specific timestamp, and allow mirror topics to use different retention settings from their source topics, enabling efficient and flexible data sharing and aggregation topologies.
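The option to start a new mirror topic from the latest offset or from a timestamp follows a familiar Kafka rule: for a timestamp, begin at the earliest offset whose record timestamp is at or after it. The toy Python sketch below resolves such a starting point on an in-memory partition and shows prefix-based mirror topic naming. The function names and the `spec` strings are hypothetical; they are not Cluster Linking configuration properties.

```python
from datetime import datetime, timedelta, timezone
from typing import Sequence


def mirror_topic_name(source_topic: str, prefix: str = "") -> str:
    """Toy prefix-based naming, e.g. "dc-west." + "orders"."""
    return f"{prefix}{source_topic}"


def resolve_start_offset(records: Sequence[tuple[int, datetime]], spec: str) -> int:
    """Pick the offset a new mirror partition starts replicating from.

    records: (offset, timestamp) pairs for one source partition, in offset
    order; earlier offsets may already have been deleted by retention.
    spec: "earliest", "latest", or an ISO-8601 timestamp with a UTC offset
    (a hypothetical format chosen for this sketch, so it compares cleanly
    with timezone-aware record timestamps).
    """
    log_end = records[-1][0] + 1 if records else 0
    if spec == "earliest":
        return records[0][0] if records else log_end
    if spec == "latest":
        return log_end  # only records produced from now on are mirrored
    target = datetime.fromisoformat(spec)
    # Earliest offset whose timestamp is at or after the target. Record
    # timestamps need not be monotonic, so scan in offset order.
    for offset, ts in records:
        if ts >= target:
            return offset
    return log_end  # nothing that recent yet: start at the end of the log


base = datetime(2024, 1, 1, tzinfo=timezone.utc)
partition = [(100, base), (101, base + timedelta(minutes=5)), (102, base + timedelta(minutes=10))]
print(mirror_topic_name("orders", prefix="dc-west."), resolve_start_offset(partition, "2024-01-01T00:04:00+00:00"))
# dc-west.orders 101
```

Starting from "latest" or a timestamp avoids copying history a new consumer group will never read, which is where the efficiency for data sharing and aggregation comes from.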
Schema Linking: While Cluster Linking gives the ability to operate connected clusters across environments, it introduces an increased need for globally enforced standards to maximize data quality. This is why we introduced Schema Linking as part of our 7.1 release. Schema Linking provides an operationally simple means of maintaining trusted, compatible data streams across hybrid and multicloud environments by sharing consistent schemas between independent clusters that sync in real time.
Schema Linking supports real-time syncing of schemas for both active-active and active-passive setups. This includes common Cluster Linking use cases, such as cluster migration and real-time data sharing and replication. Leveraged alongside Cluster Linking, schemas are shared everywhere they’re needed, providing an easy means of maintaining high data integrity to ensure a consistent data fabric across the entire business.
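Conceptually, a one-way schema link copies any schema versions the destination is missing and can run repeatedly without creating duplicates. The toy Python sketch below models each registry as a hypothetical in-memory map from subject to ordered schema versions. It is not the Schema Linking API (Schema Linking is configured through schema exporters in Schema Registry), and it ignores contexts, schema IDs, and compatibility checks.

```python
# Toy registry model: subject -> schema definitions, where index i holds version i + 1.
Registry = dict[str, list[str]]


def sync(source: Registry, dest: Registry) -> int:
    """One-way, incremental schema sync; returns the number of versions copied.

    Re-running it is safe: versions already present in `dest` are skipped.
    If `dest` holds a version that doesn't match `source` (or holds more
    versions than `source`), the subject has diverged and the sync stops.
    """
    copied = 0
    for subject, versions in source.items():
        target = dest.setdefault(subject, [])
        if target != versions[: len(target)]:
            raise ValueError(f"subject {subject!r} has diverged between registries")
        for schema in versions[len(target):]:
            target.append(schema)
            copied += 1
    return copied


src = {"orders-value": ["schema-a", "schema-b"], "users-value": ["schema-c"]}
dst = {"orders-value": ["schema-a"]}
print(sync(src, dst))  # 2
print(sync(src, dst))  # 0
```

Refusing to overwrite a diverged subject keeps the destination trustworthy: consumers there never see a schema version that means something different from the same version at the source.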
Confluent Platform is architected to run wherever your IT systems are across a global footprint, be it in data centers, retail stores, on ships, in factories, or on edge devices.
In 7.6, we now support running Confluent Platform on Arm64 Linux architectures. Arm allows you to improve the price-performance of your compute resources. We’ve seen the benefits ourselves: we moved our entire AWS fleet for Confluent Cloud to Arm-based images.
The Confluent Server Broker allows you to collect data from any source, persist it, process it, and replicate it across the WAN (through Cluster Linking). Now, you can deploy this in production on low-cost, small-footprint Arm64 architecture infrastructure at the edge.
With 7.6, we also extended our operating system support to include Rocky Linux, giving you more choice of operating system for running Confluent Platform safely in production.
Today, with the launch of our 7.6 release, we are excited to announce a range of new features that continue to simplify architecture, enhance security, and enable you to scale cost-effectively.
Enhance scalability and simplify your architecture with seamless upgrades for existing clusters from ZooKeeper to KRaft
Continue to offload even more data to object storage and improve elasticity through compaction support for Tiered Storage
Reduce operational burden by managing application identities and credentials through your own OIDC identity provider with OAuth (early access)
Support production deployments on Arm64 hardware, allowing you to maintain the performance of Kafka at lower costs
Support production deployments on Rocky Linux
Confluent Platform 7.6 is built on the most recent version of Apache Kafka, in this case, version 3.6. For more details about Apache Kafka 3.6, read the blog post by Satish Duggana or watch the video by Danica Fine.
Join us on February 27th to see some of the latest innovations in action that have made hybrid and on-premises data streaming simple, safe, and more resilient than ever.
Download Confluent Platform 7.6 today to get started with the only cloud-native and comprehensive platform for data in motion, built by the original creators of Apache Kafka.