
Connect, Process, and Share Trusted Data Faster Than Ever: Kora Engine, Data Quality Rules, and More


In today's data-driven world, businesses are compelled to expand their data capabilities to cater to the evolving needs of their customers. In addition, with data being produced at an unprecedented rate, businesses are finding it imperative to share this data externally. However, they need to ensure the data is of the highest quality. To keep up with these challenges, businesses are seeking solutions that enable them to share trusted data externally with just a few clicks, without compromising security or reliability. As a result, data integration and management have become crucial components of modern-day business operations. Businesses are exploring new approaches to handle data that offer flexibility, scalability, and ease of use to deliver a seamless data-sharing experience.

Keeping the needs of modern businesses in mind, we are excited to introduce the latest features of Confluent Cloud. These new functionalities enable businesses to connect, process, and share their trusted data faster than ever before.

Here’s an overview of the latest features—read on for more details:

Join us to see the new features in action in the Q2 Launch demo webinar.

Kora: The Apache Kafka® engine built for next-level elasticity, reliability, and performance in the cloud

As data streaming becomes more ubiquitous and businesses grow their workloads, teams, and use cases on Apache Kafka, engineering leaders often feel increasing pressure from complicated and costly cluster scaling, rising platform availability and durability risks, and increasingly unpredictable end-to-end latency.

Over the past 15 years, many cloud-native innovations have emerged to address similar growing pains in other parts of modern infrastructure, just as S3 did for NFS and Snowflake did for Teradata. A truly cloud-native service doesn’t just package open source software on Kubernetes; it takes advantage of cloud infrastructure’s scalability and versatility to deliver a better customer experience while abstracting away the complexity and burden of managing the cloud.

Therefore, we invested five million hours to craft a truly cloud-native experience for our customers and created Kora, the Apache Kafka engine built for the cloud. With serverless abstraction, automated operations, service decoupling, and global availability, Kora brings GBps+ elastic scaling, guaranteed reliability, and supercharged performance to Confluent Cloud, so that you can:

  • Scale up and down to handle any workload spike and retention requirement more than 10x faster and easier

  • Offload Kafka maintenance and operational burdens with 10x more availability, 99.99% uptime SLA, and built-in data durability

  • Speed up real-time analytics and customer experiences with predictable low latency, sustained across time

Kora still has Apache Kafka at its heart and is 100% compatible with the Kafka API. It is embedded in Confluent Cloud and powers 30,000+ Confluent Cloud clusters globally. For more information on what’s under the hood of Kora, check out our announcement blog post.

Data Quality Rules in Stream Governance: Ensure trusted, high-quality data streams

A data contract is a formal agreement between upstream and downstream components on the structure and semantics of data in motion. One critical part of enforcing data contracts is a set of rules or policies that ensure data streams are high quality, fit for consumption, and resilient to schema evolution over time.

Consider a scenario where a company collects customer data, including a social security number field. Even if the schema of a message is structurally correct, the social security number field may contain an invalid value, which presents a problem for downstream applications that use the social security number field to derive customer identity.

To address this problem, Confluent's Stream Governance suite now includes Data Quality Rules to better enforce data contracts, enabling users to implement customizable rules that ensure data integrity and compatibility and quickly resolve data quality issues. With Data Quality Rules, schemas stored in Schema Registry can be augmented with several types of rules, such as:

Domain Validation Rules: These rules validate the values of individual fields within a message based on a boolean predicate. Domain validation rules can be defined using Google Common Expression Language (CEL), which implements common and simple semantics for expression evaluation.

Event-Condition-Action Rules: These rules trigger follow-up actions upon the success or failure of a Domain Validation rule. Users can execute custom actions based on their specific requirements or leverage predefined actions to return an error to the producer application or send the message to a dead-letter queue.

Complex Schema Migration Rules: These rules simplify schema changes by transforming topic data from one format to another upon consumption. This enables consumer applications to continue reading from the same topic even as schemas change, removing the need to switch over to a new topic with a compatible schema.
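To make the three rule types more concrete, here is a minimal Python sketch of the ideas behind them, using the social security number scenario above. It is not the Schema Registry rule syntax or a Confluent API. The record shape, the function names, the renamed field, and the CEL expression quoted in the comment are all assumptions for illustration, and the in-memory lists stand in for a real topic and dead-letter queue.

```python
import re

# Hypothetical record shape: a dict with an "ssn" field, as in the scenario above.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def ssn_is_valid(message):
    # Domain validation: a boolean predicate over a single field.
    # A CEL rule might express a similar check as something like
    #   message.ssn.matches("^\\d{3}-\\d{2}-\\d{4}$")
    # (illustrative expression, not taken from the documentation).
    return bool(SSN_PATTERN.fullmatch(str(message.get("ssn", ""))))


def produce_with_rule(message, predicate, topic, dead_letter_queue):
    # Event-condition-action: the outcome of the validation decides the action.
    # On success the message is written to the topic; on failure it is routed
    # to a dead-letter queue instead (one of the predefined actions described).
    if predicate(message):
        topic.append(message)
        return "accepted"
    dead_letter_queue.append(message)
    return "dead-lettered"


def migrate_v1_to_v2(message):
    # Schema migration: transform data written with an old schema into the
    # shape a newer consumer expects, at read time. The rename from "ssn" to
    # "socialSecurityNumber" is a made-up schema change.
    upgraded = {key: value for key, value in message.items() if key != "ssn"}
    upgraded["socialSecurityNumber"] = message["ssn"]
    return upgraded


if __name__ == "__main__":
    topic, dlq = [], []
    print(produce_with_rule({"ssn": "123-45-6789"}, ssn_is_valid, topic, dlq))
    print(produce_with_rule({"ssn": "not-a-number"}, ssn_is_valid, topic, dlq))
    print([migrate_v1_to_v2(m) for m in topic])
```

In this sketch the structurally valid but semantically wrong record never reaches the topic, and a consumer on the newer schema can keep reading the same topic because old records are upgraded as they are read.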

Combined with other Stream Governance features, Data Quality Rules allow organizations to deliver trusted data streams to downstream consumers and protect themselves from the significant impacts of poor-quality data.

For more information on Data Quality Rules and Data Contracts, please refer to our documentation.

Custom connectors: Break any data silo by bringing your own connector plugins without managing infrastructure  

Every organization has unique data architecture needs, which require building custom connectors or modifying existing ones to integrate home-grown systems, custom applications, and the long tail of less popular data systems with Kafka. However, teams typically have to self-manage these customized connectors and take on non-differentiated infrastructure responsibilities and risks of downtime.

We’re excited to introduce custom connectors that enable you to bring your connectors to the cloud so that you don’t have to manage Connect infrastructure. With custom connectors, you’re able to:

  • Quickly connect to any data system using your own Kafka Connect plugins without code changes

  • Ensure high availability and performance using logs and metrics to monitor the health of your connectors and workers

  • Eliminate the operational burden of provisioning and perpetually managing low-level connector infrastructure 

Custom connectors join our portfolio of over 70 pre-built and fully managed connectors on Confluent Cloud to cover all data systems and apps for any streaming use case.

“To provide accurate and current data across the Trimble Platform, it requires streaming data pipelines that connect our internal services and data systems across the globe. Custom connectors will allow us to quickly bridge our in-house event service and Kafka without setting up and managing the underlying connector infrastructure. We will be able to easily upload our custom-built connectors to seamlessly stream data into Confluent and shift our focus to higher-value activities.”

 – Graham Garvin, Product Manager, Trimble

To get started, upload each custom plugin once in the console. Any user in your org can then access it from the connector catalog page to configure and provision connector instances. Custom connectors support custom single message transforms (SMTs) for on-the-fly data transformations, further tailoring the connector to fit your specific use case. After launching the connector, leverage built-in logs and metrics for diagnostics, monitoring, and debugging. You can view both connector and Connect worker-level logs in a Kafka topic, accessible through the Kafka API / CLI, the logs page in the console, or a connector like the Elasticsearch Service Sink connector. You can also view connector task status, CPU, and memory usage from the metrics page to understand how your custom connectors are performing.

Filter by standard Kafka Connect log levels such as fatal, error, warn, and info, to quickly identify the root cause of any issues.

We share responsibilities with users to ensure that their custom connectors run successfully with high availability. Confluent takes on critical infrastructure activities, including resource provisioning, Connect cluster management, Schema Registry, monitoring, and security. Teams are responsible for providing and troubleshooting the connector plugin, versioning, patching, and overall connector management. 

Custom connectors are generally available on AWS in five regions: us-east-1, us-east-2, us-west-2, eu-west-1, and eu-central-1. Learn more about how to write, package, upload, and run a custom connector by checking out our latest tutorial on Confluent Developer.

Stream Sharing: Share data streams across organizations easily and securely in a few clicks

Businesses in a digital-first world not only need streaming data pipelines to connect data systems internally for informed decision-making, but also need to share real-time data externally with other business units, vendors, partners, and customers. Data sharing is a business necessity, but common methods like flat file sharing were designed for data at rest. Using them to share data in motion results in out-of-sync and stale data, as well as scalability challenges and security concerns.

That’s why we’ve built Stream Sharing, the easiest and safest way to share streaming data across organizations. Organizations leveraging Stream Sharing will be able to:

  • Easily exchange real-time data without delays in a few clicks directly from Confluent to any Kafka client

  • Safely share and protect your data with robust authenticated sharing, access management, and layered encryption controls

  • Trust the quality and compatibility of shared data by enforcing consistent schemas across users, teams, and organizations

Stream Sharing enables Confluent Cloud users with the right permissions to share their Kafka topics with any data recipient by simply entering the recipient’s email address. Data streams are shared across organizations through the open source Kafka consumer API while retaining all of Confluent Cloud's robust security controls. Recipients need to create a Confluent Cloud account, and schema-enabled topics require using Confluent Cloud Schema Registry from a Stream Governance package. The service is generally available and provided at no extra cost, with either party able to revoke access at any time.
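Because a shared stream is read through the standard Kafka consumer API, the recipient's client configuration looks much like that of any other Confluent Cloud consumer. The sketch below only assembles such a configuration and is an illustration under assumptions: the function name and all argument values are hypothetical, and the real bootstrap server, credentials, and any consumer group constraints come from the access details of the share itself. The property names are the standard librdkafka ones used by the confluent-kafka Python client (Java clients spell the mechanism key `sasl.mechanism`).

```python
def shared_stream_consumer_config(bootstrap_servers, api_key, api_secret, group_id):
    # Standard Kafka consumer settings for an authenticated, TLS-encrypted
    # connection. Every value is supplied by the caller; nothing here is
    # specific to Stream Sharing beyond where the values come from.
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": api_key,
        "sasl.password": api_secret,
        "group.id": group_id,
        # Start from the oldest retained record when the group has no offsets.
        "auto.offset.reset": "earliest",
    }


if __name__ == "__main__":
    # Placeholder values only; substitute the details provided with the share.
    config = shared_stream_consumer_config(
        "example-bootstrap:9092", "EXAMPLE_KEY", "EXAMPLE_SECRET", "example-group"
    )
    for key in sorted(config):
        print(key)
```

A dictionary like this can be passed to `confluent_kafka.Consumer(config)` if that client library is installed; the sketch itself has no dependencies.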

Share schema-enabled data streams securely with multiple organizations and businesses in just a few simple steps

Flink Early Access Program

Following Confluent’s Immerok acquisition earlier this year, the early access program for our fully managed service for Apache Flink has now opened to select Confluent Cloud customers. The program will enable customers to try the service and help shape our product roadmap by partnering with our product and engineering teams.

By bringing Flink to Confluent Cloud, customers can take advantage of Flink's powerful and versatile stream processing framework while offloading its complex day-to-day operations to the world's foremost data streaming experts. Our Flink service will employ the same product principles you’ve come to expect for Kafka: 

  • Cloud-native: Eliminate the operational burden of managing Flink with a fully managed, cloud-native service that is simple, performant, and scalable

  • Complete: Leverage Flink fully integrated with Confluent’s complete feature set (e.g., Stream Governance, RBAC), enabling developers to build stream processing apps quickly, reliably, and securely

  • Everywhere: Seamlessly process your data everywhere it resides with a Flink service that spans across the three major cloud providers (AWS, GCP, and Azure)

If you are interested in participating in the Flink Early Access Program, be sure to apply today!

Other new features in the Confluent Cloud launch

Kafka REST Produce API: The Kafka REST Produce API in Confluent Cloud enables developers to produce new messages easily without the need for the Kafka client library. This cloud-native solution uses HTTP to interact with Kafka topics and messages, making it flexible and language-agnostic, and ideal for scaling data streaming workloads.

CLI AsyncAPI with Stream Governance: Confluent Cloud's Stream Governance suite provides tooling for obtaining and importing AsyncAPI specifications into the cloud, enabling developers and architects to programmatically define topics, schemas, tags, and more using the open source standard for event-driven architectures.

HITRUST certification: Confluent Cloud is now HITRUST-certified, which is a “Gold Standard” for the healthcare industry. The HITRUST Common Security Framework (CSF) is a certifiable framework that leverages internationally accepted standards to help healthcare organizations and their providers demonstrate their security and compliance. 

Bring-Your-Own-Key (BYOK) encryption: BYOK enables self-managed key encryption for Dedicated Kafka clusters, ensuring data privacy and integrity. It is now available on AWS, Azure, and Google Cloud.

Static Egress IP addresses: Static Egress IP addresses allow customers to achieve better network security and reliability, through consistent, static IP addresses for egress traffic from their Kafka clusters. To learn more, see Use Static IP addresses on Confluent Cloud and Egress Static IP Addresses for Confluent Cloud Connectors.

Private DNS support: Private DNS support allows customers to simplify on-prem access through the most secure private networking options while avoiding security exceptions or any custom DNS implementation. This support is available on all three major cloud platforms—AWS, Azure, and Google Cloud.
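As a rough illustration of the Kafka REST Produce API described above, the sketch below builds an HTTP request that produces one JSON record using only the Python standard library, with no Kafka client involved. Treat the details as assumptions to check against the API reference: the `/kafka/v3/clusters/{cluster_id}/topics/{topic}/records` path, the request body shape, and basic authentication with an API key and secret reflect our reading of the v3 REST API, and the function name, endpoint, cluster ID, and credentials are placeholders.

```python
import base64
import json
import urllib.request


def build_produce_request(rest_endpoint, cluster_id, topic, api_key, api_secret, value):
    # Assumed v3 records endpoint for a single topic.
    url = f"{rest_endpoint.rstrip('/')}/kafka/v3/clusters/{cluster_id}/topics/{topic}/records"
    # Assumed body shape: a JSON-typed value wrapping the record payload.
    body = json.dumps({"value": {"type": "JSON", "data": value}}).encode("utf-8")
    # HTTP basic auth built from an API key and secret.
    token = base64.b64encode(f"{api_key}:{api_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    request = build_produce_request(
        "https://rest-endpoint.example", "lkc-example", "orders",
        "EXAMPLE_KEY", "EXAMPLE_SECRET", {"order_id": 42},
    )
    print(request.get_method(), request.full_url)
```

Sending the request is a matter of passing it to `urllib.request.urlopen(request)` with real values. Any language with an HTTP client can do the same, which is the point of a language-agnostic produce path.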

Start building with new Confluent Cloud features

Ready to get started? Remember to register for the Q2 ʼ23 Launch demo webinar on May 31 where you’ll learn firsthand from our product managers how to put these new features to use. 

And if you haven’t done so already, sign up for a free trial of Confluent Cloud. New sign-ups receive $400 to spend within Confluent Cloud during their first 30 days. Use the code CL60BLOG for an additional $60 of free usage.*

The preceding outlines our general product direction and is not a commitment to deliver any material, code, or functionality. The development, release, timing, and pricing of any features or functionality described may change. Customers should make their purchase decisions based upon services, features, and functions that are currently available.

Confluent and associated marks are trademarks or registered trademarks of Confluent, Inc.

Apache® and Apache Kafka® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by the use of these marks. All other trademarks are the property of their respective owners.

  • Bharath Venkat is a product marketing manager at Confluent responsible for digital growth and Confluent Platform. Before joining Confluent, Bharath was a product management leader at AT&T, Cisco AppDynamics, Lacework, and Druva, focusing on artificial intelligence, machine learning, data science and analytics, and growth initiatives.

  • David Araujo is the director of product management for Stream Governance at Confluent. An engineer turned product manager, he has worked across many industries and several continents, mainly in data management and data strategy. He holds a master's and a bachelor's degree in computer science from the University of Evora in Portugal.
