While cloud computing adoption continues to accelerate because of the tremendous value it delivers, it has also become clear that edge computing is better suited to a variety of use cases. Organizations are realizing the benefits of processing data closer to its source: reduced latency, security and compliance benefits, more efficient bandwidth utilization, and support for scenarios where networking is severely constrained. These benefits translate into business value by creating better customer experiences, enabling more efficient and resilient operations, and opening up new opportunities.
As the primary steward and engineering entity behind Apache Kafka®, we engage in countless conversations with organizations who have adopted Kafka for their data streaming needs. Through these discussions, it’s become increasingly evident that Kafka is not only thriving in the cloud and traditional data center environments but is also making significant inroads into edge computing scenarios.
The term “edge” is used broadly in industry and can mean slightly different things. For this blog post we are referring to physical locations in close proximity to where data is generated rather than a centralized data center or the cloud. Examples include factories, retail stores, hospitals, vehicles, vessels, cell phone towers, and mobile devices. These examples clearly have very different characteristics that will be addressed later in the post.
This inclination toward Kafka at the edge is partly driven by the same trend seen across IT more broadly, where modernizing applications and systems around data streaming and event-driven architectures has become the norm. The principles that make a Kafka-based event-driven architecture superior, like efficiency, durability, and extensibility, apply with equal, if not greater, significance at the edge. Beyond this, the market has discovered that Kafka helps tackle some of the challenges specific to edge computing.
When an organization adopts edge computing, it is primarily motivated by one or more of four factors:
Mitigation of networking limitations
Lower latency processes
Cost optimization
Security and privacy
Scalability is sometimes also cited as a motivating factor for edge computing: pushing processing from the cloud into other infrastructure layers (edge, “fog”, and IoT layers) can help with scale. This is largely because organizations must scale their edge infrastructure to meet the requirements of their actual use cases regardless, rather than making trade-offs around scaling their cloud resources. For example, mobile apps must run on mobile devices, and modern factories all have on-site infrastructure associated with equipment, robotics, and sensors anyway.
Let's explore each of the primary factors and how data streaming relates to them.
In industries such as manufacturing, retail, energy, defense, or telecommunications, downtime isn't merely an operational inconvenience: it can mean significant financial loss or, in the worst case, a threat to public safety. Using the cloud for operations or data processing depends on having a network connection; losing that connection results in added latency, pauses, or outright failure.
One of the most common reasons for an edge computing architecture is a use case with well-understood networking constraints. In some scenarios, such as remote field operations, maritime operations, disaster response, and military operations, you must plan for being disconnected for lengthy periods of time.
Apache Kafka helps overcome the challenges posed by disconnected, intermittent, and constrained networking conditions. It does so by maintaining a local, immutable, ordered log of records. Kafka acts as a highly efficient buffer, collecting and storing data locally when the network connection is unavailable. This local spooling is essential where continuous operation is critical, because it lets local event-driven applications keep consuming and producing data without interruption.
When the network becomes available, the edge Kafka data can be replicated to a cloud-based instance. Kafka preserves the order of records within each partition, a crucial factor in upholding data integrity in distributed systems. Confluent provides cluster-to-cluster replication technology that preserves this ordering at the destination, so cloud processing and edge processing share a consistent view of the data. In such a setup, time-sensitive operations can continue to rely on the local Kafka instance for immediate processing needs, while less critical tasks, such as reporting or analytical workloads, can be offloaded to the cloud.
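The store-and-forward behavior described above can be sketched in a few lines of Python. This is a minimal, in-memory illustration, not how Kafka or Confluent's replication is implemented: a plain list stands in for one partition's ordered log, and the hypothetical `Replicator` class tracks the next source offset to copy, so records spooled while the network is down reach the cloud-side log in their original order once the link returns.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only, ordered log; a stand-in for a single Kafka partition."""
    records: list = field(default_factory=list)

    def append(self, value):
        """Append a record and return its offset."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, start_offset, max_records=100):
        """Return up to max_records records starting at start_offset."""
        return self.records[start_offset:start_offset + max_records]


@dataclass
class Replicator:
    """Hypothetical edge-to-cloud copier that preserves record order."""
    source: Log
    target: Log
    next_offset: int = 0  # first source offset not yet copied

    def sync(self, network_up):
        """Copy every pending record if the link is up; return how many were copied."""
        if not network_up:
            return 0  # offline: records stay spooled in the edge log
        copied = 0
        while batch := self.source.read(self.next_offset):
            for value in batch:
                self.target.append(value)
            self.next_offset += len(batch)
            copied += len(batch)
        return copied


# Usage: local producers keep appending while the site is disconnected.
edge, cloud = Log(), Log()
link = Replicator(edge, cloud)
edge.append("t=1 temp=20")
edge.append("t=2 temp=21")
link.sync(network_up=False)  # nothing is sent; the cloud log stays empty
edge.append("t=3 temp=22")
link.sync(network_up=True)   # the backlog is copied in its original order
print(cloud.records)  # ['t=1 temp=20', 't=2 temp=21', 't=3 temp=22']
```

As long as nothing else writes to the target log, each record lands at the same offset it had at the edge, which is the property that lets edge and cloud consumers reason about the same sequence of events.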
Certain business operations require a near-immediate response, which makes the network round trip to the cloud impractical. In manufacturing, for instance, fast data processing for real-time quality control can improve production efficiency and make operations safer. Other examples include autonomous and smart vehicles, intelligent transportation, military operations, and telecommunications.
While some aspects of these use cases require “hard real time” and will therefore use embedded processing, in-memory approaches, inter-process communication (IPC), and similar techniques, latency requirements fall on a spectrum, and many round-trip requirements are satisfied by soft real time or near real time. In low-latency scenarios, the time it takes Kafka to deliver a record from producer to consumer is measured in single-digit milliseconds. For these scenarios Kafka brings a number of advantages, provided it is deployed close to the source of data: either directly in the facility or very close by, using something like AWS Wavelength or DISH 5G's Confluent offering.
Edge computing offers an opportunity to reduce the costs associated with continuous cloud data transmission.
While cloud service providers (CSPs) do not charge for data ingress, there is still a networking infrastructure cost at the edge location: you must pay for enough bandwidth and network capacity to handle the volumes of data moving back and forth. In far edge scenarios this can be particularly expensive or challenging.
You most definitely pay for CSP data egress, and shifting processing to the cloud can require sending high volumes of data back to the edge.
In some edge environments, resources are constrained in some dimensions but plentiful in others. For instance, in heavy data collection environments the infrastructure might have a lot of untapped CPU because the primary resource constraint is storage. By shifting some workloads to the edge environment, you can get a better return on investment (ROI) on your self-managed infrastructure.
Stream processing with Kafka can let you filter or compress the data that needs to be sent to the cloud. One example is temporal aggregation of high-volume telemetry: rather than sending every event produced at the edge to the cloud, send the number of times similar events occurred within each time window. Likewise, rather than sending every image to the cloud, send only images in which objects have been detected.
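To make the volume reduction concrete, here is a minimal Python sketch of both ideas in the paragraph above: counting similar events per fixed, non-overlapping (tumbling) time window, and forwarding only images that contain detections. The event shape `(timestamp_ms, event_type)` and the frame dictionaries with a `detections` list are assumptions for illustration. In a real deployment the windowed count would typically be expressed in Kafka Streams (roughly `groupByKey().windowedBy(...).count()`) or Flink rather than written by hand.

```python
from collections import Counter


def tumbling_window_counts(events, window_ms):
    """Count events per (event_type, window_start_ms) instead of forwarding each one.

    events: iterable of (timestamp_ms, event_type) tuples with non-negative timestamps.
    """
    counts = Counter()
    for ts_ms, event_type in events:
        window_start = ts_ms - (ts_ms % window_ms)  # align to the window boundary
        counts[(event_type, window_start)] += 1
    return dict(counts)


def detections_only(frames):
    """Keep only frames whose (hypothetical) "detections" list is non-empty."""
    return [frame for frame in frames if frame["detections"]]


# Usage: four raw telemetry events collapse into three summary records.
telemetry = [(1_000, "vibration_high"), (1_400, "vibration_high"),
             (1_900, "temp_high"), (6_200, "vibration_high")]
summary = tumbling_window_counts(telemetry, window_ms=5_000)
print(summary)
# {('vibration_high', 0): 2, ('temp_high', 0): 1, ('vibration_high', 5000): 1}

frames = [{"id": 1, "detections": []}, {"id": 2, "detections": ["forklift"]}]
print([frame["id"] for frame in detections_only(frames)])  # [2]
```

With only four events the saving is small, but at high telemetry rates and longer windows each summary record replaces many raw events, which is what lowers the bandwidth needed from the edge site.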
By leveraging the same underlying Kafka technology (including a common, community-based message format) at the edge and in the cloud, organizations can fluidly shift operations between local and cloud environments, optimizing for both performance and cost-efficiency according to their specific requirements. This flexibility is especially beneficial in environments where network reliability cannot be guaranteed, ensuring that critical data-driven applications remain operational regardless of connectivity issues.
From hardware limitations to managing large fleets of clusters, operating Kafka at the edge is not without its challenges. Confluent has made many investments in its platform to address such issues and has a robust product roadmap shaped by the growing demand for running Confluent at the edge.
When deploying large numbers of distributed Kafka clusters, automation and observability become not just beneficial but critical. Apache Kafka itself offers few significant capabilities in either area, but Confluent has invested heavily in both.
Kubernetes (k8s) is well established as the de facto standard for managing and automating containerized applications at large scale. While k8s was originally designed for large-scale data centers, many advancements have made it suitable for the edge. For example, K3s is a lightweight Kubernetes distribution and KubeEdge extends Kubernetes orchestration to edge nodes; both are designed for edge deployments.
Confluent for Kubernetes provides a full Kubernetes operator for installing and operating Confluent clusters. It provides high-level declarative APIs by extending the Kubernetes API through CustomResourceDefinitions to support the management of Confluent services and data plane resources, such as Kafka topics. Users define a CustomResource that specifies the desired state, and Confluent for Kubernetes takes care of the rest. This extends to operational activities as well: scaling out or upgrading the version requires only a simple change to the declarative spec, and the operator does the rest. This can be a profound benefit when you are managing hundreds or thousands of edge clusters. Once you have validated the spec, you can start rolling it out across your entire fleet.
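The declarative model can be illustrated with a toy reconcile function. This Python sketch is not Confluent for Kubernetes code and does not use its CustomResource schema; the `ClusterSpec` fields (`replicas`, `version`) and the action strings are hypothetical. It shows the core idea behind an operator: compare desired state with observed state and derive only the actions needed, so that changing one spec and applying it to a fleet yields a per-cluster plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Desired or observed state of one edge cluster (hypothetical fields)."""
    replicas: int
    version: str


def reconcile(desired, observed):
    """Return the actions needed to move observed state to desired state."""
    actions = []
    if observed.version != desired.version:
        actions.append(f"rolling-upgrade {observed.version} -> {desired.version}")
    if observed.replicas < desired.replicas:
        actions.append(f"scale-out {observed.replicas} -> {desired.replicas}")
    elif observed.replicas > desired.replicas:
        actions.append(f"scale-in {observed.replicas} -> {desired.replicas}")
    return actions


def plan_fleet(desired, fleet):
    """Apply one desired spec to many clusters; clusters already in sync are omitted."""
    plan = {}
    for name, observed in fleet.items():
        actions = reconcile(desired, observed)
        if actions:
            plan[name] = actions
    return plan


# Usage: one validated spec, three edge clusters in different states.
fleet = {
    "store-001": ClusterSpec(replicas=3, version="7.5.0"),
    "store-002": ClusterSpec(replicas=3, version="7.6.0"),
    "plant-17": ClusterSpec(replicas=1, version="7.6.0"),
}
plan = plan_fleet(ClusterSpec(replicas=3, version="7.6.0"), fleet)
print(plan)
# {'store-001': ['rolling-upgrade 7.5.0 -> 7.6.0'], 'plant-17': ['scale-out 1 -> 3']}
```

Because `reconcile` returns an empty list for a cluster that already matches the spec, running it again after the actions complete is a no-op; that idempotence is what makes re-applying the same spec across a large fleet safe.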
While Kubernetes is becoming more popular at the edge, in some scenarios it may be considered too much overhead, for example in thin edge environments where you might have only a few physical compute nodes or even a single one. Ansible is a commonly adopted tool for automating tasks across large numbers of nodes and is popular in edge scenarios because it requires no special agent on managed hosts; instead, it uses standard protocols like SSH. It provides an extensive, open source automation framework that lets you write, read, and manage your infrastructure with code-like syntax.
Confluent provides a fully supported playbook for automating the configuration and deployment of Confluent Platform clusters. Using this playbook reduces the need for manual intervention, mitigates risks, and ensures consistency across your operations.
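Playbook-driven rollouts across many edge nodes are usually done in small batches, so that a bad change stops early instead of reaching the whole fleet; Ansible expresses this with its `serial` and `max_fail_percentage` play keywords. The Python sketch below only mimics that idea and is neither Ansible nor the Confluent playbook: `apply` is a hypothetical caller-supplied function that configures one host and returns whether it succeeded, and the failure rule is a simplified assumption.

```python
def rolling_batches(hosts, batch_size):
    """Split hosts into consecutive batches of at most batch_size."""
    return [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]


def roll_out(hosts, apply, batch_size, max_fail_fraction=0.0):
    """Apply a change batch by batch, halting when a batch fails too often.

    apply(host) -> bool is a hypothetical, caller-supplied function that
    configures one host and reports success. Hosts are assumed unique.
    Returns (succeeded, failed, untouched) host lists.
    """
    succeeded, failed = [], []
    batches = rolling_batches(hosts, batch_size)
    for i, batch in enumerate(batches):
        batch_failed = [host for host in batch if not apply(host)]
        failed.extend(batch_failed)
        succeeded.extend(host for host in batch if host not in batch_failed)
        if len(batch_failed) / len(batch) > max_fail_fraction:
            # Stop here so the remaining hosts never receive the bad change.
            untouched = [host for later in batches[i + 1:] for host in later]
            return succeeded, failed, untouched
    return succeeded, failed, []


# Usage: six nodes in batches of two; one node fails in the second batch.
hosts = [f"edge-{n:02d}" for n in range(1, 7)]
ok, bad, skipped = roll_out(hosts, apply=lambda host: host != "edge-03", batch_size=2)
print(ok, bad, skipped)
# ['edge-01', 'edge-02', 'edge-04'] ['edge-03'] ['edge-05', 'edge-06']
```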
To address the challenge of observability in distributed Kafka clusters, Confluent introduced Health+, a comprehensive monitoring solution. Health+ collects health telemetry from Confluent Platform clusters and makes this critical information visible in Confluent Cloud. This enables real-time monitoring and troubleshooting, with insight into cluster health and performance that is crucial for maintaining operational integrity in distributed deployments. With Health+, organizations can address issues proactively, before they affect the system, helping keep Kafka services highly available and reliable.
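As a rough illustration of fleet-level health checks, the Python sketch below flags clusters from a handful of per-cluster readings. The inputs loosely correspond to standard Kafka broker metrics (offline partitions, under-replicated partitions, active controller count) plus a telemetry-freshness check; the dictionary keys, the 300-second staleness threshold, and the rules themselves are assumptions, not how Health+ evaluates clusters.

```python
STALE_AFTER_SECONDS = 300  # hypothetical threshold for "no recent telemetry"


def assess(metrics):
    """Return a list of issues for one cluster; an empty list means healthy."""
    issues = []
    if metrics.get("offline_partitions", 0) > 0:
        issues.append("offline partitions")
    if metrics.get("under_replicated_partitions", 0) > 0:
        issues.append("under-replicated partitions")
    if metrics.get("active_controllers", 1) != 1:
        issues.append("active controller count is not 1")
    if metrics.get("seconds_since_last_report", 0) > STALE_AFTER_SECONDS:
        issues.append("telemetry stale")
    return issues


def fleet_report(fleet):
    """Map each unhealthy cluster name to its issues, clusters with most issues first."""
    unhealthy = {}
    for name, metrics in fleet.items():
        issues = assess(metrics)
        if issues:
            unhealthy[name] = issues
    return dict(sorted(unhealthy.items(), key=lambda kv: len(kv[1]), reverse=True))


# Usage: three vessels reporting intermittently over constrained links.
fleet = {
    "vessel-a": {"offline_partitions": 0, "under_replicated_partitions": 0,
                 "active_controllers": 1, "seconds_since_last_report": 20},
    "vessel-b": {"offline_partitions": 0, "under_replicated_partitions": 4,
                 "active_controllers": 1, "seconds_since_last_report": 900},
    "vessel-c": {"offline_partitions": 2, "under_replicated_partitions": 0,
                 "active_controllers": 1, "seconds_since_last_report": 30},
}
print(fleet_report(fleet))
# {'vessel-b': ['under-replicated partitions', 'telemetry stale'], 'vessel-c': ['offline partitions']}
```

At the edge, stale telemetry is often just a dropped link rather than a failed cluster, so a real dashboard would likely treat it differently from partition-level problems.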
The architecture below depicts a typical pattern for data streaming with Confluent at the edge that can be applied to various scenarios. For clarity, “thick edge” refers to edge environments with significant compute and storage resources (e.g., manufacturing plants, 5G communication towers), while “thin edge” refers to the opposite (e.g., a single compute unit). Data streaming applies to both, though how it is used will vary.
In this diagram you can see example architectures for both thick and thin edge scenarios. In the thick edge, data is ingested into Confluent Platform via connectors, by native Kafka producers, or directly from event-driven applications. This data is then processed in the edge environment with Apache Flink® or the Streams API of Apache Kafka® before being synced to Confluent Cloud (if necessary) via Cluster Linking for further use. That further processing covers work that is not in the critical path of edge operations, such as analytics or adjacent operational systems and applications.
Given the limited resources at the thin edge, it's unlikely you will have fully redundant clusters or the Kafka Connect framework for data capture. Data is usually produced directly to Kafka, and any processing will likely be very simple computations (e.g., filtering events for “alerts”) written as simple consumer/producer code or with the Kafka Streams API. It’s worth noting, however, that the full capabilities of data streaming are gradually being extended to even these resource-constrained devices. The recent release of Confluent Platform 7.6, for instance, includes support for ARM64 Linux architectures for this reason.
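A thin-edge alert filter of the kind described above needs very little code. In this Python sketch, plain lists stand in for the values a consumer loop would receive and the records a producer would send (with the confluent-kafka client, for example, these would come from `Consumer.poll()` and go to `Producer.produce()`); the JSON field names `sensor` and `value` and the threshold are hypothetical.

```python
import json


def filter_alerts(raw_values, threshold):
    """Yield an alert record for each reading above threshold.

    raw_values: iterable of JSON strings with hypothetical "sensor" and numeric
    "value" fields. Malformed or incomplete records are skipped rather than
    stopping the loop.
    """
    for raw in raw_values:
        try:
            event = json.loads(raw)
            sensor, value = event["sensor"], event["value"]
        except (json.JSONDecodeError, KeyError, TypeError):
            continue  # skip records that cannot be parsed
        if value > threshold:
            yield json.dumps({"sensor": sensor, "value": value, "alert": True})


# Usage: only the out-of-range reading becomes an alert record.
incoming = [
    '{"sensor": "pump-1", "value": 71.5}',
    '{"sensor": "pump-2", "value": 98.2}',
    'not json',
]
alerts = list(filter_alerts(incoming, threshold=90.0))
print(alerts)  # ['{"sensor": "pump-2", "value": 98.2, "alert": true}']
```

Keeping the filter a pure function over an iterable makes it easy to test on a workstation and then wrap in whatever consume/produce loop the device can afford.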
The generalized architecture shown here is relevant for a variety of edge use cases with different requirements. At a high level, it can be used for everything from predictive maintenance of vehicle fleets to smart street applications (e.g., connected streetlamps) and naval onboard systems.
Despite the meteoric rise of cloud computing over the past decade, edge computing has proven to be indispensable in various contexts. The ability to process data closer to its source has been of fundamental importance in scenarios which demand low latencies and high security while being restricted by bandwidth limitations and unreliable networks. Manufacturing, retail, transportation, and telecommunications are just a few of the industries now reliant on advances in edge computing.
Alongside developments in networking and ARM architectures, the rise of edge computing has been facilitated by data streaming. As the de facto data streaming software, Apache Kafka allows organizations to stream large volumes of heterogeneous data from multiple sources at edge locations in soft real time, decoupling data producers from consumers. Importantly, it also serves as an immutable, chronological event log; this ensures that data isn’t lost when network connectivity drops.
Confluent Platform, based on Apache Kafka, is the enterprise-ready data streaming platform. With features such as Confluent for Kubernetes (CFK), supported Ansible Playbook, and Health+, Confluent Platform enables organizations to stream, process, and govern data from edge locations in real time, and if required, form a bridge to a unified on-premises/cloud architecture.
To learn more about Confluent at the edge, feel free to reach out to us here or check out the following resources: