
IoT Data Streaming for Building Private Wireless Networks

Written By Tanvi Kotibhaskar and Sujitha Sanku

Importance of Internet of Things (IoT) data in the wireless industry

IoT data is a valuable asset for wireless companies, enabling them to improve network performance, enhance customer experiences, reduce costs, and explore new business opportunities. As the IoT ecosystem continues to grow, wireless companies that effectively harness and analyze this data will be better positioned to thrive in a highly competitive industry.

Handling IoT data requires a comprehensive approach that includes robust security measures, efficient data processing and analytics, network optimization, and adherence to industry standards and best practices. Streaming IoT data presents several challenges that need to be addressed for successful implementation and utilization. These challenges include:

  • Scalability: IoT involves a large number of devices generating a vast amount of data. As the number of devices and the volume of data increase, it can become challenging to scale the infrastructure to handle the load.

  • Security: IoT devices are often connected to the internet and can be vulnerable to cyberattacks. The security of IoT devices and networks needs to be addressed to prevent unauthorized access and ensure data privacy.

  • Data management: IoT generates a large amount of data, which needs to be collected, processed, and analyzed. This requires effective data management strategies to ensure that the data is accurate, complete, and timely.

  • Latency: Low latency is critical for many IoT applications, such as autonomous vehicles and industrial automation. Ensuring that data is transmitted and processed in real time can be challenging, especially in wireless networks with varying latency.

Wireless companies and organizations need to stay agile and adaptable to meet these challenges head-on while harnessing the benefits of IoT data streaming. 

Learn More: End-to-End IoT Integration →

Why Confluent for IoT streaming

Confluent Cloud is a cloud-native, complete data streaming platform that is available everywhere your business needs to be. Confluent allows you to harness the full power of the cloud and provides an Apache Kafka® service that is substantially better than Kafka alone. In fact, across a number of performance metrics, Confluent Cloud is 10x better than self-managed open source Kafka or semi-managed services. Confluent Cloud speaks the Apache Kafka protocol and is 100% compatible with the open source ecosystem. Powering this is a next-generation engine, Kora, which brings elastic scalability, guaranteed resiliency, and low latency, serving up the Kafka protocol for our thousands of customers and their tens of thousands of clusters. Learn more about the Kora engine in this ebook.

Here's how Confluent enables IoT data streaming:

  • Real-time ingestion of IoT data: IoT devices generate a continuous stream of data. Confluent provides a highly scalable and fault-tolerant event streaming backbone that can efficiently ingest data from thousands or even millions of IoT devices, with producers publishing IoT data to any topic (see the Python sketch after this list).

  • Real-time stream processing: Confluent’s ability to handle real-time data streams makes it ideal for IoT applications that require immediate processing of sensor data. You can use Flink, ksqlDB, or Kafka Streams for stream processing to create data products and analyze IoT data in real time, unlocking rapid decision-making and actionable insights.

  • Schema management: Confluent's Schema Registry manages the schemas of IoT data. This ensures that data is properly serialized and deserialized as it flows through Kafka, which is critical for compatibility and data validation.

  • Connectivity to IoT devices: Confluent offers a rich ecosystem of pre-built, fully managed source and sink connectors that facilitate the integration of Kafka with IoT devices and protocols. These connectors enable seamless data exchange between Kafka and IoT platforms or devices.

  • Scalability: IoT deployments often involve a large number of devices, and Confluent’s scalability allows you to handle data from all these devices without performance bottlenecks. You can scale clusters horizontally to meet growing and unpredictable IoT data volume demands.

  • Reliability: Confluent’s Kafka-based architecture provides durability and fault tolerance, ensuring that critical IoT data is not lost even in the case of hardware failures.

  • Data aggregation: IoT data often comes from distributed sensors. Confluent aggregates and consolidates data from various sensors into a central data stream for analysis, visualization, and consumption by downstream systems. 

  • Enterprise-grade security: Confluent offers robust security features, including encryption, authentication, authorization, and audit logging. This is crucial for protecting IoT data and ensuring that only authorized devices and users can access and publish data.

  • Multicloud and edge deployments: IoT deployments can span multiple cloud providers or include edge computing nodes. Confluent supports multicloud and hybrid deployments, allowing you to manage IoT data across different environments seamlessly.

  • Data integration: Confluent integrates with other data processing and analytics tools, allowing you to build end-to-end IoT streaming data pipelines. This includes integrating with databases, machine learning frameworks, data warehouses, and data lakes.

  • Data retention and archiving: IoT data can accumulate quickly. Confluent helps you configure data retention policies and archive data to long-term storage solutions, supporting compliance requirements and historical analysis.
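
To make the ingestion, security, and retention points above concrete, here is a minimal Python sketch using the confluent-kafka client. The topic name, credentials, partition count, and retention period are illustrative placeholders, not settings from the customer implementation described later:

from confluent_kafka import Producer
from confluent_kafka.admin import AdminClient, NewTopic

# Illustrative connection settings; replace with your own cluster and API key.
conf = {
    'bootstrap.servers': 'BOOTSTRAP_SERVER',
    'security.protocol': 'SASL_SSL',   # encryption in transit
    'sasl.mechanisms': 'PLAIN',
    'sasl.username': 'API_KEY',        # only authenticated clients can publish
    'sasl.password': 'API_SECRET',
}

# Create a telemetry topic with a hypothetical 7-day retention policy
# (fire-and-forget for brevity; create_topics returns futures you can await).
admin = AdminClient(conf)
admin.create_topics([NewTopic('iot_telemetry', num_partitions=6,
                              config={'retention.ms': str(7 * 24 * 60 * 60 * 1000)})])

# Publish one reading, keyed by device ID so each device's data stays
# ordered within a partition.
producer = Producer(conf)
producer.produce('iot_telemetry', key='device-42',
                 value='{"device_id": 42, "ts": "2024-01-01T00:00:00Z"}')
producer.flush()

In practice, producers would batch readings and use a schema-aware serializer backed by Schema Registry, so downstream consumers can validate payloads against a managed schema.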

How a wireless company uses the Confluent Data Streaming Platform

A recent implementation of Confluent was at a wireless company leveraging Kafka as an IoT streaming platform for bidirectional, cross-region data replication. Their network connects phones, tablets, cameras, robots, sensors, and many other IoT devices across the coverage area. For a wireless IoT system, the IoT controller is a critical component: it provides the central point of control for all devices in the network, so it needs to be reliable and secure, with full observability. The IoT controller helps manage devices, optimize their performance, and ensure their security and reliability. Given the criticality of this component, the company needed a solution that could support a 99.999% uptime SLA, so they used Confluent Cloud as the streaming platform between multiple AWS regions, using Cluster Linking over Transit Gateway.

Solution 

Confluent Cloud provides a variety of features to simplify the deployment, management, and scaling of Apache Kafka in the cloud.

  • Enables users to create and manage Apache Kafka clusters without the need to deal with the underlying infrastructure. 

  • Supports deployment across multiple cloud providers, allowing users to choose the cloud platform that best fits their requirements.

  • Includes a variety of pre-built source and sink connectors that simplify the integration of Kafka with other data systems. 

  • Includes Schema Registry, which helps manage and evolve the schemas of messages exchanged in Kafka topics. 

  • Provides security features such as encryption in transit and at rest, access control lists (ACLs), and integration with cloud provider security services for a secure Kafka environment.

  • Includes monitoring tools and metrics to help users track the performance and health of their Kafka clusters. 

  • Allows automatic scaling of Kafka clusters based on the workload, allowing for efficient resource utilization and performance optimization.

Confluent’s Cluster Linking can be used in conjunction with other technologies and best practices to help achieve high availability and reliability.

To achieve a 99.999% SLA, which translates to roughly five minutes of downtime per year, the customer would typically need to implement a combination of high availability and disaster recovery technologies, such as:

  1. Load balancing: Distributing traffic across multiple servers or clusters to ensure that no single server or cluster is overwhelmed.

  2. Redundancy: Having multiple servers or clusters that can take over if one fails, either through active-passive or active-active configurations.

  3. Failover: The ability to automatically switch traffic to a backup server or cluster if the primary one fails.

  4. Data replication: Replicating data across multiple servers or clusters to ensure that data is not lost in the event of a failure.

  5. Disaster recovery: Having a plan and procedures in place to recover from a major outage or disaster, such as a backup data center or cloud-based recovery solution.

  6. Cluster linking: Combining the technologies above with Cluster Linking creates a highly available and resilient system that can support a 99.999% SLA. Cluster Linking replicates topics, metadata, and consumer offsets between clusters, providing the cross-cluster replication and failover foundation that the other techniques build on.

Using Cluster Linking bidirectional mode 

Cluster Linking bidirectional mode (a bidirectional cluster link) enables better disaster recovery and active/active architectures, with data and metadata flowing bidirectionally between two or more clusters.

A useful analogy is to consider a cluster link as a bridge between two clusters.

  • By default, a cluster link is a one-way bridge: topics go from a source cluster to a destination cluster, with data and metadata always flowing from source to destination.

  • In contrast, a bidirectional cluster link is a two-way bridge: topics on either side can go to the other cluster, with data and metadata flowing in both directions.

In the case of a bidirectional cluster link, there is no “source” or “destination” cluster. Both clusters are equal, and can function as a source or destination for the other cluster. Each cluster sees itself as the “local” cluster and the other cluster as the “remote” cluster.

In an active/active setup, a bidirectional cluster link ensures that consumer offsets are synced to both clusters, so that consumers and producers can easily fail over to the other cluster and resume from the right place.
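
As a rough sketch of what that failover looks like from the client side, here is a Python consumer, assuming hypothetical bootstrap endpoints for the two linked clusters and omitting security settings for brevity; because the bidirectional link syncs consumer offsets, the same group ID resumes near where it left off on the other cluster:

from confluent_kafka import Consumer, KafkaException

# Hypothetical endpoints for the two bidirectionally linked clusters.
PRIMARY = 'PRIMARY_BOOTSTRAP_SERVER'
STANDBY = 'STANDBY_BOOTSTRAP_SERVER'

def consume_from(bootstrap):
    consumer = Consumer({
        'bootstrap.servers': bootstrap,
        'group.id': 'iot-telemetry-readers',  # same group ID on both clusters
        'auto.offset.reset': 'earliest',
    })
    consumer.subscribe(['iot_telemetry'])
    try:
        while True:
            msg = consumer.poll(timeout=1.0)
            if msg is None:
                continue
            if msg.error():
                raise KafkaException(msg.error())
            print(msg.key(), msg.value())
    finally:
        consumer.close()

try:
    consume_from(PRIMARY)
except KafkaException:
    # On a regional outage, reconnect to the other cluster; because the
    # bidirectional link syncs consumer offsets, this group resumes close
    # to where it stopped rather than reprocessing from the beginning.
    consume_from(STANDBY)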

Below is the customer’s implementation of this solution, showing how they leveraged Confluent Cloud and enterprise features, including Cluster Linking as a bidirectional replication tool between multiple regions within a single cloud service provider. This is a great example of cross-region data replication with Cluster Linking over Transit Gateway.

Solution implementation

Confluent’s technical solution for this use case comprises the following:

Streaming data pipelines:

  • IoT devices like phones, tablets, cameras, and sensors send device metadata to microservices that persist the data in a MongoDB Atlas database.

  • A topic is created for every sensor device; these topics are ephemeral.

  • Data from multiple sensors is consolidated using Confluent and streamed back into the target MongoDB database, from where it is consumed by other consumers across the organization and by downstream applications such as Snowflake for reporting and analytics.

  • Stream processing transforms and enriches IoT telemetry data in real time for monitoring, alerting, improved operational efficiency, and faster and more intelligent decisions.

  • This data is replicated across regions with Cluster Linking for greater availability and durability.

Confluent cluster types:

  • Production: 2 Multi-AZ Dedicated Clusters (multi-region), connected over AWS Transit Gateway

  • Load: 1 Multi-AZ Dedicated Cluster (multi-region), connected over AWS Transit Gateway

  • Staging: 1 Multi-AZ Standard Cluster with public endpoints

  • Development: 5 Single-AZ Basic Clusters with public endpoints

Confluent connectors:

  • MongoDB Atlas Source: This is a fully managed Confluent connector that moves data from a MongoDB replica set into an Apache Kafka cluster. The connector opens a change stream against the replica set and publishes the resulting change stream event documents to a Kafka topic.

  • MongoDB Atlas Sink: This is a fully managed Confluent connector that maps and persists events from Apache Kafka topics directly to a MongoDB Atlas database collection. It supports Avro, JSON Schema, Protobuf, JSON (schemaless), String, or BSON data from Kafka topics. The connector ingests events from topics directly into a MongoDB Atlas database, making the data available for querying, enrichment, and analytics.
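
For reference, fully managed connectors like these are configured with a small set of key-value properties. The sketch below is illustrative only, with assumed placeholder values; consult the MongoDB Atlas Sink connector documentation for the authoritative configuration:

# Illustrative configuration for a fully managed MongoDB Atlas Sink connector.
# Values are placeholders; check the connector docs for the exact fields.
mongodb_sink_config = {
    'connector.class': 'MongoDbAtlasSink',
    'name': 'iot-telemetry-mongodb-sink',
    'kafka.api.key': 'API_KEY',
    'kafka.api.secret': 'API_SECRET',
    'topics': 'iot_telemetry',
    'input.data.format': 'JSON',
    'connection.host': 'MONGODB_ATLAS_HOST',
    'connection.user': 'MONGODB_USER',
    'connection.password': 'MONGODB_PASSWORD',
    'database': 'iot',
    'collection': 'telemetry',
    'tasks.max': '1',
}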

Here is a sample Flink SQL job for stream processing that creates a reusable data product from the collected IoT data, computing each device's reporting lag over two-hour tumbling windows:

CREATE TABLE iot_telemetry (
  `device_id` BIGINT,
  `ts` TIMESTAMP(3),
  WATERMARK FOR `ts` AS `ts` - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'iot_telemetry',
  'properties.group.id' = 'demoGroup',
  'scan.startup.mode' = 'earliest-offset',
  'properties.bootstrap.servers' = 'BOOTSTRAP_SERVER',
  'properties.security.protocol' = 'SASL_SSL',
  'properties.sasl.mechanism' = 'PLAIN',
  'properties.sasl.jaas.config' = 'org.apache.kafka.common.security.plain.PlainLoginModule required username="API_KEY" password="API_SECRET";',
  'value.format' = 'json'
);

INSERT INTO iot_telemetry_lags
SELECT
  device_id,
  TIMESTAMPDIFF(SECOND, MAX(`ts`), window_end) AS lag_seconds,
  window_start,
  window_end
FROM TABLE(
  TUMBLE(TABLE iot_telemetry, DESCRIPTOR(`ts`), INTERVAL '2' HOUR))
GROUP BY device_id, window_start, window_end;
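
This sketch assumes the iot_telemetry_lags target table has already been created with a matching schema. The WATERMARK clause is what makes ts a valid event-time attribute for the tumbling window; tune the watermark delay to match how far out of order your devices actually report.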

Business impact

Confluent provides numerous business benefits that enhance your wireless organization's data processing capabilities:

Greater efficiency and speed: Confluent Cloud helps customers improve return on investment (ROI) by launching faster and reducing their operational burden. It allows you to deploy and scale Kafka within a week of starting with Confluent, thereby accelerating your time to value.

Cost savings: Confluent enables more efficient operations with reduced infrastructure costs (e.g., autoscaling can cut TCO for Dedicated clusters in half by reducing the need to be statically over-provisioned to support bursts of activity), maintenance demands, and downtime risk. Customers have seen operational cost savings of $200-250K annually and cumulative TCO savings over $2.5M over a three-year period.

Optimized resource allocation: By using Confluent Cloud, you can free up development resources from months and years of engineering required for designing, building, testing, and maintaining foundational infrastructure tools—allowing teams to focus on high-value-add work.

IoT security: Confluent Cloud has robust security measures in place, including data encryption, private networking, bring your own key (BYOK), role-based access control (RBAC), and audit logs. These significantly reduce the risk of a security breach.

Data sharing for greater collaboration: Cluster Linking enables direct connection between clusters and mirroring of topics from one cluster to another. This makes it easy to build multi-datacenter, multi-region, and hybrid cloud deployments. It also allows for data sharing between different teams, lines of business, and organizations. Data Portal helps teams across the organization easily discover and use IoT data products in a self-service way for greater agility in building next-generation IoT applications.

Conclusion

Confluent Cloud plays a crucial role in IoT communication, serving as a robust backbone that ensures reliable, real-time, and scalable data transfer between IoT devices, applications, and backend systems. In particular, Cluster Linking is a capability that can be used in conjunction with other technologies and best practices to help achieve high availability, reliability, and performance in complex systems. Confluent helps streamline data processing, integration, and analytics in IoT ecosystems, making it a valuable tool for IoT developers and organizations.

Confluent helps wireless companies boost productivity and time to market while lowering infrastructure costs, enabling teams with self-serve access to real-time data for unlocking new use cases.


Tanvi Kotibhaskar is a Senior Solutions Engineer at Confluent, driving customer success by leveraging over 10 years of experience in the field of Information Technology across pre-sales, consulting, project delivery, and product development. She is passionate about engaging with customers to understand their technology landscape, goals, and challenges, and presents them with opportunities for growth, cost reduction, and innovation.

Sujitha Sanku is a Senior Solutions Engineer at Confluent. Sujitha has spent her career as a Solutions Consultant at Cloudera, Hortonworks, TriNet, Ericsson (India), and now Confluent, where she engages with customers to understand their technology landscape, goals, and challenges, and presents them with opportunities for building real-time data pipelines on Confluent Cloud.

