Enhanced Cybersecurity with Real-Time Log Aggregation and Analysis

In today’s hyper-connected world, systems are more intertwined and complex than ever. Myriad data sources, including applications, databases, and network and IoT devices, continuously generate vast amounts of data, capturing every event and interaction. Imagine harnessing this data (login logs, firewall logs, IPS logs, web logs), aggregating it, and analyzing it to create a holistic view of your entire infrastructure.

By integrating IoT data in real time, for example, we can consolidate sensor information to gain a complete understanding of operational efficiencies, system behavior and status. This unified approach not only enhances visibility, but also enables real-time detection of malicious activity – a critical component in cybersecurity.

As cyberattacks grow more sophisticated, data streaming makes it possible to monitor and analyze events across all systems in real time. Consider a scenario where a hacker gains access to an employee’s account and initiates a series of unauthorized actions. With aggregated real-time data, such anomalies can be identified swiftly, enabling an immediate response to potential threats.

Implementing a data streaming solution for aggregating and analyzing logs provides key advantages: 

  • Real-time data flow across systems and reusable data products, which can serve myriad use cases for different teams and lines of business. 

  • A complete 360° view and better understanding of what’s happening across infrastructure and system events. 

  • Streaming data for building and training machine learning (ML) models.

  • Supplying real-time, contextualized data for AI to detect malicious activity, bottlenecks, and application errors – shortening time to detection and issue resolution.

Today’s business and technical challenges 

Log aggregation for cybersecurity comes with challenges, including managing the sheer scale of data and ensuring real-time processing. This often requires dedicated data scientists and additional FTEs to maintain the infrastructure for tools like Kafka and Flink, as well as SIEM solutions. Many organizations prefer fully managed services to alleviate the burden of infrastructure maintenance, allowing them to focus on rapid threat analysis and response. This shift not only reduces operational complexity but also helps ensure that cutting-edge security practices are consistently applied.

There are numerous data and infrastructure challenges around log aggregation and analysis, including: 

  • Large data volumes and high-velocity data. 

  • Traditional batch processing methods for log data are too slow, leaving systems vulnerable. 

  • Heterogeneous data spread across many systems that must each be extracted from.

  • Integrating data from diverse sources, where systems with different interfaces and formats (e.g., JSON, mobile, devices, databases) are developed by various teams without central oversight or a unified view.

  • Point-to-point data pipelines are rigid, raise governance issues, and increase the complexity of data integration and management.

  • Maintenance and monitoring become unscalable and increasingly difficult using legacy tools. 

  • Slower time to market for building new features. 

Without collecting and centralizing logs into an aggregated real-time view, it’s difficult to identify and respond to suspicious activities such as anomalous access patterns, data exfiltration, or configuration changes as they happen.

Real-time log aggregation and analysis with Confluent 

Confluent’s data streaming platform solves the above challenges by providing real-time visibility into IT infrastructure, unlocking the continuous flow of events to allow security systems to identify anomalies instantly rather than waiting for batch processing.

With Confluent, organizations can adopt a shift-left approach with stream processing: build data once, build it right, and reuse it anywhere within milliseconds of its creation. In this way, all applications are supplied with equally fresh data that represents the current state of your business. By shifting left, organizations can eliminate data quality and inconsistency issues, and reduce duplicative processing and associated costs.

Additionally, stream processing can correlate events across multiple logs in real time, providing a comprehensive view and facilitating informed decision-making. Detecting and responding to threats as they occur, rather than waiting for batch processing, reduces risk. Confluent provides the ability to aggregate and analyze data at scale, no matter the volume or velocity, and to leverage an event-driven architecture to take immediate actions such as raising an alert for a breach or closing a port.

  • Stream data anywhere you need it, across on-premises, hybrid, and multi-cloud environments. Streaming minimizes the latency between data generation and analysis, which supports prompt detection and response to security threats. Offload data infrastructure management and lower TCO with Confluent’s fully managed platform while taking advantage of the elasticity, resilience, and performance of Confluent Cloud powered by the Kora engine.

  • Connect data across heterogeneous operational and analytical systems with pre-built, fully managed source and sink connectors. Bring together disparate data (including GPS/location data, telemetry data from IoT devices, transaction data from stores, web data such as cookies and clickstream, and cybersecurity log events), ensuring that all data from applications and systems is taken into account. Securely share data with downstream consumers for real-time analytics or building real-time dashboards. Confluent provides a one-stop shop where connectors are fully managed, integrated, and working together seamlessly, so you don’t have to stitch everything together yourself. Future-proof your architecture with streaming data pipelines to ensure the right data, in the right format, gets to the right place.

  • Process data in flight with Apache Flink® to create data products such as alerts, suspicious logins, and risk assessment scores for cybersecurity. Stream processing enables real-time data enrichment that adds context (e.g., user behavior profiles) to improve the accuracy and relevance of security analysis. Additionally, aggregating log data from disparate sources provides a unified view for comprehensive monitoring. Stream processing can also automate workflows that trigger alerts or remediation actions based on predefined rules. The resulting data products can be shared with SIEM systems and other security tools, enhancing their capabilities with real-time data.

  • Govern data with Stream Governance to ensure compliance and security as well as understand where data is coming from, where it’s going, and if something breaks, why that happened. Schema Registry reduces the risk of data inconsistency and increases data quality by ensuring that all data adheres to agreed-upon schemas. It also simplifies governance by enabling easy tracking of schema changes, maintaining schema evolution history, and ensuring adherence to regulatory requirements.

Solution implementation

Here’s an overview of the deployment architecture for this log aggregation and analytics use case, which uses Confluent Cloud:

[Figure: deployment architecture for the log aggregation and analytics use case on Confluent Cloud]

Data is continuously ingested in real time from heterogeneous sources including login logs, firewall logs, IPS logs, and web logs. A FilePulse source connector can be used to write data to topics in Confluent Cloud. Example topics include Logins, Firewall, IDS, and WebLog. Stream processing with Flink then enriches the data streams and creates ready-to-use data products. In this case, Logins are aggregated over time windows to create a SuspiciousLogins data product, while the IDS and Logins streams are joined to create custom alerts.

The resulting data products can be shared downstream via Elasticsearch and Google BigQuery sink connectors. From Elastic, data can be used to create real-time Kibana dashboards, while data in BigQuery is used to support AI/ML modeling for incident and anomaly detection. These are only a few examples of downstream consumers. Many more can be connected with pre-built or custom connectors, and microservices can act in real time on anomalous events by closing a port or sending a notification.

Below are sample Flink SQL queries for login data in this use case. Stream processing is used to detect multiple login attempts from the same source IP, as well as to find login attempts from source IPs that the IDS has identified as sources of attacks:

CREATE TABLE logins (
  `login_time` TIMESTAMP(3),
  `username` STRING,
  `source_ip` STRING,
  `status` STRING,
  -- declare login_time as the event-time attribute and use a strictly ascending timestamp watermark strategy
  WATERMARK FOR login_time AS login_time
);

CREATE TABLE ids (
  `event_time` TIMESTAMP(3),
  `source_ip` STRING,
  `destination_ip` STRING,
  `severity` STRING,
  `attack_classification` STRING,
  `action` STRING,
  -- declare event_time as the event-time attribute and use a strictly ascending timestamp watermark strategy
  WATERMARK FOR event_time AS event_time
);

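-- Find multiple login attempts from the same source IP within a 20-second window
-- (window_start and window_end must be grouping keys for a per-window count)
SELECT
  window_start,
  window_end,
  source_ip,
  COUNT(*) AS login_attempts
FROM TABLE(
    TUMBLE(TABLE logins, DESCRIPTOR(login_time), INTERVAL '20' SECONDS))
GROUP BY window_start, window_end, source_ip
HAVING COUNT(*) > 5;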

-- Find login attempts from source IPs that the IDS has flagged as sources of attacks
SELECT logins.username, logins.source_ip, ids.severity, ids.attack_classification
FROM ids JOIN logins ON ids.source_ip = logins.source_ip;
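
The queries above only return results to the client. As a rough sketch of how they might be turned into the SuspiciousLogins and alert data products described earlier, the statements below write the results into their own tables. The table names `suspicious_logins` and `login_alerts`, their columns, and the 10-minute correlation bound are assumptions made for illustration and are not part of the original design. Like the tables above, the definitions omit a WITH clause, which assumes an environment such as Confluent Cloud that creates the backing topic automatically. The second statement uses an interval join instead of the regular join, one possible way to keep join state bounded by matching only events that are close together in time.

```sql
-- Hypothetical sink table for the SuspiciousLogins data product.
CREATE TABLE suspicious_logins (
  `window_start` TIMESTAMP(3),
  `window_end` TIMESTAMP(3),
  `source_ip` STRING,
  `login_attempts` BIGINT
);

-- Continuously emit one row per source IP and 20-second window
-- in which more than 5 login attempts were observed.
INSERT INTO suspicious_logins
SELECT
  window_start,
  window_end,
  source_ip,
  COUNT(*) AS login_attempts
FROM TABLE(
    TUMBLE(TABLE logins, DESCRIPTOR(login_time), INTERVAL '20' SECONDS))
GROUP BY window_start, window_end, source_ip
HAVING COUNT(*) > 5;

-- Hypothetical sink table for alerts that correlate logins with IDS events.
CREATE TABLE login_alerts (
  `login_time` TIMESTAMP(3),
  `username` STRING,
  `source_ip` STRING,
  `severity` STRING,
  `attack_classification` STRING
);

-- Interval join: match a login with an IDS event from the same source IP
-- only when the two occur within 10 minutes of each other (assumed bound).
INSERT INTO login_alerts
SELECT
  l.login_time,
  l.username,
  l.source_ip,
  i.severity,
  i.attack_classification
FROM logins AS l
JOIN ids AS i
  ON l.source_ip = i.source_ip
  AND l.login_time BETWEEN i.event_time - INTERVAL '10' MINUTE
                       AND i.event_time + INTERVAL '10' MINUTE;
```

Both statements produce append-only results, so downstream sink connectors or microservices could consume the two tables like any other topic. The threshold, window size, and time bound would need tuning against real traffic.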

Conclusion

The use of data streaming in log aggregation and analytics provides substantial benefits for modern enterprises. By enabling real-time data ingestion and stream processing, organizations achieve greater security through faster threat detection and response times. This approach also accelerates time to market for new features, enhancing developer agility by removing the operational burden of infrastructure management and making high-quality data readily available and discoverable. The elasticity provided by Confluent’s data streaming platform allows businesses of any size to enhance their cybersecurity, scale as needed, and pay only for what they use, optimizing cost efficiency. Embracing data streaming for cybersecurity is a strategic move toward a more secure, efficient, and scalable future.

  • Ohad is a Staff Solutions Engineer at Confluent with deep expertise in enterprise architecture and sales consultancy.
