
McAfee Moves from Monolith to Microservices on Confluent Cloud

To reliably prevent malware threats and phishing scams, get privacy and identity protection for your digital presence, and prevent your data from being compromised, who do you turn to for solutions? Probably McAfee.

The worldwide leader in online protection provides a holistic suite of threat prevention, privacy, and identity products through direct consumer channels as well as via global integrated service provider (ISP) and telco partners. McAfee has millions of endpoints that provide business insights, such as automatically alerting a customer to a breach and walking them through the corrective action.

But as cybercrime has grown more complex, so has McAfee’s data landscape, specifically their reliance on real-time data to meet modern business demands. That is why Confluent has become a critical tool.

Before McAfee selected Confluent as their de facto data streaming technology, their IT teams had been tasked with managing Apache Kafka® and various data streaming technologies. But McAfee was on a cloud-native modernization journey that required building out a distributed microservices ecosystem. With requirements constantly changing, and with the need to focus on mission-critical product development rather than managing infrastructure, the move from open source Kafka to Confluent was necessary.

We spoke with two McAfee IT decision-makers: Mahesh Tyagarajan, VP of Platforms, and Rupin Kakkar, Cloud Architecture Leader, Consumer Platform. In this post, we share what they’ve learned in moving McAfee’s legacy systems to an event-driven microservices architecture to add scalability, performance, and functionality.

Simplifying a complex status quo 

Before Tyagarajan joined McAfee, he led platform transformation and tech modernization at organizations like Walmart and Kroger. With his experience, he was brought into McAfee to transform an inherited monolithic architecture into decoupled systems with the ability to rapidly scale in a microservices world. 

McAfee was already using multiple data streaming technologies (open source Kafka, Amazon Kinesis, and Azure Event Hubs), but the complexity and the burden of self-management created many challenges, the main ones being:

  • Manageability and operations in areas such as security, patching, and upgrading Kafka and ZooKeeper nodes, along with schema management

  • A lack of ability to scale infrastructure in an agile way

  • The need for dedicated FTEs with Kafka expertise

“There were also various flavors of messaging being used,” says Tyagarajan, “and our charter was to try to standardize.”

As McAfee adopted a microservices architecture, the number of events and consumers rose sharply. From his experiences in the enterprise retail world, Tyagarajan recognized that it was time to look beyond open source, self-managed Kafka and find a data streaming technology partner that would make the platform easier to manage and upgrade while unlocking new real-time use cases.

The value in standardizing data streaming across the organization

McAfee’s IT organization was already well-versed in using Kafka, but individual teams operated in their own ways, using whatever technology solutions they felt best served them. With open source Kafka, they often had two or three different clusters running at a time, and there were constant operational issues. Scaling up and down was a major challenge, and upgrades, management, and security around an open source cluster were all pain points. 

As Kakkar describes, “It wasn’t serving us well for a number of reasons. If an operational engineer had to add more partitions to an existing fabric, running and managing partition rebalances would become a nightmare.” He continues: “When we wanted to standardize on Kafka for a lot of its needed capabilities, we ran into issues with low latency, fan-out capabilities, and so on. We needed to rebuild on a foundation that would allow us to stay resilient, highly performant, and elastic.” 
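
One reason adding partitions can be disruptive is that keyed records are typically assigned to a partition by hashing the key modulo the partition count, so changing the count changes where existing keys land. The sketch below illustrates that effect only; it is not McAfee's setup. The helper names and the `device-N` keys are hypothetical, and CRC32 stands in for the murmur2 hash that Kafka's default partitioner uses.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition by hashing it modulo the partition count.

    Assumption: CRC32 is a stand-in hash. Kafka's default partitioner uses
    murmur2, but the modulo step is what matters for this illustration.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_count: int, new_count: int):
    """Return the keys whose partition changes when the partition count changes."""
    return [
        k for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    ]


# Hypothetical device identifiers used as record keys.
keys = [f"device-{i}" for i in range(1000)]
moved = moved_keys(keys, old_count=6, new_count=8)
print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

Any consumer that relies on all events for a given key arriving in one partition, in order, has to account for that remapping, which is part of what makes the change an operational event rather than a simple setting.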

McAfee wanted to move toward a standardized, centralized nervous system that wouldn’t rely on open source Kafka (but would still have a foundation in Kafka), and Confluent was the ideal solution. Of the microservices being used, Kakkar says, “By bringing them all to the standardized platform, it provided us with the engineering velocity to deliver a variety of use cases rather quickly. And that's one of the key business drivers that we have as an engineering organization, because historically, it’s been one of the pain points for our business—to not be able to experiment and roll out products and features fast enough.”

A few of the real-time use cases Confluent has enabled

For both Tyagarajan and Kakkar, using Confluent Cloud to build streaming data pipelines enabled a wide range of use cases within the organization, including:

  • Decoupling systems and creating domain-specific isolation along with an orchestration tier of consumers

  • Streaming to analytical systems (i.e., a data lake) as well as to financial reporting systems 

  • Accessing real-time business metrics such as conversions and activation from free to paid—with financial reporting and analytics systems in the mix

  • Ensuring operational stability of the system even with hundreds of millions of device endpoints and high-volume data ingestion 

  • Using cloud telemetry and edge telemetry within an ecosystem of distributed event-driven microservices to better enable end-user services. Events created from new customer subscriptions, for example, trigger notifications and customer communications, which are then fed into the data analytics system

  • Managing hundreds of millions of devices, from iOS to Android, Mac to PC desktop, and so on

As Kakkar explains, the IT decision-makers at McAfee “really like to take a platform approach to our technology stack, which means that instead of building bespoke solutions into different parts of the organization, we like to bring platforms to the table which essentially work as a service within the company. Confluent is a great example of that.”

Conscious decoupling in the modern microservices world

Today, McAfee has around 600 microservices, and Kakkar says, “With Confluent Cloud supporting the scale and speed at which we are going—and some of the new products that we are working on bringing to the market—we are going to get close to 1,200 microservices this year.”

With so much in the mix, keeping domains separate and services decoupled will be essential—as will the ability to generate and share reports and notifications with consumers and partners. “In a microservices world, we are a very decoupled system,” explains Tyagarajan. “We don’t let domains call each other.” But even with such domain-specific isolation, events from one domain are often consumed in another. So, he continues, “There’s a consumer orchestration tier on top to bridge the domains that shouldn’t talk to each other—for instance, the account and user domain, the identity domain, and the subscription domain.”
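
The orchestration pattern Tyagarajan describes can be sketched in a few lines. This is a minimal in-memory illustration, not McAfee's implementation: the `InMemoryBus` class stands in for the streaming platform, and every class name, topic name, and payload field is hypothetical. The point it shows is that the subscription and account domains never reference each other; only the orchestrator knows about both topics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    topic: str
    payload: dict


class InMemoryBus:
    """Stand-in for a streaming platform: delivers each event to topic subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.log = []  # every published event, in publish order

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, event):
        self.log.append(event)
        for handler in self._subscribers[event.topic]:
            handler(event)


class SubscriptionDomain:
    """Publishes its own events and knows nothing about other domains."""

    def __init__(self, bus):
        self._bus = bus

    def create_subscription(self, user_id, plan):
        self._bus.publish(
            Event("subscription.created", {"user_id": user_id, "plan": plan})
        )


class AccountDomain:
    """Reacts only to events addressed to the account domain."""

    def __init__(self, bus):
        self.plans = {}
        bus.subscribe("account.plan_update_requested", self._on_plan_update)

    def _on_plan_update(self, event):
        self.plans[event.payload["user_id"]] = event.payload["plan"]


class Orchestrator:
    """Bridges domains: consumes one domain's events and publishes for another."""

    def __init__(self, bus):
        self._bus = bus
        bus.subscribe("subscription.created", self._on_subscription_created)

    def _on_subscription_created(self, event):
        self._bus.publish(
            Event("account.plan_update_requested", dict(event.payload))
        )


bus = InMemoryBus()
accounts = AccountDomain(bus)
orchestrator = Orchestrator(bus)
subscriptions = SubscriptionDomain(bus)

subscriptions.create_subscription("user-42", "paid")
print(accounts.plans)  # {'user-42': 'paid'}
```

In a real deployment the bus would be a set of topics on the streaming platform and each class would be its own service, but the dependency direction is the same: domains depend on topics, and only the orchestration tier depends on more than one domain's topics.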

The way that Confluent works for McAfee today is sophisticated and seamless. Various systems within the architectural stack produce events, which are then consumed by downstream systems through streaming data pipelines that allow systems to stay decoupled but still communicate data in real time. McAfee also uses Confluent’s connectors and Schema Registry, and has set up real-time event ingestion from Confluent into Druid.

The benefits of Confluent across McAfee

Technology-wise, the benefits Confluent brings include: 

  • The ability to decouple systems to ensure scalability in a microservices world

  • A way to deploy data pipelines efficiently without one domain impacting the availability of another

  • Ensured security, high resiliency, and availability with 99.99% uptime—just a better technical operation all around

  • Future extensibility: multiple use cases can feed off the same data pipelines, which enables new business use cases (and velocity)

Translated into business metrics, there are even more tangible benefits:

  • A massive reduction in the ops burden and infrastructure spend

  • Reallocation of full-time employees so people aren’t managing Kafka for the bulk of their time, and can instead focus on development of new projects, apps, and customer experiences

Tyagarajan believes platforms should always be able to improve without forcing applications to change, and it's this flexibility that makes Confluent so valuable for McAfee. The fact that Confluent is cloud-agnostic is also important to the company, because it keeps their future choice of cloud providers open.

Planning for the future of data streaming

The ability to take advantage of Kafka for streaming data, while still protecting consumer data, has helped McAfee grow as quickly as its audience does. But it’s the move to Confluent Cloud that has truly enabled the company to flourish in this digital age.

Tyagarajan’s advice to IT leaders on the move to Confluent is to commit fully, without feeling obliged to migrate every use case at once. Start with one, but know the rest: “Think about what you want to implement later on. It allows you to prepare for how you’re going to organize and the level of scale, manageability, tenancy, and data sizing you’ll need. As they say, one hour in the library substitutes a week in the lab.”

