Processing a lot of data with Kafka means knowing how and when to scale horizontally and vertically. When you’ve exhausted the limits of scaling within a single cluster, replication becomes critical, but sometimes standard replication is not enough.
New Relic once earned the dubious title of “World’s Largest Kafka Cluster”, and in our journey to break this cluster into dozens of smaller clusters, we needed to route events between clusters and topics based on headers.
At the time, no existing tooling met this need, so we built the routing ourselves; a sketch of the idea follows below. Starting out, our goal was fan-out (one-to-many) replication. Since then, our needs have expanded to include many-to-one and many-to-many replication.
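To make the routing concrete, here is a minimal sketch of header-based fan-out replication using the standard Kafka Java clients. This is not New Relic's actual implementation: the `destination-topics` header name, the cluster addresses, and the topic names are all hypothetical placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

public class HeaderRouter {
    public static void main(String[] args) {
        // Consume from the source cluster; addresses and names are placeholders.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "source-cluster:9092");
        consumerProps.put("group.id", "header-router");
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        // Produce to the destination cluster.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "destination-cluster:9092");
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("source-topic"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // "destination-topics" is a hypothetical header carrying a
                    // comma-separated list of targets; fan-out means one record
                    // may be forwarded to several topics.
                    Header route = record.headers().lastHeader("destination-topics");
                    String[] targets = route != null
                            ? new String(route.value(), StandardCharsets.UTF_8).split(",")
                            : new String[] {"default-topic"}; // fallback when unrouted
                    for (String topic : targets) {
                        producer.send(new ProducerRecord<>(topic, record.key(), record.value()));
                    }
                }
                consumer.commitSync(); // at-least-once: commit only after forwarding
            }
        }
    }
}
```

Committing offsets only after the records have been handed to the producer gives at-least-once delivery; a real router would also need error handling and, for exactly-once semantics, idempotent or transactional producers.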
In this talk, we'll discuss the bottlenecks we hit as we scaled out and the measures we took to remove them.
By the end of this talk, you'll understand how we scaled replication and routing to support New Relic's ever-growing data ingestion, and all the mitigations it took to get us there.