Here at Nielsen, data is at the core of our business: we love it, and there's a lot of it. We don't want to lose it, and at the same time, we don't want to duplicate it. Our data flows through a robust Kafka architecture into several ETLs that receive, transform, and store it. While we clearly understood our ETLs' workflow, we had no visibility into which parts of the data, if any, were lost or duplicated, or in which stage or stages of the workflow this happened, from source to destination.
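To make the problem concrete, here is a minimal sketch of the general idea behind this kind of auditing: each stage of the pipeline publishes how many records it saw for a given batch to a dedicated Kafka topic, so counts can later be compared across stages. This is an illustration only, not Nielsen's actual implementation; the topic name, field names, and broker address are assumptions, and JSON is used instead of the Avro schema mentioned later for brevity.

```python
# Hypothetical per-stage audit emitter -- a sketch, not the Life Line code.
import json
import time

from confluent_kafka import Producer  # assumes confluent-kafka is installed

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed broker address


def emit_audit(stage: str, batch_id: str, record_count: int) -> None:
    """Publish how many records a given stage handled for a given batch."""
    audit_event = {
        "stage": stage,              # e.g. "ingest", "transform", "store"
        "batch_id": batch_id,        # shared by all stages processing the same batch
        "record_count": record_count,
        "emitted_at": int(time.time() * 1000),
    }
    # Keyed by batch_id so all audit events for one batch land in the same partition.
    producer.produce("lifeline-audit", key=batch_id, value=json.dumps(audit_event))
    producer.flush()


# Example: the transform stage reports it processed 10,000 records of one batch.
emit_audit("transform", "2021-06-01-17", 10_000)
```

Comparing the counts reported by consecutive stages for the same batch is what reveals loss (a downstream count is lower) or duplication (it is higher).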
But how much do we really know about the way our data makes its way through our systems? And what about the age-old question: is it the end of the day yet?
In this talk I’m going to present the design process behind our data auditing system, Life Line: from tracking and producing auditing information to analysing and storing it, using technologies such as Kafka, Avro, Spark, Lambda functions, and complex SQL queries. We’re going to cover: