
Building Trust in AI Means Building Trust in Data


The Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence (AI) issued in the fall of 2023 looks to establish new standards for AI safety and security, protecting privacy and advancing equity and civil rights, while promoting innovation. The concept of trust is key in this overarching guidance and proved to be a major theme at our Public Sector Summit.

Discussions at the summit looked at the connection between building trust in data and the trustworthiness of AI solutions. As we’ve discussed previously, data is the integral building block for AI. If you look at AI as a pyramid, data forms the base, the people who can use that data form the next layer, data engineering tools and techniques sit above them, and the top of the pyramid is efficient AI.

Looking at this pyramid of AI, how do we strengthen the data and people layers to ensure AI solutions meet their promise?

Implementing risk frameworks

Today we are largely operating blind when it comes to algorithms. We need to better understand the risk inherent in how algorithms and models are developed and used. With the popularity of generative AI, concern around data bias has become even more pronounced. Even non-technical individuals can readily see and understand the impact of data bias in an AI-generated product.

Risk frameworks are not new, but they provide a well-understood approach to mitigating issues such as data bias in AI. They typically involve identifying risks, assessing them, mitigating them based on that assessment, and then monitoring and reviewing the results. AI systems are not static; they are constantly in motion, processing data and producing results. Every aspect of the risk framework should therefore also be continuously in motion. For this reason, data streaming, which is already a foundational technology in most AI systems, has a uniquely important role to play.

When applied appropriately, data streaming delivers data to the models. This data is contained in an immutable log that allows you to monitor and assess risk in real time, historically, and throughout the individual steps of the process. For example, an organization could use an event-driven architecture to examine customer interactions with a chatbot. If the chatbot consistently provides biased responses to certain groups or categories of prompts, the organization could quickly identify the issue and take mitigating action to address the chatbot's bias. 
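As a rough illustration of the chatbot example, the sketch below replays a log of interaction events and raises an alert when the share of flagged responses for one prompt category gets too high. It is a minimal, in-memory sketch rather than a real streaming client: the `InteractionEvent` fields, the `flagged` signal (assumed to come from some upstream reviewer or classifier), and the window and threshold values are all hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionEvent:
    """One chatbot interaction as it might appear in the log (hypothetical schema)."""
    offset: int            # position of the event in the log
    prompt_category: str   # hypothetical grouping of prompts
    flagged: bool          # assumed upstream signal: response judged biased


class BiasMonitor:
    """Tracks the recent flag rate per prompt category over a sliding window."""

    def __init__(self, window=100, threshold=0.2, min_events=5):
        self.window = window
        self.threshold = threshold
        self.min_events = min_events
        self.recent = defaultdict(lambda: deque(maxlen=self.window))

    def observe(self, event):
        """Record one event; return an alert string if the flag rate is too high."""
        history = self.recent[event.prompt_category]
        history.append(event.flagged)
        if len(history) < self.min_events:
            return None  # not enough evidence yet
        rate = sum(history) / len(history)
        if rate > self.threshold:
            return (f"category={event.prompt_category} "
                    f"flag_rate={rate:.2f} at offset={event.offset}")
        return None


def replay(log, monitor):
    """Feed events to the monitor in log order and collect any alerts."""
    alerts = []
    for event in log:
        alert = monitor.observe(event)
        if alert is not None:
            alerts.append(alert)
    return alerts
```

Because the log is replayable, the same `replay` function could be run over live events or over history, which is what allows risk to be assessed both in real time and after the fact.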

Data owners vs. data stewards

Speakers at our Public Sector Summit talked about the need to put more emphasis on data stewards, rather than just data owners, as a way to open up trusted access to data across organizations. Data stewards help to make data more discoverable and accessible and protect the data to ensure it gets used securely and effectively. Oftentimes, data owners may be more focused on protecting themselves and their organizations by keeping control of the data. 

While organizations often have people whose job it is to be a data steward, opening data up for AI, or for any data-centric goal, requires more people to see themselves as stewards of the data, willing to share it where it can make a real impact rather than hold it close so that no one can “mess with it.” Elizabeth Puchek, CDO of U.S. Citizenship and Immigration Services, pointed out that “we are all data professionals.” The essence is that we all need to look at how to help our mission advance by being good stewards of the data.

One of the biggest fears about giving up “control” of data is that data without context can be misinterpreted and misused. It is critical that data is not taken out of context and that related data can be viewed together no matter how or where it is used. A data streaming approach ensures that a producer cannot silently alter historical data out from under a consumer; the producer can only append updates and new facts.
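The append-only idea can be sketched in a few lines. This is a simplified in-memory stand-in for a streaming log, not any particular product's API; the `Fact` and `AppendOnlyLog` names are hypothetical. A correction is written as a new fact at a new offset, so a consumer can always see both the current value and everything that preceded it.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Fact:
    """An immutable record; frozen=True blocks attribute changes after creation."""
    offset: int
    key: str
    value: Any


class AppendOnlyLog:
    """Hypothetical in-memory log that exposes append and read, but no update or delete."""

    def __init__(self):
        self._facts = []

    def append(self, key, value):
        """Add a new fact at the end of the log and return its offset."""
        fact = Fact(offset=len(self._facts), key=key, value=value)
        self._facts.append(fact)
        return fact.offset

    def read(self, from_offset=0):
        """Return a read-only snapshot of facts from the given offset onward."""
        return tuple(self._facts[from_offset:])

    def latest(self, key):
        """Return the most recent value for a key, or None if it was never written."""
        for fact in reversed(self._facts):
            if fact.key == key:
                return fact.value
        return None

    def history(self, key):
        """Return every value ever written for a key, oldest first."""
        return [fact.value for fact in self._facts if fact.key == key]
```

For example, appending `"pending"` and then `"approved"` for the same key leaves `latest` returning `"approved"` while `history` still shows both values in order.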

Without clean, trustworthy, up-to-date data, AI is useless. A data streaming approach builds a pipeline for the continual flow of data. Data streaming helps with the shift to stewardship because data is provided through a secure immutable log of events. This offers an evidentiary audit trail of how any model built on top of the data was constructed. Likewise, data streaming is often used to record the usage of data to help with provenance and lineage.
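One way to picture the audit trail is a lineage record that ties a model to the exact slice of the log it was built from. The sketch below is an assumption-laden illustration: the record fields, the function names, and the choice of a SHA-256 fingerprint over JSON-serialized events are hypothetical, not a description of how any specific platform tracks lineage.

```python
import hashlib
import json


def fingerprint(events, start, end):
    """Hash the events in offsets [start, end) in a stable, order-sensitive way."""
    digest = hashlib.sha256()
    for event in events[start:end]:
        # sort_keys gives a deterministic serialization for each event
        digest.update(json.dumps(event, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def record_lineage(model_name, events, start, end):
    """Describe which slice of the log a model was built from (hypothetical schema)."""
    return {
        "model": model_name,
        "start_offset": start,
        "end_offset": end,
        "sha256": fingerprint(events, start, end),
    }


def verify_lineage(record, events):
    """Check that the log slice named in the record still matches its fingerprint."""
    current = fingerprint(events, record["start_offset"], record["end_offset"])
    return current == record["sha256"]
```

If the underlying events are unchanged, `verify_lineage` returns `True`; any alteration to an event inside the recorded range changes the fingerprint and the check fails.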


Will LaForest is Field CTO for Confluent. In his current position, LaForest works with customers across a broad spectrum of industries and government, enabling them to realize the benefits of a data in motion architecture with event streaming. He is passionate about data technology innovation and has spent 26 years helping customers wrangle data at massive scale. His technical career spans diverse areas including software engineering, NoSQL, data science, cloud computing, machine learning, and building statistical visualization software, but it began with code slinging at DARPA as a teenager. LaForest holds degrees in mathematics and physics from the University of Virginia.
