I see the term resilience in a lot of business literature these days. Intuitively, it makes sense. After a pandemic, global supply chain disruptions and resulting economic fragility, executives understandably consider adaptability, durability and how best to operate with a strength of character – all attributes that define resilience.
Many factors contribute to resilience, but data quality is among the most important. Data is the foundation of AI; if the data is bad, the AI will be bad. That's why it's so important to make sure that data is collected, stored and used ethically.
Ethical data collection means being transparent about what data is being collected and how it will be used, and getting consent from the people whose data is collected. Ethical data storage means keeping data secure and protected from unauthorized access. Finally, ethical data use respects people's privacy and does not discriminate against them.
When businesses prioritize data quality, they build a foundation for resilience. They are creating a business that can withstand shocks and disruptions and setting themselves up for sustainable success.
SAS recently conducted a survey to learn more about why resilience has become such an important topic to executives around the globe. The findings, published in Resiliency Rules, reveal something interesting from an AI perspective.
In the survey, we identified five rules for resiliency, one of which is "equity and responsibility." Strikingly, for the more than 2,400 senior executives surveyed, equity and responsibility are more than aspirational. Many cited implementing technology solutions to ensure ethical innovation, and data quality was routinely the most important technical factor in achieving that end. Not surprisingly, executives considered less resilient cite costs and data quality as barriers.
Given this focus on equity and responsibility, why are more equitable societal outcomes so elusive? And can AI help improve those outcomes?
The ethics of acquisition, use and disposal of data
Ethical considerations exist throughout the data life cycle, from initial disclosure to final disposal. Data quality is crucial for AI. As the saying goes, "junk in, junk out." However, high- versus low-quality data is often a function of the question being asked. And like the data being collected, the questions asked have to be understood as biased, because wherever humans show up, bias exists. Whether that bias diminishes data quality is a function of why the data is collected, from whom, and at what time.
It all starts with asking: "For what purpose is the data being collected?" Understanding the intended use of the data is crucial. For example, if you're collecting data about a cohort of patients with a specific condition, are you only collecting data consistent with that condition, or are you also collecting extraneous data that may not be relevant? Holding irrelevant data could lead to unintended consequences if it falls into the wrong hands.
Data quality is the degree to which data is accurate, complete and relevant to the purpose for which it is being used. Confirmation bias is the tendency to search for, interpret, favor and recall information in a way that confirms one's preexisting beliefs or hypotheses. In other words, data quality is about the quality of the data itself, while confirmation bias is how we interpret the data.
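As a concrete illustration, the three dimensions in that definition can be expressed as simple checks over a record set. This is a minimal sketch in plain Python; the field names, records and validity rules are hypothetical, not drawn from the survey:

```python
# Illustrative checks for the three data-quality dimensions described
# above: completeness, accuracy and relevance. All names and thresholds
# here are assumptions for the example.

def completeness(records, required_fields):
    """Share of records with every required field present and non-empty."""
    ok = sum(1 for r in records
             if all(r.get(f) not in (None, "") for f in required_fields))
    return ok / len(records) if records else 0.0

def accuracy(records, field, valid):
    """Share of records whose value for `field` passes a validity check."""
    ok = sum(1 for r in records if valid(r.get(field)))
    return ok / len(records) if records else 0.0

def relevance(records, purpose_fields):
    """Share of collected fields that serve the stated purpose.
    A low score flags extraneous data, like the patient-cohort example."""
    collected = {f for r in records for f in r}
    return len(collected & set(purpose_fields)) / len(collected) if collected else 0.0

patients = [
    {"id": 1, "age": 54, "condition": "asthma", "favorite_color": "blue"},
    {"id": 2, "age": -3, "condition": "asthma"},
    {"id": 3, "age": 61, "condition": ""},
]

print(completeness(patients, ["id", "age", "condition"]))  # 2/3: record 3 has an empty condition
print(accuracy(patients, "age", lambda a: isinstance(a, int) and 0 < a < 120))  # 2/3: record 2 has an impossible age
print(relevance(patients, ["id", "age", "condition"]))     # 0.75: favorite_color is extraneous
```

Note that a record can be complete yet inaccurate (record 2 has every field, but an impossible age), which is why the dimensions are measured separately.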
For example, if you are trying to determine whether a social intervention is effective, you would need to collect data on the efficacy of the intervention. This data must be accurate, complete and relevant to whether the intervention is effective. However, despite high-quality data, you may still be subject to confirmation bias, because you may be more likely to notice or remember data that supports your preexisting beliefs about the intervention. So what happens when the data tells a story you don't like? Does that mean the data is poor-quality, or are you experiencing confirmation bias? These are important questions to consider, as the answers can lead to AI systems that perpetuate and amplify biases against certain groups of people, which can have serious consequences.
Prioritizing data quality builds resilience
Trustworthy AI requires attention to the data management practices above, along with proper data preparation, metadata enrichment, labeling and testing. All of these help ensure that AI systems are more accurate and reliable, and that they are used responsibly and equitably.
When businesses prioritize data quality, they build a foundation for resilience. They create a business that can withstand shocks and disruptions more equitably and responsibly, and set themselves up for sustainable success.
Here are some specific examples of how businesses can improve data quality:
- Be clear about the purpose of data collection. When you collect data, make sure you have a clear understanding of why you are collecting it and how it will be used. This will help you to collect the right data and to use it ethically and responsibly.
- Get consent from data subjects. Before you collect data from people, make sure you get their consent. This means explaining what you will do with their data and allowing them to opt out.
- Secure data. Once you have collected data, it is important to secure it properly. This means using strong passwords and encryption to protect the data from unauthorized access.
- Use data responsibly. When you use data, make sure you do so in a way that is ethical and responsible. This means respecting people's privacy and not discriminating against them.
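As a sketch of how the first two practices might look in code, data collection can be gated on a declared purpose and recorded consent, with opt-out honored by deletion. The class and field names below are illustrative assumptions, not a specific compliance framework:

```python
# Illustrative sketch: declare the purpose of collection up front and
# refuse to store data unless the subject has consented to that purpose.
# Names and structure are assumptions for the example.
from dataclasses import dataclass, field

@dataclass
class Subject:
    subject_id: str
    consented_purposes: set = field(default_factory=set)

class DataStore:
    def __init__(self, purpose):
        self.purpose = purpose  # clear purpose, stated before any collection
        self.records = {}

    def collect(self, subject, data):
        """Store data only if the subject consented to this store's purpose."""
        if self.purpose not in subject.consented_purposes:
            raise PermissionError(f"No consent for purpose: {self.purpose}")
        self.records[subject.subject_id] = data

    def withdraw(self, subject):
        """Honor an opt-out: revoke consent and delete the stored data."""
        subject.consented_purposes.discard(self.purpose)
        self.records.pop(subject.subject_id, None)

store = DataStore(purpose="asthma-study")
alice = Subject("alice", consented_purposes={"asthma-study"})
bob = Subject("bob")  # never consented

store.collect(alice, {"age": 54})
try:
    store.collect(bob, {"age": 61})
except PermissionError as e:
    print(e)  # collection is refused without consent
store.withdraw(alice)  # opt-out removes alice's record
```

Securing the stored data (strong passwords, encryption at rest) would sit underneath this layer and is omitted here for brevity.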
Data quality and ethical considerations in AI raise questions about accountability. When AI systems are used to make decisions that impact people's lives, who is responsible for those decisions? It is important to ensure accountability and transparency in the use of AI, and to ensure that decisions made by AI systems can be audited and explained.
Aside from businesses behaving ethically of their own accord, governments worldwide have begun to discuss future AI regulation, so the importance of data quality and ethical considerations in developing and using AI systems is only increasing. If the data used to train an AI system is inaccurate or biased, the system will make inaccurate or biased decisions that affect people's lives, causing personal and economic harm. Perhaps this is why executives cite equity and responsibility as one of their top Resiliency Rules.