Artificial intelligence (AI) offers many opportunities for innovation. It already allows us to improve traffic flows, safely manage large crowds at events, automatically analyze MRI scans for particular diseases and disorders, and check the effectiveness of treatments. However, new privacy legislation – such as the European General Data Protection Regulation (GDPR) – imposes stricter requirements on the processing of personal data by organizations.
What will be the impact of this new privacy legislation on AI projects? And how can we use AI to create (social) added value while still protecting the privacy of the data subjects involved?
From my network of data privacy professionals, I know that organizations have put a lot of work into GDPR compliance. They have appointed data protection officers, set up records of processing activities, carried out privacy impact assessments and concluded data processing agreements. They have also raised awareness at different levels in the organization.
I also know, however, that organizations are mainly working on their primary processes and operational systems. In many cases, they have not fully considered areas such as advanced/predictive analytics, big data, data lakes, data science and AI.
Companies increasingly provide added value to their (potential) customers by analyzing data using AI. This can be at odds with the more stringent requirements that the GDPR imposes on the processing of personal data and on the interpretability and transparency of algorithms. How do you ensure adequate and transparent use? And how do you ensure the data subjects do not fall victim to prejudice and discrimination? In the analytics and data science field, this is known as "bias."
How data protection influences bias
Bias can lead to unfair decisions. To a certain extent, we are all biased in our decision making, but organizations manage that by involving more people in important decisions. Nowadays, however, organizations are more likely to use data-driven machine learning techniques to make decisions, so it is essential that they use complete, high-quality training data. Poor-quality or incomplete data will simply lead the self-learning system to make the wrong decisions.
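To make this concrete, here is a minimal sketch in Python (using pandas) of the kind of completeness and balance checks you might run before training. The dataset and column names are purely illustrative:

```python
import pandas as pd

# Hypothetical training set; all column names are illustrative.
train = pd.DataFrame({
    "age": [34, 41, None, 29, 55, 38],
    "income": [52000, 61000, 48000, None, 75000, 59000],
    "hired": [1, 0, 0, 1, 1, 0],  # the decision label
})

# Completeness check: share of missing values per feature.
print(train.drop(columns="hired").isna().mean())

# Balance check: a heavily skewed label is an early warning that
# the system may simply learn to reproduce the majority decision.
print(train["hired"].value_counts(normalize=True))
```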
Bias can also occur as a result of data protection. To guarantee the privacy of individuals, organizations often apply data minimization and store only the required data. However, this makes it more difficult to combat bias. In many cases, it may actually be necessary to store more data to prevent targeted bias. This is an important ethical issue for the use of AI. And it requires strong data governance to ensure that appropriate data is held and used.
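As one illustration of what such governance could look like in practice, the sketch below keeps protected attributes apart from the operational data behind a pseudonymous key, so they are available for audits but never fed to the model. The identifiers and the salt are hypothetical:

```python
import hashlib

def pseudonym(candidate_id: str, salt: str) -> str:
    """Derive a stable pseudonymous key from the real identifier."""
    return hashlib.sha256((salt + candidate_id).encode()).hexdigest()

SALT = "replace-with-a-managed-secret"  # hypothetical; manage as a secret

# Operational store: the only data the model is allowed to see.
applications = {pseudonym("cand-001", SALT): {"years_experience": 6}}

# Restricted store: protected attributes, access-controlled, audit-only.
protected_attributes = {pseudonym("cand-001", SALT): {"gender": "F"}}

# An auditor (not the model) joins the stores on the pseudonym.
for key, features in applications.items():
    print(key[:8], features, protected_attributes.get(key))
```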
Amazon example
Let us consider a simple example that illustrates this point neatly. Amazon used to screen CVs blindly and therefore did not know whether a particular CV came from a man or a woman. You might think this would help to avoid bias, but in fact, something more interesting happened.
Amazon was using software to select potential candidates for interviews and noticed that there were significantly more male interviewees. How was this possible? The company found that the software was biased towards particular hobbies and membership of certain student associations that were more likely to be associated with men. However, this was difficult to detect and prevent because there was no information about whether a CV came from a man or a woman. By knowingly collecting data about gender, Amazon would have been able to detect and prevent this bias.
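Here is a minimal sketch of the kind of audit that collecting gender data enables, checking selection rates per group. The numbers are made up for illustration:

```python
import pandas as pd

# Made-up screening outcomes; the gender column exists only because
# it was knowingly collected for auditing purposes.
results = pd.DataFrame({
    "gender":  ["M"] * 100 + ["F"] * 100,
    "invited": [1] * 40 + [0] * 60 + [1] * 20 + [0] * 80,
})

# Selection rate per group.
rates = results.groupby("gender")["invited"].mean()
print(rates)  # F: 0.20, M: 0.40

# Disparate-impact ratio (min rate / max rate). A common informal
# rule of thumb flags values below 0.8 for further investigation.
print(f"ratio: {rates.min() / rates.max():.2f}")
```

Without the gender column, neither of these numbers can be computed, which is exactly why blind screening made the bias so hard to detect.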
Profiling – "meaningful information about the logic involved" and AI interpretability
The GDPR includes several articles on profiling, including the obligation to provide "meaningful information about the logic involved." This is a serious challenge for organizations and governments, especially when they use advanced machine/deep learning techniques that behave like a black box.
It is helpful to focus on three practical components to manage this problem (a minimal sketch of how they might be recorded follows the list):
- Data transparency. What data is used in the algorithm, and what is its quality?
- Model transparency. Which version of an algorithm is used and with which parameters?
- Decision transparency. In what kind of business or system decisions is the model used?
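As a minimal sketch, the three components could be captured in a single audit record per decision. Every name and parameter below is illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class DecisionRecord:
    # Data transparency: which data fed the algorithm, and its quality.
    dataset_version: str
    missing_value_ratio: float
    # Model transparency: which algorithm version, with which parameters.
    model_version: str
    model_params: dict
    # Decision transparency: where the model output was used.
    decision_context: str
    model_output: float
    timestamp: str

record = DecisionRecord(
    dataset_version="applicants-2024-06",
    missing_value_ratio=0.02,
    model_version="screening-model-v3.1",
    model_params={"max_depth": 4, "n_estimators": 200},
    decision_context="shortlisting for a first interview",
    model_output=0.73,
    timestamp=datetime.now(timezone.utc).isoformat(),
)
print(json.dumps(asdict(record), indent=2))
```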
The analytics or modelling platform that you use needs to enable your organization to address all three of these issues. It also helps if it can provide visual explanations. A few colleagues of mine have written some interesting articles about AI interpretability, which will probably go a long way towards managing this element of the GDPR.
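As one illustration of interpretability in practice, permutation importance (available in scikit-learn) measures how much shuffling each feature degrades a model's score. This is a sketch on synthetic data, not a recipe:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in for real screening data.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# How much does shuffling each feature degrade the score? Large
# drops point to the features the model actually relies on, one
# way to extract meaningful information from an opaque model.
imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, mean in enumerate(imp.importances_mean):
    print(f"feature_{i}: {mean:.3f}")
```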
We must continue to innovate
There is no question that data privacy legislation has an impact on AI and innovation. It is, however, important not to let legislative requirements paralyze us. We must continue to innovate. Take a good look at the type of AI project that you want to set up and assess whether you have a sufficient legal basis for it. Are there other ways to achieve the same goal that have less impact on the privacy of the individuals involved?
It is also helpful to identify which measures you need to take to adequately monitor your processes and make them transparent. These might include considering the impact of data minimization and privacy by design.
The GDPR does impose additional requirements on the processing of personal data, but it remains possible to use smart algorithms to add value for the organization and its customers, provided that you put the right measures in place first, including ethical considerations.