"Now what? Responsible artificial intelligence (AI)? You're probably going to tell me that this is going to interfere with how I go about AI."
Yes, I am. There's a whole list of nations coming up with strategies on AI. Check out this world map where each colored country is having a go at AI frameworks. And yes – some of these want to tell you what you can and cannot do when working with AI.
Why are regulations popping up? Well, just because we can doesn't mean we should. The scientists from Jurassic Park will agree with me: just because you can clone a dinosaur from its DNA doesn't mean you should. Similarly, just because you can create an AI model that diagnoses cancer doesn't mean you should.
This dummies guide aims to demystify the whole concept of responsible AI and offers the tools to straighten us out. The basis of this dummies guide is a little secret: There's a common denominator to all these AI frameworks. This nicely translates into a checklist for responsible AI:
- Privacy.
- Bias.
- Explainability.
- Governance.
Yes, a checklist. Let's go over the items one by one.
Privacy ✔
Simple example: Data on disease trends should not include any info that identifies individual patients. So before you even start, make sure your data is anonymized.
How?
Scrutinize your data sets for sensitive info. You can either do this yourself with a battery of data stewards, or you can have some assistance from smart software that flags all the fields you should be concerned about. Life is beautiful if the software also allows you to anonymize the columns you consider to be sensitive.
In the example above, the software scrutinizes your data sets for sensitive info and brings the flagged fields to your attention. I know – you don't need any software to tell you which fields are sensitive. But you also don't need software to spell correctly. You still use the spell checker.
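For the tinkerers among us: below is a minimal sketch of what such a scan and anonymization could look like in Python with pandas. The patient table, the column-name hints, the e-mail pattern, and the salted-hash trick are all made up for illustration; a real anonymization tool will be far more thorough.

```python
import hashlib

import pandas as pd

# Illustrative patient table: every column name and value is made up.
df = pd.DataFrame({
    "patient_name": ["A. Jones", "B. Smith"],
    "email": ["a.jones@example.com", "b.smith@example.com"],
    "diagnosis": ["melanoma", "migraine"],
    "age": [54, 37],
})

# Very naive "sensitive field" detection: flag columns whose name hints at
# identity, or whose values look like e-mail addresses.
SENSITIVE_NAME_HINTS = ("name", "email", "phone", "ssn", "address")
EMAIL_PATTERN = r"^[^@\s]+@[^@\s]+\.[^@\s]+$"

def looks_sensitive(col: pd.Series) -> bool:
    if any(hint in str(col.name).lower() for hint in SENSITIVE_NAME_HINTS):
        return True
    return col.astype(str).str.match(EMAIL_PATTERN).any()

flagged = [c for c in df.columns if looks_sensitive(df[c])]
print("Columns flagged for review:", flagged)

# Anonymize the flagged columns with a salted hash, so rows stay linkable
# without exposing the identity itself.
SALT = "replace-with-a-secret-salt"

def anonymize(value: str) -> str:
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:12]

for col in flagged:
    df[col] = df[col].astype(str).map(anonymize)

print(df)
```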
Bias ✔
Another simple example: if you want to sell your drugs in Japan, make sure you include enough Japanese participants in your clinical trials. Different populations carry different genes and react differently to drugs. You can't claim your drug works on Hispanic children if you only tested it on elderly Inuit.
How?
In any case, you need to sit down and think about the audience you are targeting with your AI. Are you trying to cure prostate cancer? Then yes, it makes sense to limit your data set to men. If you are trying to find a cure for migraine, then make sure you have enough men and women in your data. Two things to do:
(1) Before you even start, check the data set for representative distributions. Many software vendors have the capability to pull up a profile for every column of your data set, which lets you quickly check whether your data is well distributed over the different segments of your audience (a small code sketch of both checks follows after point (2)). In the figure below, you have a slight overrepresentation of men:
(2) Once you've created the model, make sure it's not biased for the different segments within your audience. Are your predictions as good for teenagers as for the elderly? This can be a very cumbersome process to go through. If you're lucky, your software has a "fairness" button:
In the example above, a company created an attrition model that predicts how likely it is that someone would resign and leave the company. We see it works well for all types of jobs: salespeople, office workers, etc. The model scores around the 80% mark consistently, so it is a fair and unbiased model with respect to the different job roles.
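Here is a minimal sketch of both checks, (1) and (2), in Python with pandas and scikit-learn. The attrition data, the segments, and the numbers are invented for illustration; a real "fairness" button will dig much deeper (false positives per segment, calibration, and so on).

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Invented attrition data, for illustration only.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "job_role": rng.choice(["sales", "office", "technician"], size=n),
    "sex": rng.choice(["F", "M"], size=n, p=[0.45, 0.55]),
    "years_at_company": rng.integers(0, 30, size=n),
    "overtime_hours": rng.integers(0, 20, size=n),
})
df["left_company"] = (df["overtime_hours"] + rng.normal(0, 3, size=n) > 12).astype(int)

# Check (1): is every segment of the audience fairly represented in the data?
print(df["sex"].value_counts(normalize=True))       # e.g. a slight overrepresentation of men
print(df["job_role"].value_counts(normalize=True))

# Check (2): once trained, does the model perform equally well for each segment?
X = pd.get_dummies(df.drop(columns="left_company"))
y = df["left_company"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

test = df.loc[X_test.index].copy()
test["pred"] = model.predict(X_test)
for role, grp in test.groupby("job_role"):
    acc = accuracy_score(grp["left_company"], grp["pred"])
    print(f"accuracy for {role}: {acc:.0%}")        # flag any segment that lags far behind
```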
Explainability ✔
Explainability has two facets.
(1) You are expected to explain how parameters influence the output of your model. Most software will allow you to generate that list fairly easily:
In the example on the left, an AI model is used to read narratives in electronic health records and predict whether a cancer patient is likely to suffer an adverse event, ranging from headaches to serious complications and even death. The model states that the most predictive parameter for adverse events is the number of medications the patient is on, followed by tumor-specific parameters like histological stage or tumor grade. Age bracket and race do not seem to play a role in this model. (A code sketch covering both facets follows after the second one.)
(2) You are also expected to explain why a model took a particular decision. Imagine your AI model diagnoses someone with cancer based on their CT scan. Explain it. Have the software color the tumor. It makes the model's decision verifiable, and it also allows the oncologist to challenge the model if they think it messed up.
In the example above, the model has found three potential lesions (orange boxes) in the slice. The selected lesion area and its explanation are shown above. The red pixels in the explanation highlight the area that contributes positively to the lesion prediction according to the model. The green pixels highlight the area that detracts from the model's predicted probability of the lesion (courtesy of Xin J. Hunt, Ralph Abbey, Ricky Tharrington, Joost Huiskens, and, from Amsterdam University Medical Centers, Nina Wesdorp, MD; from "An AI-Augmented Lesion Detection FrameWork For Liver Metastases With Model Interpretability").
Governance ✔
Governance has no fewer than four aspects to it.
(1) Always make sure you know which data your model is based upon. If your model goes sour, you'll want to know why that happened, and the ability to dig back into the original data is often a cornerstone of such forensics. "Data lineage," as this is often called, holds the same value as a black box in an air crash investigation.
(2) The other way around: Make sure you know which data is used in which model. This way, if your data ever gets compromised, you know immediately which models are affected.
How?
The first two aspects of governance can be covered through "explore lineage," a button found in the more advanced AI software packages.
One of these boxes in the spiderweb is the data set, and another one of these boxes is your model. Click on the model and trace back to the data it ingested. Or the other way around: Click on the data set and check which models were based on it.
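Every vendor draws the spiderweb a little differently, but the bookkeeping behind the button is simple. Here's a minimal sketch in Python; the data set and model names are made up for illustration.

```python
from collections import defaultdict

# Minimal lineage registry: which model was trained on which data sets.
model_to_datasets = {
    "adverse_event_model_v3": ["ehr_narratives_2023", "tumor_registry_2023"],
    "attrition_model_v1": ["hr_core_2022"],
}

# Build the reverse index so you can also walk from data set to model.
dataset_to_models = defaultdict(list)
for model, datasets in model_to_datasets.items():
    for dataset in datasets:
        dataset_to_models[dataset].append(model)

# Aspect (1): a model goes sour -> which data did it ingest?
print(model_to_datasets["adverse_event_model_v3"])

# Aspect (2): a data set gets compromised -> which models are affected?
print(dataset_to_models["tumor_registry_2023"])
```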
(3) Model governance. The US Food and Drug Administration (FDA) made a really good diagram about this and titled it "Good Machine Learning Practice," or GMLP. This was in the context of AI used for medical devices.
Long story short: Focus on the grey box called "model monitoring." It asks you to keep an eye on your model once you're using it and discard it when it starts acting iffy.
How?
Let's stick to our example that predicts adverse events based on the number of medicines the patient is on and the tumor characteristics. If your model predicts that the patient is likely to suffer an adverse event and he does, say he gets readmitted to hospital, then your model works fine. If he doesn't suffer any adverse event, then your model failed. Keeping track of how often your model fails is not only a regulatory obligation but also common sense.
In the above example, model performance is 82.07%. It means your AI model was right in 82.07% of the cases.
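A minimal sketch of such monitoring in Python with pandas. The prediction log and the 75% threshold are made up for illustration; in practice you'd pull these from your scoring logs and your own risk appetite.

```python
import pandas as pd

# Illustrative log of what the model predicted vs. what actually happened.
log = pd.DataFrame({
    "predicted_adverse_event": [1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
    "observed_adverse_event":  [1, 0, 0, 1, 0, 1, 0, 1, 1, 0],
})
log["correct"] = (log["predicted_adverse_event"] == log["observed_adverse_event"]).astype(int)

# Overall performance, analogous to the 82.07% quoted above.
print(f"overall accuracy: {log['correct'].mean():.2%}")

# Rolling performance over the last 5 cases: review, retrain, or discard the
# model when it drifts below the threshold you committed to.
THRESHOLD = 0.75
rolling = log["correct"].rolling(window=5).mean()
if (rolling.dropna() < THRESHOLD).any():
    print("model performance dipped below threshold, time for a human review")
```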
(4) Sure, technically you can have your AI update itself without any human intervention. But just because we can doesn't mean we should. You don't want an armed drone to teach itself what to shoot at. Therefore, several countries require "human oversight" for such high-risk models.
How?
Set up workflows. You can make these as complicated as you want, but in the end, one thing is important: Before you release a new model into the wild, have a wise colleague look at it.
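A toy version of such a workflow in Python, just to show the principle: no human sign-off, no release. The model name and the reviewer are, of course, made up.

```python
from dataclasses import dataclass, field

# Toy release workflow: a model candidate cannot go live without at least
# one named reviewer signing off.
@dataclass
class ModelCandidate:
    name: str
    version: str
    approvals: list[str] = field(default_factory=list)

def approve(candidate: ModelCandidate, reviewer: str) -> None:
    """Record the wise colleague's sign-off."""
    candidate.approvals.append(reviewer)

def release(candidate: ModelCandidate) -> None:
    """Refuse to release anything that no human has looked at."""
    if not candidate.approvals:
        raise RuntimeError(f"{candidate.name} {candidate.version}: no human sign-off, not releasing")
    print(f"releasing {candidate.name} {candidate.version}, "
          f"approved by {', '.join(candidate.approvals)}")

candidate = ModelCandidate("adverse_event_model", "v4")   # made-up model
approve(candidate, "wise.colleague@example.com")          # made-up reviewer
release(candidate)
```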
Am I done now?
Yes. Assuming you have the whole thing well documented. "Well documented" means: (1) make sure you have a process in place that kicks in before you start designing your model. (2) Keep proof of all your checks. If you are using an integrated software platform, your documentation will have been generated automatically, and you are ready for your next audit.
Can't be bothered to look into this. Can you do a gap analysis for me?
It can be arranged. Give the author a shout and he'll put you in touch.