There has been much discussion about the relationship between data science and artificial intelligence. It can become a complicated dance when applied data science is partnered with emerging artificial intelligence technologies. Who takes the lead? How do we keep the beat? Can we make sure neither party steps on the other's toes?
I like to think of data science at least partially as being an application of artificial intelligence. Academics (and to some extent even practitioners) create algorithms while data scientists cull data and apply these algorithms. As these algorithms develop more abilities to learn, machines will become more intelligent.
Learning to dance with this new partner will be a delicate balance of directing the algorithms (through informed feature selection, feedback loops, manual model parameter selection and business rule encoding) and letting them lead (through autotuning, optimization techniques and deep learning). These considerations will undoubtedly grow in importance as data science and automated decisioning expand into every corner of the organization and into our daily lives.
However, one area where we, as data scientists, most definitely need to take the lead is in developing and using ethical frameworks. Computers are getting better at simulating intelligence, but so far they lag in simulating human values.
Serious folks like Stephen Hawking, Bill Gates and Elon Musk are investing considerable time and energy in spreading the word about the perceived threat posed by unconstrained AI (i.e., AI without ethical frameworks). As chief artificial intelligence practitioners, data scientists need to take the lead in setting up ethical frameworks that will keep AI on the right track.
Let’s start with a reasonable assumption: by the end of the century, machines will be a great deal more self-sufficient (indeed, self-driving cars, target-seeking military drones and smart coffee machines are already scientific FACT). Self-sufficient in this sense means self-learning and self-modifying, and eventually identifying and acting in contexts for which the machine was not originally designed. If you extrapolate this progression of self-sufficiency to a natural conclusion, it could also mean that not even the hard-coding of back doors in such systems would be enough to completely eliminate the chance of undesirable machine behaviors.
If we assume that the best-planned coding or programming might not be enough to ensure human favorable outcomes (for now, we’ll stay away from the very tricky discussion of WHAT precisely IS a human favorable outcome), we should now be discussing and implementing a combination of ethical frameworks and even AI behavioral training.
And this, for me, is where the real brain-stretching starts. What precisely would AI behavioral and morality training look like?
Dan Curtis proposed an eventual need for Artificial Psychology as early as 1963, but realized the current state of technology was not yet ready for it. Even today, the Wikipedia entry on Artificial Psychology ends with the sentence, "As of 2015, the level of artificial intelligence does not approach any threshold where any of the theories or principles of artificial psychology can even be tested, and therefore, artificial psychology remains a largely theoretical discipline."
But is that entirely so? Can we not already start today with programmatic approaches that take ethics into account? After that, do we need to create virtual environments where AI machines can learn, complete with rewards for correct moral choices?
Follow ethical guidelines
We should be applying ethical rigor to current machine learning processes. How does that apply to our daily work as data scientists? Well, Google tries a simple mantra: "Do no harm." Likewise, any business context that goes against our own personal credos should be vetoed.
How do we as data scientists ensure that we are training ethically responsible, strategically aligned models that will generate good behaviors from our programs? Are there existing techniques for this? Do we need to develop new ones? How far can data SCIENCE take us before we need to establish and encode a data ETHOS?
These are important questions and we need to begin sensibly and concretely.
For example, when training predictive models, is it enough to simply use historical cases to build a model? Or is it up to us to consider morality? What is the model behavior we want to encourage? What types of customer or other entity behaviors do we want to reward? Do we stress the rewards of certain outcomes? This is where human rules complement machine learning when we move to deploy. And this isn’t just some abstract morality question. It’s also a question of making models behave in line with organizational strategic objectives AND ethics.
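One concrete way human rules can complement a learned model at deployment is a rule layer that can veto or dampen the model's raw output. A minimal sketch, with all names, rules and numbers hypothetical:

```python
# Hypothetical sketch: hand-encoded ethical/business rules wrapped around
# a model's raw score at deployment time.

def model_score(customer):
    # Stand-in for a trained model; here just a toy linear score.
    return 0.8 * customer["spend"] + 0.2 * customer["tenure"]

def apply_rules(customer, score):
    # Human-encoded rules complement the learned model.
    if customer.get("vulnerable"):  # e.g., never target a protected group
        return 0.0                  # hard veto
    if customer["tenure"] < 1:      # dampen aggressive offers to newcomers
        score *= 0.5
    return score

customer = {"spend": 0.9, "tenure": 0.2, "vulnerable": False}
final = apply_rules(customer, model_score(customer))
```

The design point is separation of concerns: the model stays a pure predictor, while the ethical and strategic constraints live in an explicit, reviewable rule layer.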
Following this logic, an essential part of effective model management becomes outcomes assessment: not just in terms of accuracy, but also in terms of whether we are ethically aligned and producing ethical outcomes.
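As one illustration of what such an assessment could report, here is a sketch (with invented toy data) that computes an ethics-oriented metric, a demographic-parity gap, alongside plain accuracy:

```python
# Hypothetical sketch: report an ethics metric next to accuracy during
# outcomes assessment.

def assess(y_true, y_pred, group):
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Positive-prediction rate per group; a large gap between groups
    # flags potentially disparate treatment.
    rates = {}
    for g in set(group):
        preds = [p for p, gg in zip(y_pred, group) if gg == g]
        rates[g] = sum(preds) / len(preds)
    parity_gap = max(rates.values()) - min(rates.values())
    return accuracy, parity_gap

y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0]
group  = ["a", "a", "a", "b", "b", "b"]
acc, gap = assess(y_true, y_pred, group)
```

Demographic parity is only one of several possible fairness criteria; the point is that the ethics metric sits in the same report as accuracy, so neither can be optimized in isolation.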
We have a realistic chance to teach desired behaviors in our predictive models. In this age of analytics sandboxes and playpens, what kind of structures can we envision? How about experimental approaches? Online testing, A/B or multi-armed bandit? Constructed cases?
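The bandit idea can carry ethics directly in the reward signal. A minimal sketch (the offers, conversion rates and the 0.3 penalty are all invented assumptions): an epsilon-greedy bandit that learns from rewards adjusted by a hand-set ethical cost.

```python
import random

# Hypothetical sketch: an epsilon-greedy multi-armed bandit whose reward
# encodes an ethical penalty, so the "right" behavior is what gets learned.

def shaped_reward(offer, converted):
    reward = 1.0 if converted else 0.0
    if offer == "aggressive":
        reward -= 0.3  # hand-set penalty encoding an ethical cost
    return reward

def run_bandit(trials=1000, epsilon=0.1, seed=42):
    rng = random.Random(seed)
    offers = ["gentle", "aggressive"]
    counts = {o: 0 for o in offers}
    values = {o: 0.0 for o in offers}
    conv_rate = {"gentle": 0.5, "aggressive": 0.6}  # simulated responses
    for _ in range(trials):
        if rng.random() < epsilon:
            offer = rng.choice(offers)          # explore
        else:
            offer = max(offers, key=lambda o: values[o])  # exploit
        converted = rng.random() < conv_rate[offer]
        r = shaped_reward(offer, converted)
        counts[offer] += 1
        values[offer] += (r - values[offer]) / counts[offer]  # running mean
    return values, counts

values, counts = run_bandit()
```

Although the aggressive offer converts better, its ethics-adjusted value is lower, so the bandit learns to favor the gentle offer. The same shaping trick applies to A/B tests: measure not just the winning variant but the ethically adjusted winner.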
If a model’s purpose is to select the right customers, how do we build promotion and recognition concepts into models transparently? Surely certain algorithms do apply these concepts mathematically, which is a good thing, but is it enough? A great short book about hyperparameter tuning and other model testing considerations is Evaluating Machine Learning Models: A Beginner’s Guide to Key Concepts and Pitfalls, from O’Reilly (while not written from an ethics perspective, anyone is free to read it with ethics in mind).
These are tough issues for anyone, but especially for data scientists. Science looks to eliminate bias, but aren’t ethics and morality really a type of (necessary?) bias? A next (evolutionary?) step could, of course, be that AI will discover better behaviors or even optimal morals. At what point should we feel safe in subjecting ourselves to such a paradigm?
I would be very interested in hearing your views on these topics.
And get in on the dance...