Event stream processing, data quality and risk mitigation

In my last post, I introduced the question of ethical scenarios associated with automated systems, particularly those, such as autonomous vehicles, that rely on event streams and complex event processing. As developers, we tend to focus on making a system robust enough to support all the presumed use cases of normal operation. But when a system is to be deployed in an environment that allows for unpredictable behavior, additional steps must be taken to anticipate how that unpredictability affects normal stream processing.

Interestingly, this is less of a technical issue and more an issue of social responsibility. Addressing it requires integrating behavioral rules that are influenced by a number of non-technical factors, such as:

  • Laws. I suspect that there's a gap in our legal system regarding accountability for automated decision-making. If the self-driving car I am riding in has an accident, who is at fault? Me, the company that manufactured the car, the software developer, or some other party whose data affected my car's stream processing algorithms? I can't imagine a general rollout of any kind of mass-produced automated system without a set of laws governing its use.
  • Ethical guidelines. A body of laws is a start, but that does not address the ethical questions I raised previously, nor more complex ones. For example, consider an autonomous car with two of the owner’s family members as passengers. If there is a situation that pits the lives of the passengers against the lives of the same number of strangers, what choice should be made? What information is required to be able to make that choice?
  • Risk modeling. Here's another spin on the trolley problem I discussed in the last post. In that problem, the choice was between doing nothing and allowing five people to be killed, or pulling a lever that would save the five but kill one other person. What if the choice were instead to seriously injure the five people: not enough to kill them immediately, but enough that any or all of them might subsequently die? You might suggest that the possibility of survivors would make it easier to allow those five to be hit. But what if you had enough information about the health status of each of the five to know that there was a 99.8% chance that two of them would die within 24 hours of the collision? I am not going to suggest an answer to this question. Rather, I want to point out that access to more and more data about the scenario could contribute to refining the decision-making process (the first sketch after this list shows one way such probabilities could be compared). But how much risk profiling and analysis can be integrated to truly make the right decision?
  • Data quality assurance. Finally, something we might be able to control! Recognize that these automated decision-making stream processing engines balance appropriately designed models and patterns with the actual data streams that feed the process. Any data flaw increases the risk of an incorrect decision. So it would be insane not to fully integrate some form of redundant data verification and validation into each data stream as a mechanism for reducing risk (see the second sketch after this list).
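
To make the risk modeling point concrete, here is a minimal sketch of how per-person fatality probabilities could feed an expected-harm comparison between two candidate actions. The Person class, the probability values and both scenarios are purely illustrative assumptions, not a description of any real decision model.

```python
# Hypothetical sketch: comparing expected harm for two candidate actions,
# using per-person fatality probabilities as in the modified trolley scenario.
# All names and numbers here are illustrative assumptions, not a real model.

from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    """A party potentially affected by a candidate maneuver."""
    fatality_probability: float  # estimated chance of death if struck


def expected_fatalities(affected: List[Person]) -> float:
    """Sum of individual fatality probabilities = expected number of deaths."""
    return sum(p.fatality_probability for p in affected)


# Scenario from the post: five people would be struck, two of whom are
# estimated (99.8%) to die within 24 hours; the alternative strikes one person.
group_of_five = [Person(0.998), Person(0.998), Person(0.3), Person(0.3), Person(0.3)]
single_person = [Person(0.9)]

print("Expected deaths, do nothing:   ", expected_fatalities(group_of_five))
print("Expected deaths, divert course:", expected_fatalities(single_person))
```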

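To illustrate the data quality point, here is a minimal sketch of in-stream verification for a single event, assuming a hypothetical event layout with two redundant range sensors. The field names, physical limits and agreement tolerance are all assumptions made for the example, not part of any particular product.

```python
# Minimal sketch of per-event validation in a stream, assuming a simple event
# dict carrying readings from two redundant range sensors. Field names and
# thresholds are hypothetical.

from typing import Dict, List


def validate_event(event: Dict) -> List[str]:
    """Return a list of data quality issues found in a single stream event."""
    issues: List[str] = []

    # Completeness: required fields must be present.
    for field in ("timestamp", "lidar_range_m", "radar_range_m"):
        if field not in event:
            issues.append(f"missing field: {field}")
            return issues

    # Reasonableness: readings must fall within the sensors' physical limits.
    for field in ("lidar_range_m", "radar_range_m"):
        if not 0.0 <= event[field] <= 300.0:
            issues.append(f"out-of-range value: {field}={event[field]}")

    # Redundancy: independent sensors measuring the same thing should roughly agree.
    if abs(event["lidar_range_m"] - event["radar_range_m"]) > 5.0:
        issues.append("redundant sensors disagree beyond tolerance")

    return issues


# Example: an event in which the two sensors disagree by almost 20 meters.
print(validate_event({"timestamp": 1700000000.0,
                      "lidar_range_m": 42.0,
                      "radar_range_m": 61.5}))
```

In a real pipeline, events that fail such checks could be quarantined or down-weighted rather than fed directly into the decision engine.
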
What does this really mean?

First, it means closer collaboration among technologists, data scientists, information engineers and the legal community to advocate for well-defined guidelines governing responsibility and accountability for the unexpected behaviors of automated systems. Second, it means we need to scope what information should or should not be acceptable as input when automated event stream processing systems calculate the probabilities of outcomes in emerging situations. Finally, responsible data management professionals must increase their efforts to assert what quality data means in the context of event stream processing. Taking these steps may help proactively address the potential ethical concerns of stream processing and automated systems.

