Is big data a big ethics problem?


IDC’s April 2017 white paper Data Age 2025 projects that by 2025 the digital economy will generate ten times the volume of data produced today, reaching some 163 zettabytes (ZB). Life-critical data (data captured from devices that directly affect human life, such as autonomous cars, remote patient-monitoring devices, and smart grids for electricity, gas and drinking water) will grow from the current 10 percent of 16 ZB to 20 percent of the total, more than 32 ZB. IDC also estimates that one-quarter of those 163 ZB, more than 40 ZB, will be processed in real time.
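As a quick sanity check on the proportions quoted above, the short Python sketch that follows simply recomputes them. The inputs are the IDC figures cited in this article (16 ZB today, 163 ZB in 2025, 10 and 20 percent life-critical, one quarter processed in real time); the variable names are illustrative only.

```python
# Back-of-the-envelope check of the IDC "Data Age 2025" figures quoted in the text.
# All volumes are in zettabytes (ZB); the inputs are the article's own numbers.
CURRENT_ZB = 16.0     # approximate annual data volume at the time of the paper
PROJECTED_ZB = 163.0  # IDC's projected annual data volume for 2025

growth_factor = PROJECTED_ZB / CURRENT_ZB     # the "multiply by 10" claim
life_critical_now = 0.10 * CURRENT_ZB         # 10 percent of 16 ZB
life_critical_2025 = 0.20 * PROJECTED_ZB      # 20 percent of 163 ZB
real_time_2025 = 0.25 * PROJECTED_ZB          # one quarter of 163 ZB

print(f"Growth factor: {growth_factor:.1f}x")
print(f"Life-critical data now: {life_critical_now:.1f} ZB")
print(f"Life-critical data in 2025: {life_critical_2025:.1f} ZB")
print(f"Real-time data in 2025: {real_time_2025:.2f} ZB")
```

The results (a growth factor of about 10.2, roughly 1.6 ZB of life-critical data today, 32.6 ZB in 2025 and 40.75 ZB processed in real time) are consistent with the "more than 32 ZB" and "more than 40 ZB" stated in the text.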

The digital economy also introduces risks tied to the information that can be inferred from such volume, variety and frequency of data, even without using anything the GDPR currently classes as personal data.

With AI and machine learning, it is possible to discover and determine personal profiles. This raises many ethical and potentially even legal issues, especially when combined with automatic decision mechanisms driven by analytical models and algorithms. These mechanisms could trigger actions in real time and/or post facto (based on big data lakes) that could affect social and economic justice for individuals or entire communities.

Much of this data may, directly or indirectly, be considered personal data, sometimes sensitive or highly sensitive, and is therefore subject to data protection and auditing regulations. However, this is not sufficient: personal data protection laws do not normally cover ethical and moral aspects, although these may underlie the law or form its spirit. It is therefore up to organisations to implement moral and ethical codes that cover the complete data life cycle, including collection, preparation, processing, aggregation, profiling, sharing, retention, archiving and destruction.

Ethics risk assessments

At present, most organisations do not have data/information governance platforms, protocols or processes, and those that do have such governance generally lack processes guided by a code of ethics. Some professions are regulated and therefore subject to a code of ethics or conduct. In unregulated professions, however, where such a code exists at all, it usually exists only as a set of internal policies that most employees do not really know or understand.

In my view, organisations have an obligation to implement data and information governance processes, which should include effective management of consent for data collection, manipulation, processing, treatment, use and sharing. Even more importantly, they should have effective ethical governance protocols in place for AI/ML models.

These protocols should cover the entire model life cycle: defining the reasons and criteria for creating a model (what we want to achieve with it and why); obtaining data for development and training; pre-production testing methodologies; and incorporating feedback from deployment and execution. Of course, the data science team that develops the models is key to ensuring that ethical principles are respected. Stay tuned, as I’ll be addressing this in my next article.

Ethics risk assessments should therefore begin at the idea, concept and process-design phase for analytical models and artificial intelligence, and continue through production, in the form of a set of questions used to scrutinise and validate that ethical principles are fully respected. The assessment should include effective monitoring, logging and traceability for all phases, including AI/ML model execution and the incorporation of execution feedback, so that every step is fully auditable.
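Logging and traceability of model execution can be implemented in many ways. As one illustration, here is a minimal Python sketch, assuming a simple in-memory, append-only log in which each recorded model decision is chained to the previous one by a SHA-256 hash, so that later edits to any past entry become detectable. The class name DecisionAuditLog, its fields and the credit-scoring example are hypothetical; a real deployment would persist entries to durable, access-controlled storage.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DecisionAuditLog:
    """Hypothetical append-only log: each entry stores the hash of the previous
    entry, so tampering with any recorded decision breaks the chain."""
    entries: list = field(default_factory=list)

    def record(self, model_id: str, model_version: str, inputs: dict, decision: str) -> dict:
        # The first entry is chained to a fixed all-zero "genesis" hash.
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model_id": model_id,
            "model_version": model_version,
            "inputs": inputs,
            "decision": decision,
            "prev_hash": prev_hash,
        }
        body["hash"] = self._digest(body)
        self.entries.append(body)
        return body

    @staticmethod
    def _digest(body: dict) -> str:
        # Hash a canonical JSON form of every field except the hash itself.
        payload = {k: v for k, v in body.items() if k != "hash"}
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def verify(self) -> bool:
        # Recompute every hash and check each link to the previous entry.
        prev_hash = "0" * 64
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(entry):
                return False
            prev_hash = entry["hash"]
        return True


# Usage with made-up example data.
log = DecisionAuditLog()
log.record("credit-scoring", "1.3.0", {"applicant_id": "A-001", "score": 612}, "refer_to_human")
log.record("credit-scoring", "1.3.0", {"applicant_id": "A-002", "score": 745}, "approve")
print(log.verify())  # True: the chain is intact
log.entries[0]["decision"] = "approve"  # simulate after-the-fact tampering
print(log.verify())  # False: the altered entry no longer matches its hash
```

Hash chaining only makes tampering detectable, not impossible; it is one building block for auditability, alongside access controls, retention policies and human review of the logged decisions.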

Ethics risk assessments are essential when using big data, and even more critical when decisions are taken at the edge (close to where the event happens) and in real time.

You might be interested in watching this webinar to learn about ‘Progress Data Governance for emerging technologies’.

My colleague Olivier Penel also writes about data privacy in his blog post ‘Will Privacy Kill Innovation?’


About Author

Joao Oliveira

Principal Business Manager

Joao Oliveira specialises in artificial intelligence, machine learning and advanced analytics. He advises organisations on how to design architectures and processes that support end-to-end information management projects, meet legal requirements and drive digital transformation, so they get the best business value from data, decisioning and analytics. His particular focus is helping teams automate data management, data governance and data quality tasks by applying Intelligent Decisioning, and showing them how to add AI and advanced analytics to get more value, for example with IoT and AI on the edge.
