Is big data a big ethics problem?


IDC’s April 2017 white paper Data Age 2025 posits that by 2025 the digital economy will generate 10 times the volume of data created today, reaching roughly 163 zettabytes (ZB). Life-critical data – data captured from devices that have a direct impact on human life, such as autonomous cars, remote patient monitoring devices and smart grids for electricity, gas and drinking water – will grow from the current 10 percent (of about 16 ZB) to 20 percent, meaning more than 32 ZB. And IDC estimates that one-quarter of those 163 ZB, more than 40 ZB, will be processed in real time.
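For reference, the arithmetic behind those figures is simply percentages of the projected 163 ZB (a quick sketch in Python, using only the IDC numbers quoted above):

total_2025_zb = 163                      # IDC's projected 2025 datasphere, in zettabytes
life_critical_zb = 0.20 * total_2025_zb  # 20 percent: roughly 32.6 ZB
real_time_zb = 0.25 * total_2025_zb      # one quarter: roughly 40.8 ZB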

The digital economy also introduces other risks, associated with the information that can be derived from data of such volume, variety and frequency – even without using data that is today considered personal data under GDPR.

With AI and machine learning, it is possible to discover and build personal profiles. This raises many ethical and, potentially, even legal issues, especially when combined with automatic decision mechanisms supported by analytical models and algorithms. These mechanisms could trigger actions in real time and/or after the fact (supported by big data lakes) that could affect social and economic justice for a person or for entire communities.

Much of this data may, directly or indirectly, be considered personal data – sensitive, even very sensitive – and is therefore subject to protection and auditing regulations. However, this is not sufficient, as personal data protection laws normally do not cover ethical and moral aspects, even though those may underlie, or be the spirit of, the law. Thus, it is up to organisations to implement moral and ethical codes that encompass the complete life cycle of data, including acquisition, preparation, processing, aggregation, profiling, sharing, retention, archival and destruction.

Ethics risk assessments

At present, most organisations do not have data/information governance platforms, protocols or processes. And those that do have such governance do not have processes governed by a code of ethics. Some professions are regulated and therefore subject to a code of ethics or conduct. In the unregulated professions, where such a code exists it exists only as a set of internal policies that, in the majority of cases, are not really known or understood by employees.

In my view, organisations have an obligation to implement data and information governance processes, which should include effective management of consent for data collection, manipulation, processing, treatment, use and sharing. Even more importantly, they should have effective ethical governance protocols in place for AI/ML models.
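To make the consent part concrete, here is a minimal sketch of how consent could be tracked per data subject and purpose across that life cycle. It is purely illustrative – the names (ConsentRecord, subject_id, is_valid and so on) are my own, not a specific product or regulation:

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ConsentRecord:
    subject_id: str                        # pseudonymised identifier of the data subject
    purpose: str                           # e.g. "profiling" or "sharing_with_partners"
    granted: bool
    granted_at: datetime
    expires_at: Optional[datetime] = None  # consent should not be open-ended by default

    def is_valid(self, now: Optional[datetime] = None) -> bool:
        """Is consent currently usable for this specific purpose?"""
        now = now or datetime.now(timezone.utc)
        if not self.granted:
            return False
        return self.expires_at is None or now < self.expires_at

Before any processing, profiling or sharing step, the governance process would look up the record for that data subject and purpose, and refuse to proceed whenever is_valid() returns False.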

These protocols should encompass the entire life cycle, including defining the reasons and criteria for model creation (what we want to achieve with the model, and why); obtaining data for development and training; pre-production testing methodologies; and incorporating feedback from application and execution. Of course, the data science team that develops the models is key to ensuring that ethical principles are respected – stay tuned, as I’ll be addressing this matter in my next article.
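As an illustration of that first phase, here is a minimal sketch of the kind of record an ethical governance protocol could require before development even starts. The field names and values are assumptions of mine, not an established standard:

model_proposal = {
    "model_name": "credit_scoring_v1",           # illustrative example only
    "objective": "what we want to achieve with the model, and why",
    "affected_groups": ["loan applicants"],
    "training_data_sources": ["internal CRM"],   # with lawful basis / consent documented
    "excluded_attributes": ["ethnicity", "religion"],
    "pre_production_tests": ["bias audit across protected groups",
                             "performance parity check"],
    "execution_feedback": "quarterly review of contested decisions",
    "approved_by_ethics_review": False,          # flipped only after the assessment described below
}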

Thus, ethics risk assessments should be implemented beginning with the idea, concept and process-design phase for analytical models and artificial intelligence. They should continue through production, in the form of a set of questions for scrutinising and validating that the ethical principles are fully respected. The ethics risk assessment should include effective monitoring, logging and traceability for all phases, including AI/ML model execution, incorporating execution feedback and ensuring complete auditability.
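To illustrate the monitoring, logging and traceability part, here is a minimal sketch, assuming the model is exposed as a simple Python scoring function. Every name here (audited, model_audit, predict_one) is illustrative and not part of any specific platform:

import json, logging, time, uuid

audit_log = logging.getLogger("model_audit")

def audited(model_id, score_fn):
    # Wrap a scoring function so every execution leaves a traceable audit record.
    def wrapper(features):
        decision_id = str(uuid.uuid4())
        result = score_fn(features)
        audit_log.info(json.dumps({
            "decision_id": decision_id,   # a reference an affected person could quote in an appeal
            "model_id": model_id,
            "timestamp": time.time(),
            "inputs": features,           # or a reference to them, if the raw data is sensitive
            "output": result,
        }))
        return result
    return wrapper

# Usage (hypothetical): scored = audited("credit_scoring_v1", model.predict_one)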


Ethics risk assessments are a must and highly critical when using big data, and even more critical when decisions are taken at the edge (near where the event happens) and in real time.

You might be interested in watching this webinar to learn about ‘Progress Data Governance for emerging technologies’.

My colleague Olivier Penel also writes about data privacy in his blog post ‘Will Privacy Kill Innovation?’


About Author

Joao Oliveira

Information Management has been part of João’s professional DNA for more than twelve years. He has been driving Data Management initiatives, supporting and being involved in multiple projects EMEA-wide, across multiple industries. João is a very experienced professional with strong business and functional knowledge, including end-to-end solution architecture. His experience and knowledge have been leveraged on multiple engagements, supporting organisations’ Information Management efforts (from data capture to data archiving) to cope with legal requirements – e.g. GDPR – improve customer experience, drive digital transformation and get the most business value from data, either at rest or in motion. João has a degree in Applied Mathematics and is a spearhead in matters related to Artificial Intelligence, Machine Learning and Advanced Analytics.

