GDPR is affecting model deployment - how to deal with it


Model deployment is a crucial part of the model life cycle. Having developed and validated your model, you need to be able to put it to work. This “putting to work” stage is called deployment, and it is far from trivial. At this stage, the model stops consuming training and test data. Instead, it starts to draw on new data, using it to generate predictions that will support decisions, applying all the logic learned in the training phase.

Many projects and models fail at this stage. This can be because it is hard to scale up, but more often it is because building in all the data dependencies is challenging. It can be next to impossible if the data scientists who built the model are not involved. Those original data scientists are essential at this stage to ensure that the model works as planned, and that there are no unintended consequences.

Important implications of model deployment

This is important, because a failure to deploy means wasted effort on development and validation. But it also has implications now that the General Data Protection Regulation (GDPR) is just around the corner.

GDPR will affect how information is stored and used. It will also have a major impact on how analytics systems – including models and algorithms – can be used to make decisions. The regulation will create a “right to explanation.” Customers will be able to ask organisations to explain the basis for a decision that was made, especially if it has a legal or other significant effect. This will apply to decisions about credit, including for mortgages.

What’s more, customers must be given “meaningful” information in response to that request. They have a right to know why they were refused, and on precisely what basis. Effectively, this means that organisations will need to be able to understand and explain algorithmic decisions. They must therefore be clear about the logic behind the model – in other words, which features play the largest roles in prediction, and how different elements or features interact in the decision process.
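One way to see which features play the largest roles in prediction is to inspect the model's feature importances. Here is a minimal sketch using scikit-learn with entirely synthetic data; the feature names and the relationship in the target are hypothetical, chosen only to illustrate how a ranking of features might be produced.

```python
# Sketch: ranking the features that drive a model's predictions.
# Assumes a scikit-learn tree ensemble; data and names are hypothetical.
from sklearn.ensemble import RandomForestClassifier
import numpy as np

rng = np.random.default_rng(0)
feature_names = ["income", "debt_ratio", "age", "num_late_payments"]
X = rng.normal(size=(500, 4))
# Hypothetical target: outcome driven by debt_ratio and late payments only
y = ((X[:, 1] + X[:, 3]) > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Rank features by their contribution to the model's decisions
ranked = sorted(zip(feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name}: {importance:.2f}")
```

A ranking like this is a starting point for the "meaningful information" requirement, though on its own it does not explain an individual decision.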

This creates a challenge any time a model is deployed or used to support decision making. It means that model deployment processes, and by extension the data scientists involved, need to create frameworks that enable these explanations. This, by any standard, is going to prove difficult for industry.


A technique to comply with the GDPR: building in rules

One way to respond to the demand for “meaningful information” is to build a system that enables you to show the customer the impact of the most important variables. For example, you may want to show which values in which variables were too low or too high to allow the desired decision. This can be done by building in some very clear business rules. The model will follow these, and you will be able to explain its decisions by showing the customer these rules.
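As a sketch of what such built-in rules might look like, the snippet below encodes a few transparent decision rules and collects the reasons for any refusal. The variable names and thresholds are purely illustrative, not taken from any real scoring policy.

```python
# Sketch: explicit business rules that produce customer-facing reasons
# for a decision. Thresholds and variable names are illustrative only.

def explain_decision(applicant: dict) -> tuple[bool, list[str]]:
    """Apply transparent rules; collect the reasons behind any refusal."""
    reasons = []
    if applicant["income"] < 25_000:
        reasons.append("income below the 25,000 minimum")
    if applicant["debt_ratio"] > 0.45:
        reasons.append("debt-to-income ratio above 45%")
    if applicant["num_late_payments"] > 2:
        reasons.append("more than two late payments on record")
    approved = not reasons  # approve only when no rule was violated
    return approved, reasons

approved, reasons = explain_decision(
    {"income": 22_000, "debt_ratio": 0.30, "num_late_payments": 0})
print(approved, reasons)
```

Because each refusal reason maps to one explicit rule, the explanation shown to the customer is exactly the logic the system applied.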

In practice, you can use a series of steps to do this:

  1. Build your complex machine learning model, including training it on your training data and validating it against test data.
  2. Use the output from the model as a new target variable.
  3. Subject the new variable to the credit scoring logic from a data mining tool, such as SAS® Enterprise Miner™.
  4. Use this information to create a new, surrogate model that mimics your original machine learning model.
  5. Use this new model to generate the set of “adverse characteristics,” or characteristics that would result in an “adverse decision.”
  6. Use this set of rules to enable you to satisfy the customer’s right to an explanation.
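The steps above can be sketched in code. This example uses scikit-learn in place of SAS Enterprise Miner, with synthetic data, a gradient-boosting model standing in for the complex model, and a shallow decision tree as the surrogate; all names and the fidelity check are illustrative assumptions, not the article's prescribed tooling.

```python
# Sketch of the surrogate-model steps, assuming scikit-learn and
# synthetic data. The "default" label and features are hypothetical.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text
import numpy as np

rng = np.random.default_rng(1)
feature_names = ["income", "debt_ratio"]
X = rng.normal(size=(1000, 2))
y = (X[:, 1] - X[:, 0] > 0).astype(int)  # hypothetical adverse outcome

# Step 1: train the complex machine learning model
complex_model = GradientBoostingClassifier(random_state=1).fit(X, y)

# Step 2: use its output as a new target variable
surrogate_target = complex_model.predict(X)

# Steps 3-4: fit a simple, explainable surrogate that mimics it
surrogate = DecisionTreeClassifier(max_depth=2, random_state=1)
surrogate.fit(X, surrogate_target)

# Steps 5-6: the tree's splits are readable rules describing which
# characteristics lead to an adverse decision
print(export_text(surrogate, feature_names=feature_names))

# Check how faithfully the surrogate mimics the complex model
fidelity = (surrogate.predict(X) == surrogate_target).mean()
print(f"fidelity: {fidelity:.2f}")
```

The fidelity score is worth reporting alongside the rules: a surrogate that poorly mimics the original model would produce explanations that do not reflect the decisions actually being made.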

This may sound like a long process, and many organisations may be tempted by shortcuts. However, this technique will enable organisations to comply with the GDPR, and ensure that they genuinely can provide “meaningful information” to their customers about the decisions made by the model.

What happens to self-learning algorithms?

The “right to an explanation” may mean that self-learning algorithms cannot be used for important and/or significant decisions, including about access to credit. However, it does not mean that models cannot be used at all. Sensible process building will ensure that the information can be drawn out when necessary.

However, just like building in data dependencies, this does not happen by itself. It needs a conscious effort at the model deployment stage to ensure that these processes are enabled. Data scientists have a key role to play in this, as they do in any work to ensure that decision-support models work as intended.

About Author

Mathias Lanner

Principal Advisor in the Nordic Technology Practice

Mathias Lanner has over 20 years’ practical experience in creating valuable customer insights from data through different kinds of advanced analytics, including supervised and unsupervised modelling. Today Mathias works in presales in the Nordic Technology Practice, where he supports the Nordic industry sales teams with his experience in analytics.

