Analytics offers huge potential to transform raw data into operational intelligence. It provides new insights into old or new problems. This is crucial in business – but it is even more important for tackling big societal issues, such as domestic violence or human trafficking.
I recently had the opportunity to do some work with the VioGen project, a domestic violence initiative run by the Spanish Ministry of the Interior. Globally, it is estimated that one in three women will experience some form of gender-based violence in their lifetimes. Worse, a study found that one in five boys believes that gender violence doesn’t exist. This is a huge issue, and there is no silver bullet.
The Spanish situation
Last year, in Spain alone, there were more than 29,000 "open" cases of domestic violence. Sadly, 45 of those cases ended in a woman being killed. However, the police and other agencies in Spain have been taking action to reduce those figures over the last 14 years through the VioGen project. The project aims to prevent repeat cases, because it is almost impossible to predict the first offence.
This project works with law enforcement and judicial units to manage, assign, protect and monitor victims of gender-based violence. Until recently, it has operated on a combination of human behavioural science, policing experience and IT to try to predict the likelihood of repeat offences and protect women. The unit assesses the risk based on all the information available, changing the level when necessary in the light of new information, for example when a perpetrator is released from prison.
We believed that machine learning might be able to help improve the risk assessment process. So we worked with the gender violence team to answer the question: How can machine learning help to combat domestic violence?
Assessing risk, providing support
The challenge was to improve the existing system to ensure every woman is assigned the correct risk level and receives the correct support and protection. We used data from the last few months of 2016 so that we could assess the outcomes against the information available to the police and other agencies. The data included information from the victims, the complaints made and the different risk assessments, both at the time of registration of the case and during its evolution.
The primary objective was to see how analytics could help improve police risk assessments. In particular, we wanted to know if it was possible to predict more precisely when and in which cases there would be a repeat offence. The secondary objective was to see if an analytics platform could distinguish cases where some sort of reoffending is likely, and also predict when it will occur and to what extent.
Over time, the VioGen team has compiled a series of indicators of the psychological profile of perpetrators and the vulnerability of victims. These include information such as suicide attempts, addictions and family history of the perpetrator, giving a total of more than 50 indicators. Together, they provide a huge pool of data to use for modelling purposes.
A two-stage strategy
After initial data processing, we opted for a two-stage modelling strategy to manage the different levels of information received. Every case has several complaints, and each complaint can include several facts. In the first stage, we used the complaints and information from police files about the victim to try to classify the probability of recidivism.
This allowed us to exploit the experience and professionalism of the police officers in their reports. It gave a predictive model that assigned a probability of recidivism using just these reports. During the second stage, we developed an analytical model that used the probability generated in the first stage and added the indicators about the perpetrator to assign a final probability of reoffending.
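The article does not say which algorithms or tools were used, so the following is only a rough sketch of the two-stage idea, not the project's actual model. Everything in it is an assumption made for illustration: the synthetic data, the hypothetical feature names (`complaint`, `perpetrator`), and the choice of a simple logistic regression fitted by gradient descent for both stages.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, x):
    """Probability of the positive class; weights[0] is the bias term."""
    return sigmoid(weights[0] + sum(w * xj for w, xj in zip(weights[1:], x)))


def fit_logistic(rows, labels, epochs=500, lr=0.5):
    """Fit logistic regression by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Synthetic cases (assumption): two complaint/victim features, two
# perpetrator indicators, and a 0/1 label for "repeat offence occurred".
rng = random.Random(0)
cases = []
for _ in range(200):
    complaint = [rng.random(), rng.random()]
    perpetrator = [rng.random(), rng.random()]
    signal = 2.0 * complaint[0] + 2.0 * perpetrator[0] - 2.0
    label = 1 if rng.random() < sigmoid(3.0 * signal) else 0
    cases.append((complaint, perpetrator, label))

labels = [y for _, _, y in cases]

# Stage 1: probability of recidivism from complaint/victim information only.
complaint_rows = [c for c, _, _ in cases]
stage1_weights = fit_logistic(complaint_rows, labels)
stage1_probs = [predict_proba(stage1_weights, c) for c in complaint_rows]

# Stage 2: the stage-1 probability plus the perpetrator indicators.
stage2_rows = [[p] + perp for p, (_, perp, _) in zip(stage1_probs, cases)]
stage2_weights = fit_logistic(stage2_rows, labels)
final_probs = [predict_proba(stage2_weights, r) for r in stage2_rows]

print("stage-2 weights (bias, stage-1 probability, indicators):", stage2_weights)
```

In this sketch, the fitted stage-2 weights indicate how strongly the final probability leans on the stage-1 output versus each perpetrator indicator. A real pipeline would normally feed stage 2 with stage-1 probabilities predicted on held-out data, to avoid leaking the training labels from one stage into the next.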
We found that, alongside the probability from the first-stage model, the second-stage model gave the most weight to a small set of variables. These included the criminal record of the aggressor, the number and type of threats, and the degree of disrespect for authority shown.
Overall, we found that a machine learning model provided actionable intelligence, which we hope will help to reduce the risk of reoffending. There does not seem to be any way to prevent the first incident of gender-based violence, short of huge societal changes. However, the effective use of predictive analytics and machine learning can almost certainly help to prevent subsequent cases.
If you are a data scientist, you can contribute by joining the workshop on Nov. 17.