An Efficient Remand Risk Assessment Tool based on Machine Learning Techniques
Date
2019-10-28
Type
Thesis
Degree Level
Masters
Abstract
The criminal justice system in Saskatchewan is challenged by the large population of people who are
charged with committing crimes and are waiting to be summoned, the so-called pretrial population. Although
some of these people are released until their trial, others are remanded in custody. The two most common
reasons people are remanded are: (i) probable failure to appear for their trial and (ii) risk to public safety.
A large pretrial population leads to increased expenses for both the government and the defendants. The
pretrial population may be reduced using a remand risk assessment tool (RRAT). The goal of the RRAT
is to lower the number of unnecessary remands by identifying which defendants are likely to fail to appear
or to pose a risk to public safety while on release. This study uses the Saskatchewan Primary Risk
Assessment (SPRA), an assessment that measures general recidivism in both male and female adult offenders
under the jurisdiction of the Ministry of Corrections and Policing. The SPRA data, comprising information
on 15,117 offenders in the form of 15 questions, serve as the input to the RRAT.
In this thesis, the use of machine learning models is proposed for the RRAT to predict which defendants
should be remanded, potentially achieving a reduction in pretrial population size. In the first step, to
choose the best machine learning model, several classification models, including the support vector classifier,
decision tree classifier, random forest classifier (RFC), naive Bayesian classifier, and extreme learning classifier
(ELC), are compared in terms of classification performance. According to the simulation results, when all
available features are used, the ELC outperformed all other models in the comparison, followed by the RFC. The
ELC and the RFC achieved the lowest false positive rate and the highest accuracy, precision,
specificity, and area under the curve among the explored models. In the second step, to identify
the best features from the SPRA, the ELC is used in conjunction with binary particle swarm optimization
(BPSO) and the result is compared to the RFC. The ELC-BPSO improved the accuracy of the ELC model
while using only seven features of the SPRA data, achieving an accuracy of around 74%.
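
The second step described above, wrapping a classifier inside a binary particle swarm search over feature subsets, can be illustrated with a minimal sketch. This is not the thesis's implementation: the hidden-layer size, tanh activation, swarm size, inertia and acceleration constants, and the use of validation accuracy as the fitness function are all assumptions chosen for illustration, and the function names (train_elm, predict_elm, bpso_select) are hypothetical. The extreme learning classifier is sketched in its basic form, with random untrained hidden weights and output weights solved by a pseudoinverse.

```python
import numpy as np

def train_elm(X, y, n_hidden=50, seed=0):
    """Basic extreme learning machine: random hidden layer, least-squares output."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                 # output weights via pseudoinverse
    return W, b, beta

def predict_elm(model, X):
    """Threshold the ELM output at 0.5 to obtain binary labels."""
    W, b, beta = model
    return (np.tanh(X @ W + b) @ beta >= 0.5).astype(int)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def bpso_select(X_tr, y_tr, X_va, y_va, n_particles=10, n_iter=20, seed=0):
    """Binary PSO over feature masks; fitness is ELM validation accuracy (assumed)."""
    rng = np.random.default_rng(seed)
    n_feat = X_tr.shape[1]

    def fitness(mask):
        cols = mask.astype(bool)
        if not cols.any():                       # an empty feature set is worthless
            return 0.0
        model = train_elm(X_tr[:, cols], y_tr)
        return accuracy(y_va, predict_elm(model, X_va[:, cols]))

    pos = rng.integers(0, 2, size=(n_particles, n_feat))   # 1 = feature kept
    vel = rng.uniform(-1.0, 1.0, size=(n_particles, n_feat))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iter):
        r1 = rng.random(vel.shape)
        r2 = rng.random(vel.shape)
        # Standard velocity update; 0.7 and 1.5 are illustrative constants.
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        vel = np.clip(vel, -4.0, 4.0)
        # Binary PSO step: a sigmoid of the velocity gives the probability of a 1 bit.
        prob = 1.0 / (1.0 + np.exp(-vel))
        pos = (rng.random(vel.shape) < prob).astype(int)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    return gbest.astype(bool), float(gbest_fit)
```

Called on a training and validation split of a 15-column feature matrix, bpso_select returns a boolean mask of the retained features together with the validation accuracy reached by that subset. A real evaluation would report the metrics named in the abstract (false positive rate, precision, specificity, area under the curve) on held-out data rather than the accuracy used to guide the search.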
Keywords
Machine Learning, Random Forests, Support Vector Machines, Naive Bayes, Extreme Learning Machines, Decision Trees
Degree
Master of Science (M.Sc.)
Department
Computer Science
Program
Computer Science