A Machine Learning Generalization of LSI-OR
The Level of Service Inventory-Ontario Revision (LSI-OR) is used as a risk/need assessment tool to classify, manage, and treat the offender population so that offenders receive supportive services consistent with their custodial needs. This thesis adopts a machine learning approach, employing the Naive Bayes technique, as an alternative to the LSI-OR. The study was conducted on a group of 72,725 offenders of different races, comprising males (82.62%) and females (17.38%). Participants were monitored for two years to collect recidivism information.

A basic analysis of the dataset revealed that 1) 83.18% of the population answered the 43 LSI-OR items with a unique response pattern; 2) the total LSI-OR scores of the entire population, and of the male and female subpopulations, followed two beta distributions, one for each recidivism class; and 3) the recidivism rate could be approximated by a normal distribution.

It was shown that the Naive Bayes classifier can be considered an extended LSI-OR classifier that accepts multiple continuous and discrete features as input. In other words, the Naive Bayes classifier provides a simple framework for studying the effect of distinct features on classification efficiency and accuracy. Running the Naive Bayes classifier with various input features showed that it performed better than the LSI-OR; however, the accuracies of the two models showed no obvious trend indicating the superiority of one over the other. The only feature whose value could be treated as a continuous variable was the LSI-OR score. Many models were built on continuous and on discrete LSI-OR scores, producing either the same or slightly better performance and mean accuracy. The dataset also contained many features that the LSI-OR assessment never uses, for instance, offence severity.
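To make the "extended LSI-OR classifier" view concrete, here is a minimal sketch, assuming the LSI-OR total score is the single continuous feature and item responses are discrete. The per-class beta density for the score mirrors the beta-shaped score distributions reported in the analysis and is fitted here by the method of moments; the thesis may use a different density or estimator. The class name `MixedNaiveBayes`, the rescaling bound `score_max=43` (which assumes one point per item), the add-one smoothing, and the toy data are all hypothetical. With the score as its only input, the classifier decides from the score alone, much as the LSI-OR does; each additional feature simply contributes one more independent log-likelihood term.

```python
import math
from collections import defaultdict


def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def fit_beta_moments(xs):
    """Method-of-moments estimates (a, b); assumes 0 < variance < mean * (1 - mean)."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


class MixedNaiveBayes:
    """Hypothetical Naive Bayes: one continuous feature (the total score, modelled
    by a per-class beta density) plus discrete features (add-one smoothed counts)."""

    def __init__(self, score_max, eps=1e-3):
        self.score_max = score_max  # hypothetical upper bound used to rescale the score
        self.eps = eps              # keeps the rescaled score strictly inside (0, 1)

    def _rescale(self, score):
        x = score / self.score_max
        return min(max(x, self.eps), 1 - self.eps)

    def fit(self, scores, items, labels):
        n = len(labels)
        self.classes = sorted(set(labels))
        # number of distinct values observed for each discrete feature (for smoothing)
        self.n_values = [len({row[j] for row in items}) for j in range(len(items[0]))]
        self.log_prior, self.beta, self.counts, self.totals = {}, {}, {}, {}
        for c in self.classes:
            idx = [i for i in range(n) if labels[i] == c]
            self.log_prior[c] = math.log(len(idx) / n)
            self.beta[c] = fit_beta_moments([self._rescale(scores[i]) for i in idx])
            self.counts[c] = [defaultdict(int) for _ in self.n_values]
            for i in idx:
                for j, v in enumerate(items[i]):
                    self.counts[c][j][v] += 1
            self.totals[c] = len(idx)
        return self

    def log_posterior(self, score, row):
        """Unnormalized log P(class | score, row) under the naive independence assumption."""
        out = {}
        for c in self.classes:
            lp = self.log_prior[c] + beta_logpdf(self._rescale(score), *self.beta[c])
            for j, v in enumerate(row):
                lp += math.log((self.counts[c][j][v] + 1) / (self.totals[c] + self.n_values[j]))
            out[c] = lp
        return out

    def predict(self, score, row):
        post = self.log_posterior(score, row)
        return max(post, key=post.get)


# Toy usage: synthetic scores, two binary items, labels 0 = no recidivism, 1 = recidivism.
scores = [5, 8, 10, 12, 25, 28, 30, 33]
items = [(0, 0), (0, 1), (0, 0), (1, 0), (1, 1), (1, 1), (0, 1), (1, 1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = MixedNaiveBayes(score_max=43).fit(scores, items, labels)
print(model.predict(9, (0, 0)), model.predict(30, (1, 1)))  # → 0 1
```

Swapping `beta_logpdf` for a Gaussian density, or dropping the score and keeping only the items, changes one term of the sum without touching the rest of the classifier, which is what makes Naive Bayes convenient for comparing feature sets.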
For each offence severity index, a model was built with either the LSI-OR score or the 43 LSI-OR items as input features. The results indicate that using the 43 LSI-OR items yields more stable accuracy than using the LSI-OR scores.
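The per-severity comparison can be sketched as follows. This is a hypothetical illustration, not the thesis's code: it uses a purely categorical Naive Bayes (with the LSI-OR score treated as one discrete feature), holds out every third record in each severity group, and measures stability as the population standard deviation of accuracy across groups. The record layout, the helper names (`CategoricalNB`, `accuracy_by_severity`, `score_feature`, `item_features`), the holdout rule, and the synthetic data with 5 items instead of 43 are all assumptions.

```python
import math
import random
import statistics
from collections import defaultdict


class CategoricalNB:
    """Naive Bayes over discrete features with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.n = len(y)
        self.class_count = {c: y.count(c) for c in self.classes}
        self.counts = defaultdict(int)   # (class, feature index, value) -> count
        self.values = defaultdict(set)   # feature index -> observed values
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[(c, j, v)] += 1
                self.values[j].add(v)
        return self

    def _log_joint(self, c, row):
        lp = math.log(self.class_count[c] / self.n)
        for j, v in enumerate(row):
            k = len(self.values[j])
            lp += math.log((self.counts[(c, j, v)] + 1) / (self.class_count[c] + k))
        return lp

    def predict(self, row):
        return max(self.classes, key=lambda c: self._log_joint(c, row))


def accuracy_by_severity(records, features):
    """Fit one model per severity index and return {severity: holdout accuracy}.
    records: (severity, score, items, label) tuples, at least 2 per severity;
    features: function mapping a record to a tuple of discrete features."""
    groups = defaultdict(list)
    for r in records:
        groups[r[0]].append(r)
    acc = {}
    for sev, rs in sorted(groups.items()):
        test = rs[::3]                                   # every third record held out
        train = [r for i, r in enumerate(rs) if i % 3]
        model = CategoricalNB().fit([features(r) for r in train], [r[3] for r in train])
        acc[sev] = sum(model.predict(features(r)) == r[3] for r in test) / len(test)
    return acc


def score_feature(r):
    return (r[1],)          # total score treated as a single discrete feature


def item_features(r):
    return tuple(r[2])      # each item response is its own discrete feature


# Synthetic stand-in data: 3 severity indices, 5 binary items (not 43), noisy labels.
rng = random.Random(0)
records = []
for sev in (1, 2, 3):
    for _ in range(30):
        items = [rng.randint(0, 1) for _ in range(5)]
        label = int(sum(items) + rng.choice([-1, 0, 1]) >= 3)
        records.append((sev, sum(items), items, label))

results = {}
for name, feats in [("score", score_feature), ("items", item_features)]:
    results[name] = accuracy_by_severity(records, feats)
    spread = statistics.pstdev(list(results[name].values()))
    print(name, results[name], f"std across severities: {spread:.3f}")
```

On real data, a smaller spread for `item_features` than for `score_feature` would correspond to the stability difference reported above; the synthetic data here implies no particular outcome.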
LSI-OR, Naive Bayes, machine learning, level of service inventory, classifier, algorithm
Master of Science (M.Sc.)