
Evaluation of machine learning algorithms as predictive tools in road safety analysis











The road safety management process (RSMP) of the Highway Safety Manual (HSM) represents the state-of-the-practice procedure that transportation professionals employ to monitor and improve safety on existing roadway sites. The RSMP requires the development of safety performance functions (SPFs), the key regression tools used to predict crash frequency given a set of roadway and traffic factors. Although SPFs developed using traditional regression modeling have proven to be reliable tools for road safety predictive analytics, some limitations and constraints have been highlighted in the literature, such as the assumption of a probability distribution, the selection of a pre-defined functional form, possible correlation between independent variables, and possible transferability issues. An alternative to traditional regression models as predictive tools is the use of Machine Learning (ML) algorithms. Although ML provides a new modeling technique, it still has built-in assumptions, and its performance in collision frequency modeling needs to be studied. This research 1) compares the prediction performance of three well-known ML algorithms, i.e., Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), to that of traditional SPFs, 2) conducts a sensitivity analysis and compares ML with the functional form of the negative binomial (NB) model as the default traditional regression modeling technique, and 3) applies and validates ML algorithms in network screening (hotspot identification), which is the first step in the RSMP. To achieve these objectives, a dataset of urban signalized and unsignalized intersections from two major municipalities in Saskatchewan (Canada) was considered as a case study. The results showed that the ML prediction accuracies are comparable with those of the NB model.
Moreover, the sensitivity analysis showed that the ML algorithms' predictions are mostly affected by changes in traffic volume, rather than by other roadway factors. Lastly, the consistency of the ML-based measure in identifying hotspots appeared to be comparable to that of SPF-based measures, e.g., the excess (predicted and expected) average crash frequency. Overall, the results of this research support the use of ML as a predictive tool in network screening, which provides transportation practitioners with an alternative modeling approach to identify collision-prone locations where countermeasures aimed at reducing collision frequency at urban intersections can be installed.
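
The SPF-based screening measure mentioned in the abstract, the excess expected average crash frequency, can be illustrated with a minimal sketch. This is not the implementation used in the thesis. It assumes a log-linear intersection SPF in major- and minor-road traffic volume (AADT) and the standard empirical Bayes weighting between predicted and observed counts over a single study period. All function names, coefficient values, the overdispersion parameter, and the site data are hypothetical.

```python
import math


def spf_predicted(aadt_major, aadt_minor, b0, b1, b2):
    """Predicted crash frequency from an assumed log-linear SPF:
    N = exp(b0) * AADT_major**b1 * AADT_minor**b2."""
    return math.exp(b0) * aadt_major ** b1 * aadt_minor ** b2


def eb_expected(predicted, observed, k):
    """Empirical Bayes expected crash frequency.

    k is the NB overdispersion parameter; the weight w moves the
    estimate toward the SPF prediction when k * predicted is small.
    """
    w = 1.0 / (1.0 + k * predicted)
    return w * predicted + (1.0 - w) * observed


def rank_by_excess(sites, coeffs, k):
    """Return (site_id, excess) pairs, largest excess first.

    Excess = EB expected frequency minus SPF predicted frequency.
    """
    b0, b1, b2 = coeffs
    ranked = []
    for site in sites:
        predicted = spf_predicted(site["aadt_major"], site["aadt_minor"], b0, b1, b2)
        expected = eb_expected(predicted, site["observed"], k)
        ranked.append((site["id"], expected - predicted))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical intersections and SPF parameters, for illustration only.
example_sites = [
    {"id": "INT-1", "aadt_major": 18000, "aadt_minor": 4000, "observed": 9},
    {"id": "INT-2", "aadt_major": 25000, "aadt_minor": 6000, "observed": 5},
    {"id": "INT-3", "aadt_major": 12000, "aadt_minor": 2500, "observed": 2},
]
example_coeffs = (-8.0, 0.6, 0.4)  # hypothetical b0, b1, b2
example_k = 0.4                    # hypothetical overdispersion parameter

for site_id, excess in rank_by_excess(example_sites, example_coeffs, example_k):
    print(f"{site_id}: excess = {excess:.2f}")
```

Sites with a large positive excess experience more crashes than similar sites are predicted to, which is why this measure is used to prioritize locations in network screening. An ML-based measure would replace `spf_predicted` with the output of a fitted model.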



crash frequency prediction, machine learning, support vector machine, decision tree, random forest, network screening



Master of Science (M.Sc.)


Civil and Geological Engineering


Civil Engineering

