Machine Learning in Population Health: Frequent Emergency Department Utilization Pattern Identification and Prediction
Emergency Department (ED) overcrowding is an emerging risk to patient safety and may significantly affect chronically ill people. For instance, overcrowding in an ED may cause delays in patient transport, or revenue loss for hospitals due to hospital diversion. Frequent users with avoidable visits contribute substantially to these challenges in ED settings. Non-urgent or "avoidable" ED use contributes to overcrowding and increases costs through unnecessary tests and treatment. It is therefore valuable to understand the patterns of ED visits within a population and to prospectively identify frequent ED users, so that care management can be stratified and resources allocated accordingly. Although most current models use classical methods such as descriptive analysis or regression modelling, more sophisticated techniques may be needed to improve predictive accuracy when big data are used. This study applies Machine Learning (ML) techniques to identify ED usage patterns among frequent users and to evaluate the predictive ability of the models. I performed an extensive literature review to generate a list of potential predictors of frequent ED use. For this thesis, I used Korean Health Panel data from 2008 to 2015. Individuals with at least one ED visit were included; among them, those with four or more visits per year were considered frequent ED users. Demographic and clinical data were collected. The relationship between predictors and frequent ED use was examined through multivariable analysis. A K-modes clustering algorithm was applied to identify ED utilization patterns among frequent users. Finally, the performance of four machine learning classification algorithms was assessed and compared with that of logistic regression. The classification algorithms used in my thesis were Random Forest, Support Vector Machine (SVM), Bagging, and Voting.
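The frequent-user definition and the K-modes step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's code: the `(person_id, year)` visit records, the categorical attribute tuples, the random-row initialization, and all function names are hypothetical. K-modes works like k-means but replaces Euclidean distance with a count of mismatched categories and replaces cluster means with per-attribute modes, which suits categorical variables such as season or disease category.

```python
from collections import Counter
import random

# Frequent-user cut-off taken from the abstract: four or more ED visits per year.
FREQUENT_THRESHOLD = 4


def label_frequent_users(visits):
    """Map each (person_id, year) key to True if that person had >= 4 ED visits that year.

    `visits` is a hypothetical list holding one (person_id, year) tuple per ED visit.
    """
    counts = Counter(visits)
    return {key: n >= FREQUENT_THRESHOLD for key, n in counts.items()}


def mismatches(a, b):
    """Simple-matching dissimilarity: number of attributes whose categories differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each attribute (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(rows, k, max_iter=100, seed=0):
    """Basic K-modes: assign rows to the nearest mode, recompute modes, repeat until stable."""
    rng = random.Random(seed)
    # Random rows as initial modes (an assumption; Huang and Cao initializations also exist).
    modes = rng.sample(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: mismatches(row, modes[j])) for row in rows]
        if new_labels == labels:
            break  # assignments unchanged, so recomputed modes would be unchanged too
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = column_modes(members)
    return labels, modes


# Toy usage with invented data (not Korean Health Panel records).
visits = [(1, 2010)] * 5 + [(2, 2010)] * 2 + [(3, 2011)] * 4
flags = label_frequent_users(visits)  # {(1, 2010): True, (2, 2010): False, (3, 2011): True}

rows = [  # hypothetical attributes: age group, sex, season, disease category, arrival mode
    ("65+", "M", "winter", "respiratory", "walk-in"),
    ("65+", "M", "spring", "respiratory", "walk-in"),
    ("65+", "F", "winter", "circulatory", "ambulance"),
    ("65+", "M", "autumn", "circulatory", "ambulance"),
    ("18-39", "F", "summer", "injury", "walk-in"),
    ("18-39", "F", "summer", "poisoning", "walk-in"),
]
labels, modes = k_modes(rows, k=3)
```

In practice, the third-party `kmodes` Python package provides a `KModes` class with Huang and Cao initializations; the abstract does not state which implementation or initialization the thesis used.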
The models' performance was evaluated based on Positive Predictive Value (PPV), sensitivity, Area Under the Curve (AUC), and classification error. A total of 9,348 individuals with 15,627 ED visits were eligible for this study. Frequent ED users accounted for 2.4% of all ED visits. Compared with non-frequent ED users, frequent ED users tended to be older and male, and were more likely to arrive by ambulance. In the cluster analysis, I identified three subgroups among frequent ED users: (i) older patients with respiratory system complaints and the highest discharge rate, who were more likely to visit in spring and winter; (ii) older patients with the highest hospitalization rate, who were also more likely to have arrived by ambulance and who visited the ED for circulatory system complaints; and (iii) younger, mostly female patients with the highest rate of ED visits in summer and the lowest rate of ambulance use, who visited the ED mostly for injuries, poisoning, and similar external causes. The ML classification algorithms predicted frequent ED users with high precision (90% to 98%) and sensitivity (87% to 91%), and achieved high AUC scores, ranging from 89% for SVM to 96% for Random Forest. Classification error varied among algorithms: logistic regression had the highest (34.9%) and Random Forest the lowest (3.8%). According to the Random Forest importance scores, the top five predictors of frequent use were disease category, age, day of the week, season, and sex. In this thesis, I showed how ML methods can be applied to the study of ED users in population health. The results show that ML classification algorithms are robust techniques with predictive power for identifying and predicting future ED visits. As more data are collected and data availability increases, machine learning approaches are a promising tool for advancing the understanding of such 'big' data.
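The four evaluation metrics named above have standard definitions, sketched here from scratch for a binary frequent-user label. This is an illustration of the definitions only, not the thesis's evaluation code; it assumes 1 marks a frequent user, uses a hypothetical 0.5 probability cut-off and invented toy data, and treats AUC as the area under the ROC curve computed through its rank (Mann–Whitney) interpretation, since the abstract does not name the curve.

```python
def confusion_counts(y_true, y_pred):
    """Counts (tp, fp, fn, tn), with 1 = frequent ED user and 0 = non-frequent user."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def ppv(y_true, y_pred):
    """Positive predictive value (precision): TP / (TP + FP)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)


def sensitivity(y_true, y_pred):
    """Sensitivity (recall): TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def classification_error(y_true, y_pred):
    """Share of misclassified individuals: (FP + FN) / N, i.e. 1 - accuracy."""
    _, fp, fn, _ = confusion_counts(y_true, y_pred)
    return (fp + fn) / len(y_true)


def auc(y_true, scores):
    """ROC AUC: probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 3 frequent users (1) and 5 non-frequent users (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]  # hypothetical predicted probabilities
y_pred = [int(s >= 0.5) for s in scores]  # the 0.5 cut-off is an assumption

print(round(ppv(y_true, y_pred), 3))                   # 0.5
print(round(sensitivity(y_true, y_pred), 3))           # 0.667
print(round(classification_error(y_true, y_pred), 3))  # 0.375
print(round(auc(y_true, scores), 3))                   # 0.867
```

Note that AUC uses the raw scores while the other three metrics depend on the chosen cut-off. In practice, `precision_score`, `recall_score`, `roc_auc_score`, and `accuracy_score` in scikit-learn's `sklearn.metrics` compute the same quantities, with classification error equal to `1 - accuracy_score`.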
machine learning, clustering, emergency department, frequent user, health services
Master of Science (M.Sc.)
Community Health and Epidemiology
Community and Population Health Science