Penalized Regression Methods for Modelling Rare Events Data with Application to Occupational Injury Study
Occupational injuries are a serious public health concern for workers around the world. Among all occupational injuries reported to the Workers' Compensation Board of Saskatchewan (WCB-SK) from 2007 to 2016, 177 (0.06%) of 280,704 injury claims were fatal. Although fatal work-related injuries are relatively rare, they have a tremendous impact on workers and their families, as well as on a company's overall productivity, hiring and training costs, and insurance premiums. To help inform the prevention of fatal claims, this study identified factors that increase the probability of a fatal injury claim in Saskatchewan. WCB-SK administrative occupational injury claims data from 2007 to 2016 were used to extract fatal and non-fatal occupational events. Potential covariates included worker characteristics (age, gender, occupation) and incident characteristics (source of injury, cause of injury, part of body). Because fatalities were rare in this study, conventional logistic regression with multiple categorical covariates and over 40 parameters yielded biased parameter estimates. Penalized logistic regression methods, namely a bias-correction method (Firth's method) and model selection methods (lasso and elastic net), were compared to identify an optimal modelling strategy for calculating odds ratios (OR) and 95% confidence intervals (CI) for a WCB claim being fatal (vs. non-fatal). Based on the best-fitting model, i.e., Firth's logistic regression on the variables selected by the elastic net method, the odds of a claim being fatal were 5.5 (95% CI: 2.77, 12.46) times higher among men than among women and 6.59 (95% CI: 3.59, 12.20) times higher for seniors aged 65-85 than for those aged 14-24. The odds of a claim being fatal among those working in primary industry were 2.85 (95% CI: 1.07, 9.39) times higher than among those working in social sciences.
The odds of an injury being fatal were 51 (95% CI: 10.38, 505.38) times higher when machinery was the source than when chemical products were the source. Male workers are at higher risk of a claim being fatal (vs. non-fatal). With respect to age, the analysis showed that middle-aged workers are at lower risk than young workers. The risk of a claim being fatal increased sharply as age increased from 45 to 85. The primary industry sector and machinery account for a disproportionate share of fatal claims. This knowledge can improve workplace safety by supporting learning from past incidents, identification of significant risk factors, and implementation of targeted prevention strategies. Through the development of effective interventions, we hope to prevent fatal injuries in Saskatchewan.
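To illustrate the bias-correction idea named in the abstract, the following is a minimal sketch of Firth's penalized logistic regression, fitted by Fisher scoring on the modified score. It is not the study's code: the function name `firth_logistic`, the toy counts (0 fatal among 200 claims in one group, 6 fatal among 300 in the other), and the Wald-type interval are all assumptions made for illustration, and the study may well have used a different interval method and software.

```python
import numpy as np

def firth_logistic(X, y, max_iter=100, tol=1e-10):
    """Hypothetical helper: logistic regression with Firth's penalty.

    Solves the modified score equation X'(y - p + h * (0.5 - p)) = 0,
    where h holds the leverages (diagonal of the weighted hat matrix).
    Returns the coefficients and the inverse Fisher information.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        w = p * (1.0 - p)                        # logistic variance weights
        info_inv = np.linalg.inv(X.T @ (w[:, None] * X))
        # Leverages: h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        score = X.T @ (y - p + h * (0.5 - p))    # Firth-modified score
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, info_inv

# Made-up toy data (NOT the WCB-SK claims): one binary covariate.
# Group 0 has no fatal claims, so ordinary maximum likelihood would
# push the odds ratio to infinity; the Firth estimate stays finite.
group = np.concatenate([np.zeros(200), np.ones(300)])
fatal = np.concatenate([np.zeros(200), np.ones(6), np.zeros(294)])
X = np.column_stack([np.ones_like(group), group])  # intercept + group

beta, cov = firth_logistic(X, fatal)
odds_ratio = np.exp(beta[1])

# Wald-type 95% interval on the log-odds scale (an assumption here;
# profile penalized-likelihood intervals are a common alternative).
se = np.sqrt(np.diag(cov))
ci_low, ci_high = np.exp(beta[1] - 1.96 * se[1]), np.exp(beta[1] + 1.96 * se[1])

# For this saturated two-group model, the Firth estimate coincides with
# adding 0.5 to every cell of the 2x2 table.
half_cell_or = (6.5 / 294.5) / (0.5 / 200.5)
print(odds_ratio, half_cell_or, (ci_low, ci_high))
```

With many categorical covariates the closed-form half-cell check no longer applies, but the same iteration can be run on a dummy-coded design matrix restricted to the variables retained by a selection step such as the elastic net.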
Penalized regression methods, Occupational injuries
Master of Science (M.Sc.)
School of Public Health