
Association between Gut Microbiome and Parkinson's Disease Revealed by Sparse Learning












\textbf{Background:} Many studies indicate that the human gut microbiota is likely to be connected with Parkinson's disease (PD). Motivated by these findings, this thesis explores the association between PD and the human gut microbiota from a statistical machine learning perspective. To identify this association, we assess the predictive power of microbial operational taxonomic units (OTUs) extracted from participants' guts.

\textbf{Methods:} We use linear support vector machines (SVMs) and logistic regression, each combined with an $L_1$ penalty or an elastic-net penalty, to identify OTUs that are informative for PD. The $L_1$ penalty shrinks feature coefficients and performs feature selection by setting the coefficients of non-significant variables exactly to zero. Conversely, coefficients with larger absolute values indicate OTUs that are more closely related to PD. The elastic-net penalty is additionally capable of grouping correlated variables. Under these two penalties, SVM and logistic regression can achieve good predictive performance while also selecting features. To make full use of the dataset and to avoid overfitting, we fit the models with leave-one-out cross-validation (LOOCV). Each regularization has a tuning parameter $\lambda$; after running the models with LOOCV, we choose the optimal $\lambda$ for each model, using the test error rate as the criterion.

\textbf{Results and Conclusions:} We analyze the performance of each optimal model by computing and interpreting its evaluation metrics. For our dataset, logistic regression with the $L_1$ penalty performs best: its $R_{ER}^2$, $R_{AMLP}^2$, AUC and AUPR are 43.9\%, 25.7\%, 0.8259 and 0.8788, respectively. We then examine the OTUs selected on the basis of the model coefficients and rank them according to their level of relevance to PD. Some of the OTUs selected by logistic regression with the $L_1$ penalty have been identified in previous studies of microorganisms, including Lactobacillus, Roseburia, Blautia, Akkermansia and Bifidobacterium. We also explore the predictive performance of logistic regression with the elastic-net penalty and of regularized SVM, and examine the OTUs these models select; they also overlap with those identified by previous researchers.
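
The workflow described in the abstract (an $L_1$-penalized classifier, tuned by LOOCV error, then read off for selected features) can be illustrated with a minimal sketch. This is not the thesis's code: the abstract does not name its software, so the use of scikit-learn is an assumption, the data are synthetic stand-ins for an OTU abundance table, and the grid of values is hypothetical. Note that scikit-learn parameterizes the penalty by `C`, which acts as an inverse of the regularization strength $\lambda$.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in for an OTU table (assumption): 60 samples, 30 features,
# of which only the first three actually drive the binary label.
rng = np.random.default_rng(0)
n, p = 60, 30
X = rng.normal(size=(n, p))
true_coef = np.zeros(p)
true_coef[:3] = [2.0, -2.0, 1.5]
y = (X @ true_coef + rng.normal(scale=0.5, size=n) > 0).astype(int)


def loocv_error(C):
    """Misclassification rate of L1-penalized logistic regression under LOOCV."""
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return float(np.mean(pred != y))


# Hypothetical grid; smaller C means a stronger penalty (larger lambda).
grid = [0.01, 0.1, 1.0, 10.0]
errors = {C: loocv_error(C) for C in grid}
best_C = min(errors, key=errors.get)

# Refit on all samples with the chosen penalty strength and inspect sparsity.
final = LogisticRegression(penalty="l1", solver="liblinear", C=best_C).fit(X, y)
coef = final.coef_[0]
selected = np.flatnonzero(coef)  # features with nonzero coefficients
ranking = selected[np.argsort(-np.abs(coef[selected]))]  # largest |coef| first

print("LOOCV error by C:", errors)
print("chosen C:", best_C)
print("selected feature indices, ranked by |coefficient|:", ranking.tolist())
```

Features whose coefficients are driven to zero are dropped, and the remaining ones are ranked by absolute coefficient, mirroring how the abstract ranks OTUs by relevance to PD. An elastic-net variant could be sketched the same way by switching to a solver that supports it, under the same assumptions.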



Statistical Machine Learning, Sparse Learning, Regularization, Microbiome, Parkinson.



Master of Science (M.Sc.)


Mathematics and Statistics



