
Association between Gut Microbiome and Parkinson's Disease Revealed by Sparse Learning

dc.contributor.advisor: Li, Longhai
dc.contributor.committeeMember: Rayan, Steven
dc.contributor.committeeMember: Balbuena, Lloyd
dc.contributor.committeeMember: Roy, Lee
dc.creator: Chen, Man
dc.creator.orcid: 0000-0002-0598-6249
dc.date.accessioned: 2021-05-31T17:53:15Z
dc.date.available: 2021-05-31T17:53:15Z
dc.date.created: 2021-05
dc.date.issued: 2021-05-25
dc.date.submitted: May 2021
dc.date.updated: 2021-05-31T17:53:16Z
dc.description.abstract: \textbf{Background:} Many studies indicate that the human gut microbiota is likely connected with Parkinson's disease (PD). Motivated by these findings, this thesis explores the association between PD and the human gut microbiota from a statistical machine learning perspective. To identify this association, we assess the predictive power of microbial operational taxonomic units (OTUs) extracted from participants' guts. \textbf{Methods:} We use a linear support vector machine (SVM) and logistic regression, each combined with an $L_1$ penalty or an elastic-net penalty, to identify OTUs that are informative for PD. The $L_1$ penalty shrinks coefficients and sets those of uninformative variables exactly to zero, which effectively performs feature selection. Conversely, coefficients with larger absolute values indicate OTUs that are more closely related to PD. The elastic-net penalty is additionally capable of grouping correlated variables. Under these two penalties, SVM and logistic regression can achieve good predictive results while also performing feature selection. To make full use of the dataset and to avoid overfitting, we run the models with leave-one-out cross-validation (LOOCV). Each regularization involves a tuning parameter $\lambda$. After running the models with LOOCV, we choose the optimal $\lambda$ for each model, using the test error rate as the criterion. \textbf{Results and Conclusions:} We analyze the performance of each optimal model by calculating and interpreting its evaluation metrics. For our dataset, logistic regression with the $L_1$ penalty performs best: its $R_{ER}^2$, $R_{AMLP}^2$, AUC and AUPR are 43.9\%, 25.7\%, 0.8259 and 0.8788, respectively. We then examine the OTUs selected on the basis of the model coefficients and rank them by their level of relevance to PD.
We find that some OTUs selected by logistic regression with the $L_1$ penalty have been identified in previous studies of micro-organisms, including Lactobacillus, Roseburia, Blautia, Akkermansia and Bifidobacterium. We also explore the predictive performance of logistic regression with the elastic-net penalty and of the regularized SVM, and examine the OTUs selected by these models. These OTUs also overlap with those identified by previous researchers.
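The workflow summarized in the abstract (an $L_1$-penalized logistic regression, with $\lambda$ chosen by LOOCV error, followed by ranking the surviving coefficients) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis code: the solver (proximal gradient descent with soft-thresholding) is one possible choice that the record does not specify, the data are synthetic rather than OTU counts, and the names `soft_threshold`, `fit_l1_logistic`, `loocv_error`, and the $\lambda$ grid are hypothetical. The SVM and elastic-net variants are not shown.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks v toward zero, exactly zero if |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam, step=0.5, iters=200):
    # Proximal gradient descent on mean log-loss + lam * ||w||_1.
    # The intercept b is not penalized. step=0.5 assumes features in [-1, 1].
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r / n
            for j in range(p):
                grad_w[j] += r * xi[j] / n
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def predict(w, b, x):
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


def loocv_error(X, y, lam):
    # Hold each sample out once, fit on the rest, test on the held-out sample.
    errors = 0
    for i in range(len(X)):
        w, b = fit_l1_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        errors += predict(w, b, X[i]) != y[i]
    return errors / len(X)


# Synthetic toy data (an assumption, not the thesis dataset):
# feature 0 carries the class signal, features 1-4 are pure noise.
random.seed(0)
n, p = 24, 5
y = [i % 2 for i in range(n)]
X = []
for yi in y:
    sign = 1.0 if yi == 1 else -1.0
    X.append([sign * random.uniform(0.3, 1.0)]
             + [random.uniform(-1.0, 1.0) for _ in range(p - 1)])

lambdas = [0.01, 0.1, 2.0]  # hypothetical tuning grid
cv_errors = {lam: loocv_error(X, y, lam) for lam in lambdas}
best_lam = min(lambdas, key=lambda lam: cv_errors[lam])

# Refit on all samples with the chosen lambda, then rank the selected features.
w_best, b_best = fit_l1_logistic(X, y, best_lam)
selected = [j for j, wj in enumerate(w_best) if wj != 0.0]
ranked = sorted(selected, key=lambda j: -abs(w_best[j]))
print("LOOCV error by lambda:", cv_errors)
print("chosen lambda:", best_lam, "features ranked by |coefficient|:", ranked)
```

In this sketch the largest grid value shrinks every coefficient to exactly zero, leaving only the intercept, which illustrates why $\lambda$ has to be tuned rather than fixed in advance. For real OTU data a tested library implementation would normally be preferable to a hand-written solver.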
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10388/13403
dc.subject: Statistical Machine Learning, Sparse Learning, Regularization, Microbiome, Parkinson.
dc.title: Association between Gut Microbiome and Parkinson's Disease Revealed by Sparse Learning
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Mathematics and Statistics
thesis.degree.discipline: Mathematics
thesis.degree.grantor: University of Saskatchewan
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.Sc.)

Files

Original bundle
Name: CHEN-THESIS-2021.pdf
Size: 714.43 KB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 2.26 KB
Format: Plain Text
Description: