Fully Bayesian T-probit Regression with Heavy-tailed Priors for Selection in High-Dimensional Features with Grouping Structure

dc.contributor.advisor: Li, Longhai [en_US]
dc.contributor.committeeMember: Bickis, Mik [en_US]
dc.contributor.committeeMember: Liu, Juxin [en_US]
dc.contributor.committeeMember: Kusalik, Anthony [en_US]
dc.contributor.committeeMember: Stephens, David [en_US]
dc.creator: Jiang, Lai [en_US]
dc.date.accessioned: 2015-10-09T12:00:14Z
dc.date.available: 2015-10-09T12:00:14Z
dc.date.created: 2015-09 [en_US]
dc.date.issued: 2015-10-08 [en_US]
dc.date.submitted: September 2015 [en_US]
dc.description.abstract: Feature selection is required in many modern scientific research problems that use high-dimensional data. A typical example is finding the genes most related to a certain disease (e.g., cancer) from high-dimensional gene expression profiles. There are tremendous difficulties in eliminating the large number of useless or redundant features. The expression levels of genes have structure; for example, a group of co-regulated genes with similar biological functions tend to have similar mRNA expression levels. Many statistical methods have been proposed to take this grouping structure into account in feature selection and regression, including Group LASSO (Least Absolute Shrinkage and Selection Operator), Supervised Group LASSO, and regression on group representatives. In this thesis, we propose to use a sophisticated Markov chain Monte Carlo method (Hamiltonian Monte Carlo with restricted Gibbs sampling) to fit T-probit regression with heavy-tailed priors, in order to perform selection among features with grouping structure. We refer to this method as fully Bayesian T-probit. Its main advantage is that it selects features within groups automatically, without pre-specification of the grouping structure, and discards noise features more efficiently than LASSO. As a result, the feature subsets selected by fully Bayesian T-probit are significantly sparser than the subsets selected by many other methods in the literature. Such succinct feature subsets are much easier to interpret and understand in light of existing biological knowledge and further experimental investigation. In this thesis, we use simulated and real datasets to demonstrate that the predictive performance of the sparser feature subsets selected by fully Bayesian T-probit is comparable with that of the much larger feature subsets selected by plain LASSO, Group LASSO, Supervised Group LASSO, random forest, penalized logistic regression and the t-test. In addition, we demonstrate that the succinct feature subsets selected by fully Bayesian T-probit have significantly better predictive power than feature subsets of the same size taken from the top features selected by the aforementioned methods. [en_US]
dc.identifier.uri: http://hdl.handle.net/10388/ETD-2015-09-2232 [en_US]
dc.language.iso: eng [en_US]
dc.subject: Bayesian methods [en_US]
dc.subject: probit [en_US]
dc.subject: MCMC [en_US]
dc.subject: gene expression data [en_US]
dc.subject: grouping structure [en_US]
dc.title: Fully Bayesian T-probit Regression with Heavy-tailed Priors for Selection in High-Dimensional Features with Grouping Structure [en_US]
dc.type.genre: Thesis [en_US]
dc.type.material: text [en_US]
thesis.degree.department: Mathematics and Statistics [en_US]
thesis.degree.discipline: Mathematics [en_US]
thesis.degree.grantor: University of Saskatchewan [en_US]
thesis.degree.level: Doctoral [en_US]
thesis.degree.name: Doctor of Philosophy (Ph.D.) [en_US]
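
The abstract above describes the model only in words, so the following minimal Python sketch illustrates the kind of model it refers to: a T-probit likelihood (a Student-t CDF link) with heavy-tailed Cauchy priors on the regression coefficients, fit by Markov chain Monte Carlo. This is not the thesis implementation: a plain random-walk Metropolis sampler stands in for the Hamiltonian Monte Carlo with restricted Gibbs sampling used in the thesis, and the simulated grouping structure, the degrees of freedom of the link, the prior scale, and the step size are all assumptions chosen for illustration.

# Minimal illustrative sketch (not the thesis code): T-probit regression
# with heavy-tailed Cauchy priors on the coefficients, fit by plain
# random-walk Metropolis. All numeric settings here are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Features with a crude grouping structure: each group of 5 features is a
# noisy copy of one shared latent signal (a stand-in for co-regulated genes).
n, n_groups, group_size = 100, 10, 5
p = n_groups * group_size
Z = rng.standard_normal((n, n_groups))
X = np.repeat(Z, group_size, axis=1) + 0.3 * rng.standard_normal((n, p))

# Only one feature in each of the first two groups is truly relevant.
beta_true = np.zeros(p)
beta_true[0], beta_true[group_size] = 2.0, -2.0

# T-probit link: P(y = 1 | x) = F_t(x' beta), with F_t a Student-t CDF.
df_link = 4.0
y = (rng.uniform(size=n) < stats.t.cdf(X @ beta_true, df=df_link)).astype(int)

def log_post(beta):
    """Log posterior: T-probit likelihood plus independent Cauchy(0, 1)
    (heavy-tailed) priors on the regression coefficients."""
    prob1 = np.clip(stats.t.cdf(X @ beta, df=df_link), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(prob1) + (1 - y) * np.log1p(-prob1))
    logprior = np.sum(stats.cauchy.logpdf(beta, scale=1.0))
    return loglik + logprior

# Random-walk Metropolis over the whole coefficient vector.
beta, lp = np.zeros(p), log_post(np.zeros(p))
step, draws = 0.05, []
for it in range(10000):
    prop = beta + step * rng.standard_normal(p)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        beta, lp = prop, lp_prop
    if it >= 5000:
        draws.append(beta.copy())

post_mean = np.mean(draws, axis=0)
# The heavy-tailed prior lets the few truly relevant coefficients stay large
# while shrinking the many noise coefficients toward zero, which is what
# makes the selected feature subsets sparse.
print("posterior means, first group: ", np.round(post_mean[:group_size], 2))
print("posterior means, second group:", np.round(post_mean[group_size:2 * group_size], 2))

For the high-dimensional, strongly correlated settings the thesis targets, a sampler adapted to the posterior geometry, such as the HMC-within-Gibbs scheme named in the abstract, is needed; this sketch only shows how the t link and the heavy-tailed prior enter the posterior.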

Files

Original bundle (1 of 1)
Name: JIANG-DISSERTATION.pdf
Size: 4 MB
Format: Adobe Portable Document Format

License bundle (1 of 1)
Name: license.txt
Size: 2.2 KB
Format: Plain Text
Description: