Fully Bayesian T-probit Regression with Heavy-tailed Priors for Selection in High-Dimensional Features with Grouping Structure

dc.contributor.advisor: Li, Longhai [en_US]
dc.contributor.committeeMember: Bickis, Mik [en_US]
dc.contributor.committeeMember: Liu, Juxin [en_US]
dc.contributor.committeeMember: Kusalik, Anthony [en_US]
dc.contributor.committeeMember: Stephens, David [en_US]
dc.creator: Jiang, Lai [en_US]
dc.date.accessioned: 2015-10-09T12:00:14Z
dc.date.available: 2015-10-09T12:00:14Z
dc.date.created: 2015-09 [en_US]
dc.date.issued: 2015-10-08 [en_US]
dc.date.submitted: September 2015 [en_US]
dc.description.abstract: Feature selection is required in many modern scientific research problems that use high-dimensional data. A typical example is finding the genes most related to a certain disease (e.g., cancer) from high-dimensional gene expression profiles. There are tremendous difficulties in eliminating the large number of useless or redundant features. The expression levels of genes have structure; for example, a group of co-regulated genes with similar biological functions tend to have similar mRNA expression levels. Many statistical methods have been proposed to take this grouping structure into account in feature selection and regression, including Group LASSO (Least Absolute Shrinkage and Selection Operator), Supervised Group LASSO, and regression on group representatives. In this thesis, we propose to use a sophisticated Markov chain Monte Carlo method (Hamiltonian Monte Carlo with restricted Gibbs sampling) to fit T-probit regression with heavy-tailed priors, in order to perform selection among features with grouping structure. We refer to this method as fully Bayesian T-probit. Its main advantage is that it selects features within groups automatically, without pre-specification of the grouping structure, and discards noise features more efficiently than LASSO. As a result, the feature subsets selected by fully Bayesian T-probit are significantly sparser than the subsets selected by many other methods in the literature. Such succinct feature subsets are much easier to interpret and understand in light of existing biological knowledge and further experimental investigation. In this thesis, we use simulated and real datasets to demonstrate that the predictive performance of the sparser feature subsets selected by fully Bayesian T-probit is comparable with that of the much larger feature subsets selected by plain LASSO, Group LASSO, Supervised Group LASSO, random forest, penalized logistic regression and the t-test. In addition, we demonstrate that the succinct feature subsets selected by fully Bayesian T-probit have significantly better predictive power than feature subsets of the same size taken from the top features selected by the aforementioned methods. [en_US]
dc.identifier.uri: http://hdl.handle.net/10388/ETD-2015-09-2232 [en_US]
dc.language.iso: eng [en_US]
dc.subject: Bayesian methods [en_US]
dc.subject: probit [en_US]
dc.subject: MCMC [en_US]
dc.subject: gene expression data [en_US]
dc.subject: grouping structure [en_US]
dc.title: Fully Bayesian T-probit Regression with Heavy-tailed Priors for Selection in High-Dimensional Features with Grouping Structure [en_US]
dc.type.genre: Thesis [en_US]
dc.type.material: text [en_US]
thesis.degree.department: Mathematics and Statistics [en_US]
thesis.degree.discipline: Mathematics [en_US]
thesis.degree.grantor: University of Saskatchewan [en_US]
thesis.degree.level: Doctoral [en_US]
thesis.degree.name: Doctor of Philosophy (Ph.D.) [en_US]
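
The abstract above describes the model only in words, so the following minimal Python sketch illustrates the kind of model it refers to: a T-probit likelihood (a Student-t CDF link) with heavy-tailed Cauchy priors on the regression coefficients, fit by Markov chain Monte Carlo. This is not the thesis implementation: a plain random-walk Metropolis sampler stands in for the Hamiltonian Monte Carlo with restricted Gibbs sampling used in the thesis, and the simulated grouping structure, the degrees of freedom of the link, the prior scale, and the step size are all assumptions chosen for illustration.

# Minimal illustrative sketch (not the thesis code): T-probit regression
# with heavy-tailed Cauchy priors on the coefficients, fit by plain
# random-walk Metropolis. All numeric settings here are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Features with a crude grouping structure: each group of 5 features is a
# noisy copy of one shared latent signal (a stand-in for co-regulated genes).
n, n_groups, group_size = 100, 10, 5
p = n_groups * group_size
Z = rng.standard_normal((n, n_groups))
X = np.repeat(Z, group_size, axis=1) + 0.3 * rng.standard_normal((n, p))

# Only one feature in each of the first two groups is truly relevant.
beta_true = np.zeros(p)
beta_true[0], beta_true[group_size] = 2.0, -2.0

# T-probit link: P(y = 1 | x) = F_t(x' beta), with F_t a Student-t CDF.
df_link = 4.0
y = (rng.uniform(size=n) < stats.t.cdf(X @ beta_true, df=df_link)).astype(int)

def log_post(beta):
    """Log posterior: T-probit likelihood plus independent Cauchy(0, 1)
    (heavy-tailed) priors on the regression coefficients."""
    prob1 = np.clip(stats.t.cdf(X @ beta, df=df_link), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(prob1) + (1 - y) * np.log1p(-prob1))
    logprior = np.sum(stats.cauchy.logpdf(beta, scale=1.0))
    return loglik + logprior

# Random-walk Metropolis over the whole coefficient vector.
beta, lp = np.zeros(p), log_post(np.zeros(p))
step, draws = 0.05, []
for it in range(10000):
    prop = beta + step * rng.standard_normal(p)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        beta, lp = prop, lp_prop
    if it >= 5000:
        draws.append(beta.copy())

post_mean = np.mean(draws, axis=0)
# The heavy-tailed prior lets the few truly relevant coefficients stay large
# while shrinking the many noise coefficients toward zero, which is what
# makes the selected feature subsets sparse.
print("posterior means, first group: ", np.round(post_mean[:group_size], 2))
print("posterior means, second group:", np.round(post_mean[group_size:2 * group_size], 2))

For the high-dimensional, strongly correlated settings the thesis targets, a sampler adapted to the posterior geometry, such as the HMC-within-Gibbs scheme named in the abstract, is needed; this sketch only shows how the t link and the heavy-tailed prior enter the posterior.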

Files

Original bundle (1 of 1)
Name: JIANG-DISSERTATION.pdf
Size: 4 MB
Format: Adobe Portable Document Format

License bundle (1 of 1)
Name: license.txt
Size: 2.2 KB
Format: Plain Text
Description: