Inferring Bayesian Networks From Microbiome Data

Rahbar, Saman 1993-

dc.contributor.committeeMember	Horsch, Michael
dc.contributor.committeeMember	Siciliano, Steven
dc.contributor.committeeMember	Links, Matthew
dc.contributor.committeeMember	Neufeld, Eric
dc.contributor.committeeMember	Khan, Shahedul
dc.creator	Rahbar, Saman 1993-
dc.creator.orcid	0000-0002-4750-8640
dc.date.accessioned	2019-03-26T14:53:40Z
dc.date.available	2022-03-26T06:05:09Z
dc.date.created	2019-03
dc.date.submitted	March 2019
dc.date.updated	2019-03-26T14:53:41Z
dc.description.abstract	A microbiome can form a complex network of interacting bacteria, archaea, and fungi. To infer the interactions between these microbes with a justified method, we must account for three challenges: the compositional nature of such data, the small number of samples relative to the number of features, and the lack of microbiome networks with known interactions. We explored the application of Bayesian networks to deciphering the interactions between the microbes, taking care to address these challenges. We built a software process, called a pipeline, which takes a dataset consisting of samples of operational taxonomic unit (OTU) counts and produces an undirected graph showing associations inferred from the data. To address the lack of known associations in microbiomes, we used synthetic microbiome data produced by a technique found in the literature, which generates synthetic OTU counts based on associations represented by undirected graphs. To address the problem of small samples relative to the number of features, we studied the sensitivity and specificity of our pipeline by varying the ratio of samples to features on synthetic data where the true associations are known. Our results suggest that the trade-off between the cost of collecting samples and the value of inferred networks has an inflection point around 7 DoC. If fewer than 7N samples are used, the quality of the inferred network decreases drastically; beyond 7N samples, the quality starts to level off. To address the problem of compositional data, we explored the effects of several normalization techniques on the sensitivity and specificity of the approach.
We found that, for both cluster and scale-free topologies, the centered log-ratio (CLR) and simplex transformations decreased the accuracy of the learned model compared to non-normalized data. Log-transformed data performed better than non-normalized data for the cluster topology and no better than non-normalized data for the scale-free topology. We compared our pipeline to a technique found in the literature, called SPIEC-EASI, on synthetic data and real data. On the real data, Bayesian network inference detected a significant number of edges in the inferred network of OTUs, whereas SPIEC-EASI failed to detect most of the edges. On the synthetic data, the number of edges detected by SPIEC-EASI was significant.
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/10388/11929
dc.subject	Bayesian Networks
dc.subject	Microbiome Data
dc.subject	Inference
dc.title	Inferring Bayesian Networks From Microbiome Data
dc.type	Thesis
dc.type.material	text
local.embargo.terms	2022-03-26
thesis.degree.department	Computer Science
thesis.degree.discipline	Computer Science
thesis.degree.grantor	University of Saskatchewan
thesis.degree.level	Masters
thesis.degree.name	Master of Science (M.Sc.)
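
The abstract refers to the centered log-ratio (CLR) transformation and to the sensitivity and specificity of an inferred undirected graph against known associations. As a rough illustration of those two ideas only, the sketch below uses plain Python. It is not taken from the thesis: the function names, the pseudocount of 1.0 used to handle zero counts, and the choice to score every unordered pair of OTUs as a possible edge are all assumptions made for this example.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample of OTU counts.

    The pseudocount (an assumption of this sketch) avoids log(0)
    for OTUs that were not observed in the sample.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]

def edge_set(pairs):
    """Store undirected edges so that (a, b) and (b, a) are the same edge."""
    return {frozenset(p) for p in pairs}

def sensitivity_specificity(true_edges, inferred_edges, n_nodes):
    """Score an inferred undirected graph against the known graph.

    Every unordered pair of distinct nodes counts as one possible edge
    (hypothetical scoring convention; self-loops are not expected).
    """
    true_e = edge_set(true_edges)
    inferred_e = edge_set(inferred_edges)
    possible = n_nodes * (n_nodes - 1) // 2
    tp = len(true_e & inferred_e)   # edges correctly recovered
    fp = len(inferred_e - true_e)   # edges inferred but not real
    fn = len(true_e - inferred_e)   # real edges that were missed
    tn = possible - tp - fp - fn    # non-edges correctly left out
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

For example, with four OTUs, true edges `[(0, 1), (1, 2)]`, and inferred edges `[(1, 0), (2, 3)]`, one of the two true edges is recovered and one of the four non-edges is wrongly reported. The CLR values of any sample sum to zero by construction, which is one way to check the transform.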

Files

Original bundle

Name: RAHBAR-THESIS-2019.pdf
Size: 11.69 MB
Format: Adobe Portable Document Format

Collections

Graduate Theses and Dissertations