
Inferring Bayesian Networks From Microbiome Data

dc.contributor.committeeMember: Horsch, Michael
dc.contributor.committeeMember: Siciliano, Steven
dc.contributor.committeeMember: Links, Matthew
dc.contributor.committeeMember: Neufeld, Eric
dc.contributor.committeeMember: Khan, Shahedul
dc.creator: Rahbar, Saman 1993-
dc.creator.orcid: 0000-0002-4750-8640
dc.date.accessioned: 2019-03-26T14:53:40Z
dc.date.available: 2022-03-26T06:05:09Z
dc.date.created: 2019-03
dc.date.submitted: March 2019
dc.date.updated: 2019-03-26T14:53:41Z
dc.description.abstract: A microbiome can form a complex network of interacting bacteria, archaea, and fungi. Inferring the interactions between these microbes in a principled way requires addressing three challenges: the compositional nature of such data, the small number of samples relative to the number of features, and the lack of microbiome networks with known interactions. We have explored the application of Bayesian networks to deciphering the interactions between microbes, taking care to address these challenges. We built a software process, called a pipeline, which takes a dataset of samples of operational taxonomic unit counts and produces an undirected graph showing the associations inferred from the data. To address the lack of known associations in microbiomes, we used synthetic microbiome data produced by a technique from the literature, which generates synthetic operational taxonomic unit counts based on associations represented by undirected graphs. To address the small number of samples relative to the number of features, we studied the sensitivity and specificity of our pipeline by varying the ratio of samples to features on synthetic data where the true associations are known. Our results suggest that the trade-off between the cost of collecting samples and the value of the inferred networks has an inflection point at around 7N samples, where N is the number of features. If fewer than 7N samples are used, the quality of the inferred network decreases drastically; beyond 7N samples, the quality starts to level off. To address the compositional nature of the data, we explored the effects of several normalization techniques on the sensitivity and specificity of the approach.
We found that, for both cluster and scale-free topologies, the centered log-ratio (CLR) and simplex transformations decreased the accuracy of the learned model compared with non-normalized data, whereas log-transformed data performed better than non-normalized data for the cluster topology and no better than non-normalized data for the scale-free topology. We compared our pipeline with a technique from the literature, SPIEC-EASI, on both synthetic and real data. Inference of Bayesian networks on the real data detected a significant number of edges in the inferred network of operational taxonomic units (OTUs), whereas SPIEC-EASI failed to detect most of the edges on the real data. On the synthetic data, SPIEC-EASI detected a significant number of edges.
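The abstract compares three normalizations of OTU count data. As a minimal sketch of their textbook definitions (not the thesis's own code), the NumPy functions below apply them row-wise, one sample per row: the simplex (closure) transformation divides each count by the sample total, the log transformation takes log(x + c), and the CLR subtracts each sample's mean log value, which equals log(x_i / g(x)) with g the geometric mean. The function names and the pseudocount c = 1 are assumptions for illustration; the pipeline's actual preprocessing may differ.

```python
import numpy as np


def simplex(counts):
    """Closure: rescale each sample (row) so its OTU proportions sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def log_transform(counts, pseudocount=1.0):
    """Elementwise log of counts; the pseudocount (an assumption) avoids log(0)."""
    return np.log(np.asarray(counts, dtype=float) + pseudocount)


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: each log count minus the sample's mean log count,
    i.e. log(x_i / geometric_mean(x)) computed on pseudocount-shifted data."""
    logs = log_transform(counts, pseudocount)
    return logs - logs.mean(axis=1, keepdims=True)


if __name__ == "__main__":
    # Hypothetical toy OTU table: 2 samples x 3 OTUs.
    otu = np.array([[10, 0, 5],
                    [3, 7, 20]])
    print(simplex(otu))
    print(log_transform(otu))
    print(clr(otu))
```

By construction each simplex-transformed row sums to 1 and each CLR-transformed row sums to 0, which is why both remove the total-count information that non-normalized and log-transformed data retain.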
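On synthetic data the true association graph is known, so the sensitivity and specificity of an inferred undirected graph can be computed by comparing edge sets. The Python sketch below treats every unordered pair of OTUs as a candidate edge. The function names and the four-node example are hypothetical, and the thesis may have counted edges differently.

```python
from itertools import combinations


def edge_set(edges):
    """Treat edges as undirected: (a, b) and (b, a) become the same frozenset."""
    return {frozenset(e) for e in edges}


def sensitivity_specificity(true_edges, inferred_edges, nodes):
    """Compare an inferred undirected graph with the known (true) graph.

    Every unordered pair of distinct nodes is a candidate edge, so
    true negatives are pairs absent from both graphs.
    """
    true_set = edge_set(true_edges)
    inferred_set = edge_set(inferred_edges)
    all_pairs = {frozenset(p) for p in combinations(nodes, 2)}

    tp = len(true_set & inferred_set)                # recovered true associations
    fn = len(true_set - inferred_set)                # missed true associations
    fp = len(inferred_set - true_set)                # spurious inferred edges
    tn = len(all_pairs - (true_set | inferred_set))  # correctly absent edges

    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical 4-OTU example: 6 possible edges in total.
    nodes = ["A", "B", "C", "D"]
    truth = [("A", "B"), ("B", "C"), ("C", "D")]
    inferred = [("A", "B"), ("C", "B"), ("A", "D")]
    print(sensitivity_specificity(truth, inferred, nodes))  # both equal 2/3
```

Here two of the three true edges are recovered (sensitivity 2/3), and of the three pairs with no true edge, one (A, D) is wrongly inferred (specificity 2/3). Repeating this comparison at different sample-to-feature ratios is one way to trace how network quality changes as more samples are added.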
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10388/11929
dc.subject: Bayesian Networks
dc.subject: Microbiome Data
dc.subject: Inference
dc.title: Inferring Bayesian Networks From Microbiome Data
dc.type: Thesis
dc.type.material: text
local.embargo.terms: 2022-03-26
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Saskatchewan
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.Sc.)

Files

Original bundle
Name: RAHBAR-THESIS-2019.pdf
Size: 11.69 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 2.27 KB
Format: Plain Text