
Inferring Bayesian Networks From Microbiome Data











A microbiome can form a complex network of interacting bacteria, archaea, and fungi. To understand the interactions between these microbes in a principled way, we must account for three challenges: the compositional nature of the data, the small number of samples relative to the number of features, and the lack of microbiome networks with known interactions. We explored the application of Bayesian networks to deciphering the interactions between microbes while addressing these challenges. We built a software pipeline that takes a dataset of samples of operational taxonomic unit counts and produces an undirected graph showing the associations inferred from the data. To address the lack of known associations in microbiomes, we used synthetic microbiome data produced by a technique from the literature, which generates synthetic operational taxonomic unit counts based on associations represented by undirected graphs. To address the problem of small samples relative to the number of features, we studied the sensitivity and specificity of our pipeline as a function of the ratio of samples to features, using synthetic data where the true associations are known. Our results suggest that the trade-off between the cost of collecting samples and the value of inferred networks has an inflection point around 7 DoC. If fewer than 7N samples are used, the quality of the inferred network decreases drastically; beyond 7N samples, the quality starts to level off. To address the problem of compositional data, we explored the effects of several normalization techniques on the sensitivity and specificity of the approach.
We found that, for both cluster and scale-free topologies, the centered log-ratio (CLR) and simplex transformations decreased the accuracy of the learned model compared to non-normalized data, whereas log-transformed data performed better than non-normalized data for the cluster topology and no better than non-normalized data for the scale-free topology. We compared our pipeline to a technique from the literature, SPIEC-EASI, on synthetic and real data. On the real data, Bayesian network inference detected a significant number of edges in the inferred network of operational taxonomic units (OTUs), whereas SPIEC-EASI failed to detect most of the edges. On the synthetic data, the number of edges detected by SPIEC-EASI was significant.
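
The normalizations and evaluation measures named above can be illustrated with a minimal Python sketch. This is not the thesis pipeline: the function names, the NumPy-based layout (rows are samples, columns are OTUs), and the pseudocount used to avoid taking the logarithm of zero counts are all assumptions made for illustration. Sensitivity and specificity are computed here over unordered node pairs of an undirected graph, which is one common convention and may differ from the one used in the thesis.

```python
import numpy as np


def simplex_transform(counts):
    """Scale each sample (row) by its total so that every row sums to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def log_transform(counts, pseudocount=1.0):
    """Natural log of the counts; the pseudocount is an assumed zero-handling choice."""
    counts = np.asarray(counts, dtype=float)
    return np.log(counts + pseudocount)


def clr_transform(counts, pseudocount=1.0):
    """Centered log-ratio: log counts minus the per-sample mean of the log counts."""
    logged = log_transform(counts, pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)


def edge_sensitivity_specificity(true_adj, inferred_adj):
    """Compare two undirected graphs given as symmetric 0/1 adjacency matrices.

    Each unordered node pair is counted once (strict upper triangle).
    Returns (sensitivity, specificity); a value is NaN when its denominator is zero.
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    inferred_adj = np.asarray(inferred_adj, dtype=bool)
    upper = np.triu_indices_from(true_adj, k=1)
    truth, pred = true_adj[upper], inferred_adj[upper]

    tp = int(np.sum(truth & pred))    # true edges that were recovered
    fn = int(np.sum(truth & ~pred))   # true edges that were missed
    tn = int(np.sum(~truth & ~pred))  # absent edges correctly left out
    fp = int(np.sum(~truth & pred))   # absent edges wrongly inferred

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Under these assumptions, a synthetic experiment would apply one transform to the count matrix, run structure learning, and score the inferred adjacency matrix against the known one with `edge_sensitivity_specificity`.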



Bayesian Networks, Microbiome Data, Inference



Master of Science (M.Sc.)


Computer Science


Computer Science


