Liu, Juxin
2023-09-06
2023-09-06
2023
2023-11
2023-09-06
November 2
https://hdl.handle.net/10388/14957

With the advancement of sequencing methods, investigators now have more opportunities to understand the role of microbial communities in human and plant health. For instance, studying the biological network among microbial taxa can offer researchers insights into plant breeding, and studying the human microbiome can help us understand microbial functions and their links to illness. However, analyzing microbiome data presents significant challenges because of the structure of the data. A critical issue in microbiome data analysis is the presence of a large number of zeros. Although many methods for microbiome data analysis have been published, it remains challenging for investigators to select an appropriate one. Therefore, our work explores recent methods for handling zeros in microbiome data analysis and provides a detailed numerical comparison. First, we introduce four recent methods: the Bayesian-multiplicative replacement model, the gamma-normal mixture model, the zero-inflated Dirichlet tree multinomial model, and the zero-inflated probabilistic PCA model, and we detail their advantages and limitations. Second, we design and implement simulation studies using our novel data generator, the zero-inflated logistic normal multinomial model, which makes use of phylogenetic tree distance. To the best of our knowledge, this is the first zero-inflated model that employs phylogenetic tree distance. Finally, we evaluate the four methods using the Frobenius norm error, the mean squared error (MSE) of Simpson's Index, and the Wasserstein distance error. The simulation results suggest that the zero-inflated Dirichlet tree multinomial model (with a pseudo count of 0.5 as the smoothing method) outperforms the other methods, achieving the smallest Frobenius norm error and the smallest MSE of Simpson's Index.
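The three evaluation criteria and the 0.5 pseudo-count smoothing can be made concrete with a short sketch. It assumes NumPy and SciPy, and its definitions are assumptions rather than the thesis's exact formulas. The Frobenius error is taken between n × K matrices of true and estimated compositions (one row per sample). Simpson's Index is computed as the sum of squared proportions; the MSE is identical if the Gini-Simpson form 1 − Σp² is used instead. The Wasserstein error averages SciPy's one-dimensional `wasserstein_distance` over samples, treating taxon indices 0 to K−1 as the support, which is a simplification (a phylogeny-aware ground distance may be what the thesis uses). All function names are hypothetical.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def pseudo_count_composition(counts, pseudo=0.5):
    """Add a pseudo count to every cell (removing zeros), then rescale each row to sum to 1."""
    smoothed = np.asarray(counts, dtype=float) + pseudo
    return smoothed / smoothed.sum(axis=1, keepdims=True)


def frobenius_error(p_true, p_hat):
    """Frobenius norm of the difference between true and estimated composition matrices."""
    diff = np.asarray(p_true, dtype=float) - np.asarray(p_hat, dtype=float)
    return np.linalg.norm(diff, ord="fro")


def simpson_index(p):
    """Simpson's Index (sum of squared proportions) for each sample (row)."""
    p = np.asarray(p, dtype=float)
    return np.sum(p ** 2, axis=1)


def simpson_mse(p_true, p_hat):
    """Mean squared error between true and estimated per-sample Simpson's Indices."""
    return np.mean((simpson_index(p_true) - simpson_index(p_hat)) ** 2)


def wasserstein_error(p_true, p_hat):
    """Average 1-D Wasserstein distance between matching rows.

    Assumption: taxon indices 0..K-1 serve as the support and the
    proportions serve as weights, so taxon order affects the result.
    """
    p_true = np.asarray(p_true, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    support = np.arange(p_true.shape[1])
    return float(np.mean([
        wasserstein_distance(support, support, u_weights=t, v_weights=e)
        for t, e in zip(p_true, p_hat)
    ]))


# Illustrative inputs (made up, not data from the thesis).
p_true = np.array([[0.5, 0.3, 0.2, 0.0],
                   [0.1, 0.0, 0.6, 0.3]])
counts = np.array([[48, 33, 19, 0],
                   [9, 0, 62, 29]])
p_hat = pseudo_count_composition(counts, pseudo=0.5)
print("Frobenius error:", frobenius_error(p_true, p_hat))
print("Simpson MSE:", simpson_mse(p_true, p_hat))
print("Wasserstein error:", wasserstein_error(p_true, p_hat))
```

Lower values are better for all three criteria. In a simulation study, `p_true` would come from the data generator and `p_hat` from each zero-handling method.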
Additionally, the Square Root Multiplicative Treatment model performs notably well, with a minimal Wasserstein distance error and an efficient running time in our simulation study. Conversely, the zero-inflated probabilistic PCA model does not perform as expected because its parameter estimation has convergence issues.

Format: application/pdf
Language: en
Keywords: Zero-Inflated, Microbiome Data, Phylogenetic Tree Distance
Title: Numerical Comparison: Different Methods of Handling Zeros in Microbiome Data Analysis
Type: Thesis
2023-09-06