Microbial profiling using metagenomic assembly
The application of second-generation sequencing technology to the characterization of complex microbial communities has profoundly affected our appreciation of microbial diversity. The explosive growth of microbial sequence data has also necessitated advances in bioinformatic methods for profiling microbial communities. Data aggregation strategies should relate metagenomic sequence data to our existing understanding of microbial taxonomy while also facilitating the discovery of novel taxa. For eukaryotes, an established method links DNA sequences to the identification of organisms: DNA barcoding. A similar approach has been developed for prokaryotes, using targeted gene regions as markers to identify species and profile communities. A key difference between these efforts is that DNA barcoding has a formalized framework for evaluating barcode targets, whereas for prokaryotes the 16S rRNA gene has become the de facto barcode without formal evaluation. Using the framework developed for evaluating DNA barcodes in eukaryotes, a study was undertaken to formally evaluate 16S rRNA and cpn60 as DNA barcodes for Bacteria. Both 16S rRNA and cpn60 met the criteria for DNA barcodes, with cpn60 the preferred barcode because of its superior resolution of closely related taxa. The high resolution of cpn60 enabled a method of sequence data aggregation through sequence assembly: microbial profiling using metagenomic assembly (mPUMA). Metagenomic assemblies were scored by the sensitivity and specificity of the operational taxonomic units (OTUs) they formed, and these scores were used to evaluate and optimize the assembly of cpn60 barcodes. Using optimized parameters, mPUMA faithfully reconstructed a synthetic community in terms of both richness and abundance. To facilitate the use of mPUMA, a software package was developed and released under an open source license.
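The abstract does not state how the sensitivity and specificity of OTUs were computed. One common convention, assumed here and not necessarily the one used in this work, is pair counting over reads of known origin (for example, reads from a synthetic community). Each pair of reads is classed as a true positive if both reads come from the same taxon and land in the same OTU, a false negative if they share a taxon but are split, a false positive if they come from different taxa but are merged, and a true negative otherwise. The function name `pairwise_otu_scores` and its inputs are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def pairwise_otu_scores(true_taxon, assigned_otu):
    """Hypothetical pair-counting score for an OTU assignment.

    Assumption: this is one common way to score clustering against known
    origins, not necessarily the definition used by mPUMA.

    true_taxon:   dict mapping read id -> known source taxon
    assigned_otu: dict mapping read id -> OTU the assembly placed it in
    Returns (sensitivity, specificity).
    """
    reads = sorted(true_taxon)
    tp = fn = fp = tn = 0
    for a, b in combinations(reads, 2):
        same_taxon = true_taxon[a] == true_taxon[b]
        same_otu = assigned_otu[a] == assigned_otu[b]
        if same_taxon and same_otu:
            tp += 1  # reads from one taxon correctly kept together
        elif same_taxon:
            fn += 1  # one taxon split across OTUs
        elif same_otu:
            fp += 1  # different taxa merged into one OTU
        else:
            tn += 1  # different taxa correctly kept apart
    # Define a score as perfect when its denominator is empty.
    sensitivity = tp / (tp + fn) if tp + fn else 1.0
    specificity = tn / (tn + fp) if tn + fp else 1.0
    return sensitivity, specificity


# Usage: four reads from two taxa, where one read of taxon B is merged into
# the OTU holding taxon A.
truth = {"r1": "A", "r2": "A", "r3": "B", "r4": "B"}
otus = {"r1": "otu1", "r2": "otu1", "r3": "otu1", "r4": "otu2"}
print(pairwise_otu_scores(truth, otus))  # prints (0.5, 0.5)
```

Under this convention, over-merging distinct taxa lowers specificity and over-splitting a single taxon lowers sensitivity, so an assembly parameter sweep can be judged by how well it balances the two.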
The utility of mPUMA was further examined by characterizing the epiphytic seed microbiomes of Triticum and Brassica species. A microbiome shared by both crop genera, including fungi and bacteria, was detected. This observation is particularly important because it implies that seeds may serve as a vector for microbes, which could include both pathogenic and beneficial organisms. The relative abundances of taxa identified by mPUMA were confirmed by qPCR for multiple fungal and bacterial taxa. Culturing bacterial and fungal isolates from the seed surfaces demonstrated that mPUMA faithfully assembled OTU consensus sequences that were 100% identical to those of the isolates. Patterns in the relative abundances of the shared microbiome OTUs led to the hypothesis that a Pantoea-like bacterium and an Alternaria-like fungus had an antagonistic relationship, since sequences corresponding to these organisms showed reciprocal abundance patterns on Triticum and Brassica seeds. Studies of interactions between cultured isolates revealed fungistatic effects that could account for these reciprocal abundances. These interactions could be directly relevant to plant health, given that Alternaria-like fungi are linked to grain spoilage in wheat and to diseases in canola. Taken together, the results of this thesis demonstrate the superiority of the cpn60 universal target as a barcode for Bacteria, forming the basis for an assembly-based strategy for profiling bacterial and eukaryotic microbial communities that can lead to the discovery of novel taxa and microbial interactions.
mPUMA, microbial profiling using metagenomic assembly, microbial profiling, metagenomics, microbiome, bioinformatics, computational biology, genomics, microbiota
Doctor of Philosophy (Ph.D.)