Enhancing Computational Methods for Strain Typing and Separating Strains of Mycoplasma bovis in Mixed Culture
Date
2020-12-02
Authors
ORCID
0000-0003-0194-2238
Type
Thesis
Degree Level
Masters
Abstract
No existing program allows a user to isolate strain-specific sequences within a complex assembly
of mixed bacterial strains without bias from a reference assembly. The tools that do exist each have a specialized
focus, such as isolating small haplotype differences within strains, or rely on reference genomes that
may bias the sequences. To fill this gap, we have developed a tool called the Separator of Strain Inherent
Sequences (SepSIS), which extracts sequences specific to each bacterial strain from the de novo assembly graph
created by the SPAdes assembler. SepSIS is accompanied by a set of pre-processing scripts that together form the
“SepSIS pipeline”. The scripts are available at https://github.com/MatthewWaldner/sepsis. The SepSIS
pipeline provides two functions, each accepting a particular form of input data. The pipeline was
designed for use with Illumina MiSeq paired-read data, but in theory, any read dataset compatible with
SPAdes could be used with SepSIS. The first function of the SepSIS pipeline accepts reads obtained from
non-clonal bacterial isolates as input. It then attempts to isolate the complete strain-specific sequences using
relative coverage levels of strain-specific subsequences in the assembly graph. It is marginally successful at
this task. The second function of the SepSIS pipeline accepts reads from independently cultured isolates and
mixes them in silico before assembly. After assembly, the contiguous sequences are analyzed by SepSIS using
meta-information describing their strain of origin to produce lists of sequences specific to each strain. These
sequences can then be studied and contrasted further.
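
As a rough illustration of the second function described above, the following Python sketch groups assembled contigs by strain of origin. It is not the SepSIS implementation: the `contig_origins` mapping, the contig names, and the rule that a contig is strain-specific when reads from exactly one strain support it are all assumptions made for this example.

```python
from collections import defaultdict


def strain_specific_contigs(contig_origins):
    """Return, for each strain, the contigs supported by that strain alone.

    contig_origins is a hypothetical mapping from a contig identifier to the
    set of strains whose reads contributed to that contig (the
    "meta-information" in the text). Contigs shared by several strains
    are ignored.
    """
    specific = defaultdict(list)
    for contig_id, strains in contig_origins.items():
        if len(strains) == 1:
            (strain,) = strains  # unpack the single strain of origin
            specific[strain].append(contig_id)
    return dict(specific)


# Illustrative input: two strains mixed in silico before assembly.
contig_origins = {
    "contig_1": {"strain_A", "strain_B"},  # shared, so not strain-specific
    "contig_2": {"strain_A"},
    "contig_3": {"strain_B"},
    "contig_4": {"strain_A"},
}
result = strain_specific_contigs(contig_origins)
```

Here `result` lists `contig_2` and `contig_4` under `strain_A` and `contig_3` under `strain_B`; these lists correspond to the per-strain output that can then be studied and contrasted.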
The second functionality of SepSIS was used to perform two primary investigations. The first investigation
identifies unique sequences from sets of isolates, where each set was hypothesized to consist entirely of copies
of a single strain. This investigation analyzed 10 sets of 5 independently sequenced isolates of Mycoplasma
bovis, with all the isolates originating from a single culture spread on a growth plate. Despite originating
from a single culture, it was found that many of the isolates had unique sequences; therefore, these isolates
likely each represent an individual strain. The second investigation mixed two or more
strains with contrasting phenotypic features, so that the second function of SepSIS could be used to isolate
sequences potentially responsible for each phenotype. By running multiple mixes with the same contrasting
phenotypic combinations, the intersection of sequences common to a phenotype can be identified. This type
of investigation was performed on 29 pairs of Mycoplasma bovis lung and stifle joint isolates, with each pair
originating from a single animal. Infection location was considered a phenotype and sequences unique to each
infection location were isolated and identified. The sequences with the strongest correlation to phenotype
were variants of Mycoplasma bovis insertion sequences, or were from genes for variable surface lipoproteins
and HAD-family hydrolases. The results show that SepSIS is useful when provided with reads sequenced
from independently cultured isolates along with meta-information.
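
The intersection step used in the second investigation can be sketched in Python as follows. This is only an illustration under simplifying assumptions: each mix is assumed to have already been reduced to a set of comparable sequence labels for one phenotype, and the labels and the function name `common_to_phenotype` are hypothetical. Deciding when two real sequences count as the same would require sequence comparison, which is not shown here.

```python
from functools import reduce


def common_to_phenotype(per_mix_sequences):
    """Return the labels present in the phenotype-specific set of every mix.

    per_mix_sequences is a list with one collection of sequence labels per
    mix, all for the same phenotype (for example, joint isolates).
    """
    sets = [set(labels) for labels in per_mix_sequences]
    if not sets:
        return set()
    return reduce(set.intersection, sets)


# Hypothetical labels for sequences unique to one phenotype in three mixes.
mixes = [
    {"IS_variant_1", "vsp_variant_2", "had_hydrolase_3"},
    {"IS_variant_1", "had_hydrolase_3"},
    {"IS_variant_1", "had_hydrolase_3", "hypothetical_protein_9"},
]
shared = common_to_phenotype(mixes)
```

Only labels found in every mix survive, so sequences that appear in a single animal's pair drop out as more mixes are added.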
Keywords
strain, Mycoplasma bovis, bacteria, bioinformatics, genotype, phenotype
Degree
Master of Science (M.Sc.)
Department
Computer Science
Program
Computer Science