
Enhancing Computational Methods for Strain Typing and Separating Strains of Mycoplasma bovis in Mixed Culture












No existing program allows a user to isolate strain-specific sequences within a complex assembly of mixed bacterial strains without the bias introduced by a reference assembly. The tools that do exist either have a specialized focus, such as isolating small haplotype differences within strains, or rely on reference genomes that may bias the sequences. To address this, we developed the Separator of Strain Inherent Sequences (SepSIS), a tool that extracts the sequences specific to each bacterial strain from the de novo assembly graph produced by the SPAdes assembler. SepSIS is accompanied by a set of pre-processing scripts that together form the “SepSIS pipeline”. The scripts are available at “”. The SepSIS pipeline provides two functions, each accepting a particular form of input data. The pipeline was designed for Illumina MiSeq paired-read data but, in theory, any read dataset compatible with SPAdes could be used with SepSIS. The first function accepts reads obtained from non-clonal bacterial isolates. It then attempts to isolate the complete strain-specific sequences using the relative coverage levels of strain-specific subsequences in the assembly graph. It is marginally successful at this task. The second function accepts reads from independently cultured isolates and mixes them in silico before assembly. After assembly, SepSIS analyzes the contiguous sequences using meta-information describing their strain of origin to produce lists of sequences specific to each strain. These sequences can then be studied and contrasted further. The second function of SepSIS was used to perform two primary investigations. The first investigation identified unique sequences within sets of isolates, where each set was hypothesized to consist entirely of copies of a single strain.
This investigation analyzed 10 sets of 5 independently sequenced isolates of Mycoplasma bovis, with all the isolates originating from a single culture spread on a growth plate. Despite this common origin, many of the isolates had unique sequences; these isolates therefore likely each represent an individual strain. The second investigation was based on mixing two or more strains with contrasting phenotypic features, so that the second function of SepSIS could be applied to isolate sequences potentially responsible for each phenotype. By running multiple mixes with the same contrasting phenotypic combinations, the intersection of sequences common to a phenotype can be identified. This type of investigation was performed on 29 pairs of Mycoplasma bovis lung and stifle joint isolates, with each pair originating from a single animal. Infection location was treated as a phenotype, and sequences unique to each infection location were isolated and identified. The sequences with the strongest correlation to phenotype were variants of Mycoplasma bovis insertion sequences, or came from genes for variable surface lipoproteins and HAD-family hydrolases. The results show that SepSIS is useful when provided with reads sequenced from independently cultured isolates along with meta-information.
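The logic of the second function, as described above, can be illustrated with a minimal Python sketch. This is not the SepSIS implementation. The function names, the input layout (each contiguous sequence paired with the set of strain labels whose reads support it), and the use of exact sequence identity to compare sequences across mixes are all assumptions made for illustration; a real analysis would need a more tolerant sequence comparison.

```python
def strain_specific(contigs):
    """Group contigs supported by reads from exactly one strain.

    contigs: iterable of (sequence, set_of_strain_labels) pairs
             (hypothetical layout, not the SepSIS file format).
    Returns a dict mapping strain label -> set of sequences.
    """
    result = {}
    for seq, strains in contigs:
        if len(strains) == 1:
            strain = next(iter(strains))
            result.setdefault(strain, set()).add(seq)
    return result


def common_to_phenotype(mixes, phenotype_of):
    """Intersect the strain-specific sequences of each phenotype across mixes.

    mixes: list of contig lists, one per in silico mix.
    phenotype_of: dict mapping strain label -> phenotype label.
    Assumes every mix contains every phenotype, and compares
    sequences by exact identity (a simplification).
    """
    phenotypes = set(phenotype_of.values())
    common = {}
    for contigs in mixes:
        # Collect this mix's strain-specific sequences under each phenotype.
        by_pheno = {p: set() for p in phenotypes}
        for strain, seqs in strain_specific(contigs).items():
            by_pheno[phenotype_of[strain]] |= seqs
        # Keep only sequences seen for that phenotype in every mix so far.
        for p in phenotypes:
            common[p] = by_pheno[p] if p not in common else common[p] & by_pheno[p]
    return common


# Toy example: two lung/joint pairs, each pair mixed separately.
mix1 = [("AAAA", {"lung1"}), ("CCCC", {"joint1"}),
        ("GGGG", {"lung1", "joint1"}), ("TTTT", {"lung1"})]
mix2 = [("AAAA", {"lung2"}), ("CCCC", {"joint2"}),
        ("GGGG", {"lung2", "joint2"}), ("ACGT", {"joint2"})]
phenotype_of = {"lung1": "lung", "lung2": "lung",
                "joint1": "joint", "joint2": "joint"}

common = common_to_phenotype([mix1, mix2], phenotype_of)
```

In this toy input, the sequence shared by both strains of a mix is discarded, and only the sequences that are strain-specific for the same phenotype in every mix survive the intersection.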



strain, Mycoplasma bovis, bacteria, bioinformatics, genotype, phenotype



Master of Science (M.Sc.)


Computer Science



