Improved sequence-read simulation for (meta)genomics

dc.contributor.advisor: Kusalik, Anthony [en_US]
dc.contributor.committeeMember: McQuillan, Ian [en_US]
dc.contributor.committeeMember: Bickis, Mikelis [en_US]
dc.creator: Johnson, Stephen [en_US]
dc.date.accessioned: 2014-09-24T12:00:18Z
dc.date.available: 2014-09-24T12:00:18Z
dc.date.created: 2014-09 [en_US]
dc.date.issued: 2014-09-23 [en_US]
dc.date.submitted: September 2014 [en_US]
dc.description.abstract: There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their use to the authors' original intentions. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. To our knowledge, there are no programs that allow a user to generate simulated data for metagenomics applications following empirical read-length distributions and quality profiles based on empirically-derived information from actual sequencing data. We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR can emulate reads from various sequencing platforms, including Illumina, 454, and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and extracts useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology; however, BEAR is not restricted in this way. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which no dedicated sequencing simulators are currently available. BEAR is also the first metagenomic sequencing simulator program that automates the process of generating abundances, which can be an arduous task. BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work. [en_US]
dc.identifier.uri: http://hdl.handle.net/10388/ETD-2014-09-1750 [en_US]
dc.language.iso: eng [en_US]
dc.subject: Bioinformatics [en_US]
dc.subject: sequence analysis [en_US]
dc.subject: machine learning [en_US]
dc.subject: simulation [en_US]
dc.title: Improved sequence-read simulation for (meta)genomics [en_US]
dc.type.genre: Thesis [en_US]
dc.type.material: text [en_US]
thesis.degree.department: Computer Science [en_US]
thesis.degree.discipline: Computer Science [en_US]
thesis.degree.grantor: University of Saskatchewan [en_US]
thesis.degree.level: Masters [en_US]
thesis.degree.name: Master of Science (M.Sc.) [en_US]
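
Illustrative note: the abstract contrasts simulators that assume uniform or normal read lengths with ones that follow an empirical read-length distribution. The sketch below shows only that general idea, as plain frequency-weighted resampling of observed lengths. It is not BEAR's machine-learning method, and the names `empirical_length_sampler` and `simulate_reads`, the toy genome, and the observed lengths are all hypothetical, chosen for illustration.

```python
import random
from collections import Counter


def empirical_length_sampler(observed_lengths, seed=None):
    """Build a sampler that draws read lengths with the observed frequencies.

    Assumption: observed_lengths is a list of read lengths taken from a real
    sequencing run (hypothetical input, not part of the thesis record).
    """
    if not observed_lengths:
        raise ValueError("need at least one observed read length")
    counts = Counter(observed_lengths)
    lengths = sorted(counts)
    weights = [counts[n] for n in lengths]
    rng = random.Random(seed)

    def sample(k):
        # Weighted draw with replacement from the empirical distribution.
        return rng.choices(lengths, weights=weights, k=k)

    return sample


def simulate_reads(genome, sample_lengths, n_reads, seed=None):
    """Cut error-free reads from a genome at uniformly random start positions."""
    rng = random.Random(seed)
    reads = []
    for length in sample_lengths(n_reads):
        length = min(length, len(genome))  # never exceed the genome length
        start = rng.randint(0, len(genome) - length)
        reads.append(genome[start:start + length])
    return reads


if __name__ == "__main__":
    toy_genome = "ACGT" * 50               # hypothetical 200 bp genome
    observed = [50, 50, 75, 100]           # hypothetical observed lengths
    sampler = empirical_length_sampler(observed, seed=1)
    for read in simulate_reads(toy_genome, sampler, n_reads=3, seed=2):
        print(len(read), read[:20] + "...")
```

A sketch like this ignores quality values, sequencing errors, and genome abundances, all of which the abstract says BEAR models; it only shows why resampling observed lengths avoids committing to a parametric length distribution.
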

Files

Original bundle
Name: JOHNSON-THESIS.pdf
Size: 2.38 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1008 B
Format: Plain Text