
dc.contributor.advisor: Makaroff, Dwight
dc.creator: Abrar, Faheem 1992-
dc.date.accessioned: 2019-05-10T05:28:43Z
dc.date.available: 2019-05-10T05:28:43Z
dc.date.created: 2019-04
dc.date.issued: 2019-05-09
dc.date.submitted: April 2019
dc.identifier.uri: http://hdl.handle.net/10388/12087
dc.description.abstract: A Genome-Wide Association Study (GWAS) is an important bioinformatics method for associating genetic variants with traits, identifying causes of diseases, and increasing plant and crop production. There are several optimizations for improving GWAS performance, including running applications in parallel. However, it can be difficult for researchers to utilize different data types and workflows using existing approaches. A potential solution to this problem is to model GWAS algorithms as a set of modular tasks. In this thesis, a modular pipeline architecture for GWAS applications is proposed that can leverage a parallel computing environment as well as store and retrieve data using a shared data cache. To show that the proposed architecture increases the performance of GWAS applications, two case studies are conducted in which the proposed architecture is implemented on a bioinformatics pipeline package called TASSEL and a GWAS application called FaST-LMM, using both Apache Spark and Dask as the parallel processing frameworks and Redis as the shared data cache. The case studies implement parallel processing modules and shared data cache modules according to the specifications of the proposed architecture. Based on the case studies, a number of experiments are conducted that compare the performance of the implemented architecture in a cluster environment with that of the original programs. The experiments reveal that the modified applications indeed perform faster than the original sequential programs. However, the modified applications do not scale with cluster resources, because the sequential part of the operations prevents the parallelization from achieving linear scalability. Finally, an evaluation of the architecture is conducted based on feedback from software developers and bioinformaticians. The evaluation reveals that the domain experts find the architecture useful and consider the implementations to offer sufficient performance improvement and to be easy to use, although they would prefer a GUI-based implementation.
dc.format.mimetype: application/pdf
dc.subject: GWAS
dc.title: A Modular Parallel Pipeline Architecture for GWAS Applications in a Cluster Environment
dc.type: Thesis
dc.date.updated: 2019-05-10T05:28:44Z
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Saskatchewan
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.Sc.)
dc.type.material: text
dc.contributor.committeeMember: Schneider, Kevin
dc.contributor.committeeMember: Links, Matthew
dc.contributor.committeeMember: Osgood, Nathaniel
dc.contributor.committeeMember: Robinson, Steve
dc.creator.orcid: 0000-0002-5172-5488

