Parallel algorithms for real-time peptide-spectrum matching

dc.contributor.advisor: McQuillan, Ian [en_US]
dc.contributor.advisor: Wu, FangXiang [en_US]
dc.contributor.committeeMember: Kim, Theodore [en_US]
dc.contributor.committeeMember: Kusalik, Tony [en_US]
dc.contributor.committeeMember: Teng, Daniel [en_US]
dc.creator: Zhang, Jian [en_US]
dc.date.accessioned: 2010-12-13T11:42:48Z [en_US]
dc.date.accessioned: 2013-01-04T05:10:23Z
dc.date.available: 2011-12-16T08:00:00Z [en_US]
dc.date.available: 2013-01-04T05:10:23Z
dc.date.created: 2010-12 [en_US]
dc.date.issued: 2010-12 [en_US]
dc.date.submitted: December 2010 [en_US]
dc.description.abstract: Tandem mass spectrometry is a powerful experimental tool used in molecular biology to determine the composition of protein mixtures. It has become a standard technique for protein identification. Owing to the rapid development of mass spectrometry technology, instruments can now produce a large number of mass spectra, which are used for peptide identification. The increasing data size demands efficient software tools to perform peptide identification. In a tandem mass spectrometry experiment, peptide ion selection algorithms generally select only the most abundant peptide ions for further fragmentation. Because of this, the low-abundance proteins in a sample are rarely identified. To address this problem, researchers developed the notion of a 'dynamic exclusion list', which maintains a list of newly selected peptide ions and ensures that these peptide ions are not selected again for a certain time. In this way, other peptide ions get more opportunity to be selected and identified, allowing peptides of lower abundance to be identified. However, a better method is to also incorporate the identification results into the 'dynamic exclusion list' approach. To do this, a real-time peptide identification algorithm is required. In this thesis, we introduce methods to improve the speed of peptide identification so that the 'dynamic exclusion list' approach can use the peptide identification results without affecting the throughput of the instrument. Our work is based on RT-PSM, a real-time program for peptide-spectrum matching with statistical significance. We profile the speed of RT-PSM and find that the peptide-spectrum scoring module is the most time-consuming portion. Guided by the profiling results, we introduce methods to parallelize the peptide-spectrum scoring algorithm. We propose two parallel algorithms using different technologies. The first performs parallel peptide-spectrum matching using SIMD instructions. We implemented and tested this algorithm on the Intel SSE architecture. The test results show that an 18-fold speedup of the entire process is obtained. The second parallel algorithm is developed using NVIDIA CUDA technology. We describe two CUDA kernels based on different algorithms and compare their performance. The more efficient algorithm is integrated into RT-PSM. The time measurements show that a 190-fold speedup of the scoring module and a 26-fold speedup of the entire process are achieved. We profile the CUDA version again to show that the scoring module has been optimized to the point where it is no longer the most time-consuming module in the CUDA version of RT-PSM. In addition, we evaluate the feasibility of creating a metric index to reduce the number of candidate peptides. We describe evaluation methods and show that general indexing methods are not likely to be feasible for RT-PSM. [en_US]
dc.identifier.uri: http://hdl.handle.net/10388/etd-12132010-114248 [en_US]
dc.language.iso: en_US [en_US]
dc.subject: Bioinformatics [en_US]
dc.subject: SIMD [en_US]
dc.subject: Parallel [en_US]
dc.subject: GPU [en_US]
dc.subject: Computer Science [en_US]
dc.title: Parallel algorithms for real-time peptide-spectrum matching [en_US]
dc.type.genre: Thesis [en_US]
dc.type.material: text [en_US]
thesis.degree.department: Computer Science [en_US]
thesis.degree.discipline: Computer Science [en_US]
thesis.degree.grantor: University of Saskatchewan [en_US]
thesis.degree.level: Masters [en_US]
thesis.degree.name: Master of Science (M.Sc.) [en_US]
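
The 'dynamic exclusion list' idea summarized in the abstract can be illustrated with a minimal sketch. This is not code from the thesis or from RT-PSM; the class name, the function name, the 30-second exclusion window, and the 0.01 m/z tolerance are all assumptions chosen only for illustration. Ions are modeled as hypothetical (m/z, intensity) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicExclusionList:
    """Toy dynamic exclusion list keyed by precursor m/z (illustrative only)."""
    exclusion_seconds: float = 30.0   # assumed exclusion window
    mz_tolerance: float = 0.01        # assumed m/z matching tolerance
    _entries: list = field(default_factory=list)  # (mz, expiry_time) pairs

    def is_excluded(self, mz: float, now: float) -> bool:
        # Drop expired entries, then look for a live entry within tolerance.
        self._entries = [(m, t) for m, t in self._entries if t > now]
        return any(abs(m - mz) <= self.mz_tolerance for m, _ in self._entries)

    def exclude(self, mz: float, now: float) -> None:
        # Record a newly selected ion so it is skipped until the window ends.
        self._entries.append((mz, now + self.exclusion_seconds))


def select_ions(candidates, exclusion, now, top_n=1):
    """Pick the top_n most abundant (mz, intensity) ions that are not excluded."""
    allowed = [c for c in candidates if not exclusion.is_excluded(c[0], now)]
    chosen = sorted(allowed, key=lambda c: c[1], reverse=True)[:top_n]
    for mz, _ in chosen:
        exclusion.exclude(mz, now)
    return chosen
```

With two candidate ions in every scan, the abundant ion is chosen first, the less abundant ion is chosen while the first is excluded, and the abundant ion becomes eligible again once its window has expired. Feeding identification results back into the list, as the abstract proposes, would require the real-time matching step and is not modeled here.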

Files

Original bundle
Name: Thesis_MSc_jiz869.pdf
Size: 2.35 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 905 B
Format: Plain Text