A bioinformatics pipeline for recovering misidentified proteins

dc.contributor.advisor: Kusalik, Anthony (en_US)
dc.contributor.advisor: Gordon, Gray (en_US)
dc.contributor.committeeMember: Ross, Andrew (en_US)
dc.contributor.committeeMember: McQuillan, Ian (en_US)
dc.contributor.committeeMember: Van Kessel, Andrew (en_US)
dc.creator: Mehrotra, Sudeep (en_US)
dc.date.accessioned: 2010-09-03T09:22:51Z (en_US)
dc.date.accessioned: 2013-01-04T04:56:30Z
dc.date.available: 2011-09-07T08:00:00Z (en_US)
dc.date.available: 2013-01-04T04:56:30Z
dc.date.created: 2009-12 (en_US)
dc.date.issued: 2009-12 (en_US)
dc.date.submitted: December 2009 (en_US)
dc.description.abstract: To examine the response of wheat to different temperatures and photoperiods at the proteomic level, a series of experiments was performed at the University of Saskatchewan, College of Agriculture and Bioresources, Department of Plant Science. Tandem mass spectrometry (MS/MS) was used for protein identification. The iTRAQ approach was used to generate raw data for protein quantification. The Pro Group software was used to identify and quantify differentially expressed proteins. Although the input samples came from a plant, the software reported non-plant proteins. The traditional way to deal with this problem is to use sequence alignment software to find close green-plant homologs of the non-plant proteins in a plant-only database. Such a technique is problematic because homology-based sequence similarity does not generally equate to similarity of mass spectra. In this work a more radical approach was investigated: a bioinformatics pipeline was designed and implemented to report plant proteins misidentified by the Pro Group software. The approach builds on the fact that MS/MS-based protein identification uses peptide fragments/ions bearing unique m/z values in the mass spectra. From the reported non-plant proteins and their associated peptides, putative m/z values of the peptides are generated and then used to find alternate hits in a green plant-only database. The pipeline uses three different heuristics, each generating a list of candidate proteins. The proteins reported consistently across the three lists are the most likely to have been present in the original sample. To evaluate the performance of the pipeline, three separate experiments were performed. The inputs to the pipeline were a set of known plant peptides, a combination of known plant and non-plant peptides, and a set of known non-plant peptides.
For each experiment a stringency value (threshold value) was set by the user. A tighter stringency gave better results; that is, more plant proteins were reported consistently across the three lists. The research presented in this thesis shows that m/z values, consideration of unique peptides, and accounting for proteins with shorter sequences can be used to identify proteins. These characteristics can be used to identify proteins when limited information is available, in this case a list of non-plant proteins reported as being present in a plant-derived sample. The information available was limited because the original input data had already been processed by the Pro Group software. The approach presented here is an alternative to a wet lab scientist using sequence alignment tools, sequence databases, and homology-based search. The pipeline can be extended with additional modules, and the results presented here could serve as a foundation for further study. (en_US)
dc.identifier.uri: http://hdl.handle.net/10388/etd-09032010-092251 (en_US)
dc.language.iso: en_US (en_US)
dc.subject: tryptic digests (en_US)
dc.subject: amino acid composition (en_US)
dc.subject: BLAST (en_US)
dc.subject: homology-based approach (en_US)
dc.subject: use of mass-to-charge ratio (en_US)
dc.title: A bioinformatics pipeline for recovering misidentified proteins (en_US)
dc.type.genre: Thesis (en_US)
dc.type.material: text (en_US)
thesis.degree.department: Computer Science (en_US)
thesis.degree.discipline: Computer Science (en_US)
thesis.degree.grantor: University of Saskatchewan (en_US)
thesis.degree.level: Masters (en_US)
thesis.degree.name: Master of Science (M.Sc.) (en_US)
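
The abstract describes generating putative m/z values for the reported peptides and using them to find alternate hits in a plant-only database, then keeping proteins that appear in every candidate list. The following is a minimal sketch of that general idea only, not the thesis's implementation. Everything in it is an assumption made for illustration: the function names, the `tolerance` parameter (which is not necessarily the thesis's stringency value), the cleavage rule (after K or R, not before P), the use of singly charged monoisotopic masses, and the toy sequences. The three heuristics named in the abstract are not reproduced here; `consensus` simply intersects whatever candidate lists it is given.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the sum of residue masses
PROTON = 1.00728   # mass of one proton, added per unit of charge


def peptide_mz(peptide, charge=1):
    """Putative m/z of an unmodified peptide at the given charge state."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


def tryptic_digest(protein):
    """In-silico digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # trailing peptide without K/R
    return peptides


def alternate_hits(query_peptides, plant_db, tolerance=0.01):
    """Count, per database protein, how many query peptides have an m/z
    within `tolerance` of one of that protein's tryptic peptides."""
    query_mz = [peptide_mz(p) for p in query_peptides]
    hits = {}
    for name, sequence in plant_db.items():
        db_mz = [peptide_mz(p) for p in tryptic_digest(sequence)]
        matched = sum(
            1 for q in query_mz
            if any(abs(q - m) <= tolerance for m in db_mz)
        )
        if matched:
            hits[name] = matched
    return hits


def consensus(*candidate_lists):
    """Proteins present in every candidate list."""
    return set.intersection(*(set(c) for c in candidate_lists))


# Toy example with made-up sequences: the query peptide "AGK" has the same
# residue composition, and therefore the same m/z, as "GAK" in plant_1.
plant_db = {"plant_1": "GAKLLR", "plant_2": "WWWK"}
hits = alternate_hits(["AGK"], plant_db)
```

The toy example shows why an m/z match can surface a protein that sequence alignment would rank differently: two peptides with the same amino acid composition are indistinguishable by precursor mass even when their sequences differ.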

Files

Original bundle
Name: Mehrotra-Sudeep-ETD-Final.pdf
Size: 2.03 MB
Format: Adobe Portable Document Format