Show simple item record

dc.contributor.advisorMcQuillan, Ianen_US
dc.creatorJin, Linglingen_US
dc.date.accessioned2010-04-09T09:37:48Zen_US
dc.date.accessioned2013-01-04T04:28:28Z
dc.date.available2011-04-13T08:00:00Zen_US
dc.date.available2013-01-04T04:28:28Z
dc.date.created2010-03en_US
dc.date.issued2010-03en_US
dc.date.submittedMarch 2010en_US
dc.identifier.urihttp://hdl.handle.net/10388/etd-04092010-093748en_US
dc.description.abstractSequence alignment has become one of the most common tasks in bioinformatics. Most of the existing sequence alignment methods use general scoring schemes. But these alignments are sometimes not completely relevant because they do not necessarily provide the desired information. It would be extremely difficult, if not impossible, to include any possible objective into an algorithm. Our goal is to allow a working biologist to augment a given alignment with additional information based on their knowledge and objectives.In this thesis, we will formally define constraints and compatible constraint sets for an alignment which require some positions of the sequences to be aligned together. Using this approach, one can align some specific segments such as domains within protein sequences by inputting constraints (the positions of the segments on the sequences), and the algorithm will automatically find an optimal alignment in which the segments are aligned together.A necessary prerequisite of calculating an alignment is that the constraints inputted be compatible with each other, and we will develop algorithms to check this condition for both pairwise and multiple sequence alignments. The algorithms are based on a depth-first search on a graph that is converted from the constraints and the alignment. We then develop algorithms to perform pairwise and multiple sequence alignments satisfying these compatible constraints.Using straightforward dynamic programming for pairwise sequence alignment satisfying a compatible constraint set, an optimal alignment corresponds to a path going through the dynamic programming matrix, and as we are only using single-position constraints, a constraint can be represented as a point on the matrix, so a compatible constraint set is a set of points. We try to determine a new path, rather than the original path, that achieves the highest score which goes through all the compatible constraint set points. The path is a concatenation of sub-paths, so that only the scores in the sub-matrices need to be calculated. This means the time required to get the new path decreases as the number of constraints increases, and it also varies as the positions of the points change. It can be further reduced by using the information from the original alignment, which can offer a significant speed gain.We then use exact and progressive algorithms to find multiple sequence alignments satisfying a compatible constraint set, which are extensions of pairwise sequence alignments. With exact algorithms for three sequences, where constraints are represented as lines, we discuss a method to force the optimal path to cross the constraint lines. And with progressive algorithms, we use a set of pairwise alignments satisfying compatible constraints to construct multiple sequence alignments progressively. Because they are more complex, we leave some extensions as future work.en_US
dc.language.isoen_USen_US
dc.subjectmultiple sequence alignmenten_US
dc.subjectconstrainten_US
dc.subjectcompatible constraint seten_US
dc.titleMultiple sequence alignment augmented by expert user constraintsen_US
thesis.degree.departmentComputer Scienceen_US
thesis.degree.disciplineComputer Scienceen_US
thesis.degree.grantorUniversity of Saskatchewanen_US
thesis.degree.levelMastersen_US
thesis.degree.nameMaster of Science (M.Sc.)en_US
dc.type.materialtexten_US
dc.type.genreThesisen_US
dc.contributor.committeeMemberAngel, Joseph F.en_US
dc.contributor.committeeMemberOsgood, Nathanielen_US
dc.contributor.committeeMemberKusalik, Tonyen_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record