Detection of orthologs via genetic mapping augmentation
Researchers examining a species of interest (the target species) that lacks complete sequence data can infer some knowledge of that species from one or more related species that have complete data. To do so, the available sequence data of the two species are compared to find orthologs. However, without complete data sets, one cannot be certain that the detected orthologs are valid. By using ortholog detection systems in concert with each species’ mapping data, researchers can find regions of shared synteny, which gives more certainty about the detected orthologs and allows some genetic information to be inferred from these regions. A pipeline software solution, Detection of Orthologs via Genetic Mapping Augmentation (DOGMA), was developed for this purpose. DOGMA’s functionality was tested using a target species, Phaseolus vulgaris, for which only partial sequence data were available, and a closely related species, Glycine max, which has a fully sequenced genome. On sequence similarity alone, which is the standard technique for detecting orthologs, 205 potential orthologs were detected. DOGMA then filtered these results using mapping data from each species and determined that 121 of the 205 were quite likely true orthologs, referred to as putative orthologs; the remaining 84 were categorized as reduced orthologs because either insufficient information was present or they lay clearly outside a noted region of shared synteny. This provides evidence that DOGMA is capable of reducing false positives relative to traditional techniques, such as applications based on Reciprocal Best BLAST Hits. If we interpret the output of the Ortholuge program as the correct answer, DOGMA achieves 95% sensitivity. However, because DOGMA uses mapping data, it is possible that some of the orthologs DOGMA classified as reduced are actually false positives of Ortholuge.
To support this idea, we demonstrate DOGMA’s ability to detect false positives in the results of Ortholuge by artificially creating a paralog and removing the real ortholog. DOGMA classifies these data correctly, whereas Ortholuge does not.
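The filtering step described above can be illustrated with a minimal Python sketch. It is not DOGMA's implementation: the names `SyntenyBlock`, `Candidate`, and `classify`, the map positions, and the rule that a candidate is putative only when both genes are mapped inside the same shared-synteny block are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical data model; the real DOGMA input formats are not described here.
Position = Tuple[str, float]  # (linkage group or chromosome, map position)


@dataclass(frozen=True)
class SyntenyBlock:
    target_group: str
    target_start: float
    target_end: float
    ref_group: str
    ref_start: float
    ref_end: float


@dataclass(frozen=True)
class Candidate:
    target_gene: str
    ref_gene: str
    target_pos: Optional[Position]  # None when the gene is unmapped
    ref_pos: Optional[Position]


def in_block(c: Candidate, b: SyntenyBlock) -> bool:
    """True if both mapped genes of the pair fall inside block b."""
    t_group, t_pos = c.target_pos
    r_group, r_pos = c.ref_pos
    return (
        t_group == b.target_group
        and b.target_start <= t_pos <= b.target_end
        and r_group == b.ref_group
        and b.ref_start <= r_pos <= b.ref_end
    )


def classify(candidates, blocks):
    """Split similarity-based candidates into putative and reduced orthologs."""
    result = {"putative": [], "reduced": []}
    for c in candidates:
        if c.target_pos is None or c.ref_pos is None:
            # Insufficient mapping information: cannot be confirmed.
            result["reduced"].append(c)
        elif any(in_block(c, b) for b in blocks):
            result["putative"].append(c)
        else:
            # Mapped, but outside every region of shared synteny.
            result["reduced"].append(c)
    return result


# Invented example data, for illustration only.
blocks = [SyntenyBlock("Pv01", 10.0, 40.0, "Gm03", 100.0, 500.0)]
candidates = [
    Candidate("pv_a", "gm_a", ("Pv01", 25.0), ("Gm03", 300.0)),  # inside block
    Candidate("pv_b", "gm_b", ("Pv01", 25.0), ("Gm07", 300.0)),  # wrong chromosome
    Candidate("pv_c", "gm_c", None, ("Gm03", 300.0)),            # unmapped
]
labels = classify(candidates, blocks)
```

In this sketch, both failure modes named in the abstract (insufficient information, or a position clearly outside a region of shared synteny) lead to the same reduced label.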
Ortholog, ortholog detection, shared synteny, DOGMA, genetic mapping
Master of Science (M.Sc.)