Subgenome Inference in polyploids: Insights from Synteny-based Linkage
Date
2023-09-22
Authors
ORCID
0000-0002-2579-0470
Type
Thesis
Degree Level
Masters
Abstract
Polyploidy is common in flowering plants, where whole-genome duplication or triplication
events produce additional sets of chromosomes (subgenomes). Almost all flowering plants have undergone at
least one polyploid event, and some have experienced multiple events since the ancestral angiosperm. Polyploidy
leads to subgenomes with redundant gene copies that are rapidly lost through gene fractionation. Synteny
blocks can help infer the evolutionary footprint of these genomes. However, assigning synteny blocks
to subgenomes, each representing a subset of the organism’s genome, is challenging because recurring
polyploidization and fractionation events scramble gene order against
a background of evolutionary processes such as gene family expansion, gene loss, and genome rearrangement.
Existing methods for subgenome identification require manually organizing genes into subgenomes, which is
laborious, prone to error, and requires expertise. To the best of our knowledge, an automated subgenome
reconstruction method does not exist. To address this challenge, we developed the SyntenyLink algorithm
that automatically reconstructs subgenomes from synteny blocks. The algorithm considers differences
in substitution and fractionation patterns among synteny blocks, as well as the continuity of conserved
gene order, to reconstruct the most parsimonious subgenomes. The algorithm first uses the BLASTP and
DAGchainer programs to identify synteny blocks across different chromosomes of two related genomes. It then
organizes the blocks into subgenomes using depth-first search on a weighted graph whose vertices
represent super synteny blocks identified by translocation breakpoints. Each edge is weighted
by combining the percent identity, block chain, and gene density of the two vertices
it connects. The algorithm then minimizes the number of translocation events using a maximum
neighborhood method. The SyntenyLink algorithm was validated against published subgenomes of Brassica
rapa, Brassica oleracea, and Brassica nigra, and manually curated subgenomes of Brassica napus and Sinapis
alba. The results provide compelling evidence for the efficacy of the SyntenyLink algorithm in accurately
reconstructing subgenomes. The algorithm assigned genes to subgenomes with favorable accuracy
overall, especially for subgenome1, where accuracy reached 87% in B. rapa, 80% in B. oleracea, 79% in
B. nigra, 83% in B. napus, and 86% in S. alba. Accuracy was relatively lower (60% to 85%)
for subgenome2 and subgenome3 in all five species, largely due to the highly similar fractionation patterns
of these two subgenomes and to widespread segmental gene duplications, which make it difficult to
distinguish the genes belonging to each. SyntenyLink was then applied to separate the subgenomes
of two Brassica species, Brassica juncea and Brassica carinata. This algorithm represents a promising tool for
reconstructing subgenomes from complex polyploid genomes, with far-reaching implications for the study of
the evolutionary history of flowering plants and other polyploid organisms.
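The graph step described in the abstract can be illustrated with a small sketch. This is not the SyntenyLink implementation: the block fields (pos, identity, chain, density), the scoring formula in edge_weight, and the greedy choice of the best-scoring successor are all assumptions made for illustration. In particular, "block chain" is read here as the chromosome a block sits on. The maximum neighborhood step is omitted.

```python
from collections import defaultdict

def edge_weight(a, b, w_identity=1.0, w_chain=1.0, w_density=1.0):
    """Hypothetical similarity score between two super synteny blocks.

    Rewards blocks on the same chain and penalizes differences in
    percent identity and gene density. The formula is an assumption.
    """
    d_identity = abs(a["identity"] - b["identity"]) / 100.0
    same_chain = 1.0 if a["chain"] == b["chain"] else 0.0
    d_density = abs(a["density"] - b["density"])
    return w_chain * same_chain - w_identity * d_identity - w_density * d_density

def build_graph(blocks):
    """Connect every block at position i to every block at position i + 1."""
    by_pos = defaultdict(list)
    for name, block in blocks.items():
        by_pos[block["pos"]].append(name)
    graph = defaultdict(list)
    for pos in sorted(by_pos):
        for u in by_pos[pos]:
            for v in by_pos.get(pos + 1, []):
                graph[u].append((edge_weight(blocks[u], blocks[v]), v))
    return graph, by_pos

def trace_subgenomes(blocks):
    """Greedy depth-first walk: one path per starting block.

    Each path follows the highest-weight unvisited successor and is
    treated as one candidate subgenome.
    """
    graph, by_pos = build_graph(blocks)
    visited = set()
    paths = []
    for start in sorted(by_pos[min(by_pos)]):
        path, stack = [], [start]
        while stack:
            u = stack.pop()
            if u in visited:
                continue
            visited.add(u)
            path.append(u)
            candidates = sorted((w, v) for w, v in graph[u] if v not in visited)
            if candidates:
                stack.append(candidates[-1][1])  # best-scoring successor
        paths.append(path)
    return paths

# Toy data (invented): two copies of each of three reference positions.
example_blocks = {
    "A1": {"pos": 0, "identity": 90.0, "chain": "chr1", "density": 0.80},
    "B1": {"pos": 0, "identity": 80.0, "chain": "chr5", "density": 0.50},
    "A2": {"pos": 1, "identity": 89.0, "chain": "chr1", "density": 0.75},
    "B2": {"pos": 1, "identity": 79.0, "chain": "chr5", "density": 0.45},
    "A3": {"pos": 2, "identity": 91.0, "chain": "chr1", "density": 0.80},
    "B3": {"pos": 2, "identity": 81.0, "chain": "chr5", "density": 0.50},
}

subgenomes = trace_subgenomes(example_blocks)
print(subgenomes)
```

On this toy input the walk groups the A blocks into one path and the B blocks into another, because blocks with similar identity, chain, and density score highest against each other. Real data would need a tie-breaking rule and a way to handle missing blocks left by fractionation, neither of which is modeled here.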
Keywords
Subgenomes, Synteny, Fractionation, Polyploidy, Dynamic Linking, Maximum Neighbourhood
Degree
Master of Science (M.Sc.)
Department
Computer Science
Program
Computer Science
Advisor
Jin, Lingling
Parkin, Isobel
Committee
McQuillan, Ian;Sharpe, Andrew