Subgenome Inference in polyploids: Insights from Synteny-based Linkage

Date

2023-09-22

ORCID

0000-0002-2579-0470

Type

Thesis

Degree Level

Masters

Abstract

Polyploidy is common in flowering plants: whole genome duplication or triplication events produce additional sets of chromosomes (subgenomes). Almost all flowering plants have undergone at least one polyploidization event, and some have experienced several since diverging from the ancestral angiosperm. Polyploidy yields subgenomes with redundant gene copies that are rapidly lost through gene fractionation. Synteny blocks can help infer the evolutionary footprint of these genomes. However, assigning synteny blocks to subgenomes, each representing a subset of the organism’s genome, is challenging because of recurring polyploidization and fractionation events. These events scramble gene order against a background of other evolutionary processes such as gene family expansion, gene loss, and genome rearrangement. Existing methods for subgenome identification require manually organizing genes into subgenomes, which is laborious, error-prone, and demands expertise. To the best of our knowledge, no automated subgenome reconstruction method exists. To address this challenge, we developed the SyntenyLink algorithm, which automatically reconstructs subgenomes from synteny blocks. The algorithm considers differences in substitution and fractionation patterns among synteny blocks, as well as the continuity of conserved gene order, to reconstruct the most parsimonious subgenomes. It first uses the BLASTP and DAGchainer programs to identify synteny blocks across the chromosomes of two related genomes. It then organizes the blocks into subgenomes using depth-first search on a weighted graph whose vertices represent super synteny blocks identified by translocation breakpoints. Each edge is weighted using the combined information of percent identity, block chain, and gene density between the two vertices it connects.
The algorithm then minimizes the number of translocation events using a maximum neighborhood method. SyntenyLink was validated against published subgenomes of Brassica rapa, Brassica oleracea, and Brassica nigra, and against manually curated subgenomes of Brassica napus and Sinapis alba. The results support the efficacy of the algorithm in reconstructing subgenomes accurately. Overall accuracy in assigning genes to subgenomes was favorable, especially for subgenome1, which reached 87% in B. rapa, 80% in B. oleracea, 79% in B. nigra, 83% in B. napus, and 86% in S. alba. Accuracy was lower (60% to 85%) for subgenome2 and subgenome3 in all five species, largely because these two subgenomes show highly similar fractionation patterns and because of widespread segmental gene duplications, both of which make it difficult to tell apart the genes belonging to each. SyntenyLink was then applied to separate the subgenomes of two further Brassica species, Brassica juncea and Brassica carinata. The algorithm is a promising tool for reconstructing subgenomes from complex polyploid genomes, with far-reaching implications for studying the evolutionary history of flowering plants and other polyploid organisms.
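
The graph step described above can be illustrated with a small sketch. This is not the SyntenyLink implementation, only a toy example under stated assumptions: the `Block` fields, the equal-weight scoring in `edge_weight`, the layered graph layout, and the sample values are all hypothetical and chosen for illustration. It shows a depth-first search over candidate super synteny blocks that keeps the path with the highest total edge weight, where each weight combines percent identity, chain membership, and gene density.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A hypothetical super synteny block (field names are assumptions)."""
    name: str
    identity: float  # mean percent identity to the reference, 0-100
    density: float   # fraction of reference genes retained, 0-1
    chain: str       # chain/chromosome label the block lies on


def edge_weight(a: Block, b: Block) -> float:
    """Assumed scoring: equal-weight mix of three similarity terms in [0, 1]."""
    identity_sim = 1.0 - abs(a.identity - b.identity) / 100.0
    density_sim = 1.0 - abs(a.density - b.density)
    same_chain = 1.0 if a.chain == b.chain else 0.0
    return (identity_sim + density_sim + same_chain) / 3.0


def best_path(current: Block, layers: list, depth: int = 0):
    """Depth-first search over a layered graph.

    layers[i] holds the candidate blocks for the i-th region after `current`.
    Returns (total_weight, path) for the heaviest path starting at `current`.
    """
    if depth == len(layers):
        return 0.0, [current]
    best_score, best_tail = float("-inf"), []
    for candidate in layers[depth]:
        tail_score, tail = best_path(candidate, layers, depth + 1)
        score = edge_weight(current, candidate) + tail_score
        if score > best_score:
            best_score, best_tail = score, tail
    return best_score, [current] + best_tail


# Toy data: one starting block and two downstream regions,
# each with two candidate blocks (values are made up).
start = Block("A", identity=90.0, density=0.80, chain="chr1")
layers = [
    [Block("B1", 89.0, 0.78, "chr1"), Block("B2", 80.0, 0.50, "chr5")],
    [Block("C1", 91.0, 0.75, "chr1"), Block("C2", 79.0, 0.45, "chr5")],
]

score, path = best_path(start, layers)
print([block.name for block in path], round(score, 3))
```

In this toy case the search follows the blocks that resemble the starting block in identity and gene density and stay on the same chain. Exhaustive search like this grows exponentially with the number of regions, so a practical method would need pruning or a different search strategy; the sketch makes no claim about how SyntenyLink handles that.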

Keywords

Subgenomes, Synteny, Fractionation, Polyploidy, Dynamic Linking, Maximum Neighbourhood

Degree

Master of Science (M.Sc.)

Department

Computer Science

Program

Computer Science

Advisor

Jin, Lingling
Parkin, Isobel

Committee

McQuillan, Ian
Sharpe, Andrew
