SV-JIM, detailed pairwise structural variant calling using long-reads and genome assemblies

Date

0001

Authors

Todd, Clarence
Jin, Lingling
McQuillan, Ian

Publisher

Methods

Type

Article

Abstract

This paper proposes a detailed process for structural variant (SV) calling that permits a data-driven assessment of multiple SV callers using both genome assemblies and long reads. The process is implemented as a software pipeline named Structural Variant - Jaccard Index Measure, or SV-JIM, using the Snakemake [20] workflow management system. Like most state-of-the-art SV callers, SV-JIM detects variations between pairs of genomes, but it streamlines the numerous SV calling stages into a single process for user convenience. It evaluates the multiple SV sets produced using the Jaccard index measure to identify those with the highest consistency among the included SV callers. SV-JIM then produces aggregated SV results based on how many callers supported each reported SV. For validation, SV-JIM was assessed through three case studies: one on the Homo sapiens genome and two on plant genomes, Brassica nigra and Arabidopsis thaliana. Executing SV-JIM revealed substantial inter-caller variance, with result counts differing by tens of thousands on the larger Brassica nigra and Homo sapiens genomes. Further, aggregating the SV sets improved retention of the less frequently occurring SV types by requiring a minimum level of support rather than agreement from a specific SV caller combination. Finally, these case studies identified a potential for inflated precision reporting that can occur during evaluation. SV-JIM is publicly available under the MIT license at https://github.com/USask-BINFO/SV-JIM
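
The two ideas named in the abstract, scoring caller agreement with the Jaccard index and aggregating SVs by minimum caller support, can be sketched as follows. This is a minimal illustration under stated assumptions, not SV-JIM's implementation: each SV call is reduced to a hashable (chromosome, position, type) key and matched exactly, whereas a real pipeline would likely match calls with some positional tolerance. The caller names, coordinates, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two SV call sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / len(union)


def pairwise_jaccard(call_sets):
    """Jaccard index for every unordered pair of callers."""
    return {
        (x, y): jaccard(call_sets[x], call_sets[y])
        for x, y in combinations(sorted(call_sets), 2)
    }


def aggregate_by_support(call_sets, min_support):
    """Keep SVs reported by at least `min_support` callers."""
    counts = Counter(sv for calls in call_sets.values() for sv in set(calls))
    return {sv for sv, n in counts.items() if n >= min_support}


# Hypothetical calls from three hypothetical callers.
example_calls = {
    "caller_a": {("chr1", 1000, "DEL"), ("chr1", 5000, "INS"), ("chr2", 300, "INV")},
    "caller_b": {("chr1", 1000, "DEL"), ("chr1", 5000, "INS"), ("chr3", 42, "DUP")},
    "caller_c": {("chr1", 1000, "DEL"), ("chr4", 7, "DEL")},
}

scores = pairwise_jaccard(example_calls)
supported = aggregate_by_support(example_calls, min_support=2)
```

Requiring a support count rather than a fixed caller combination means an SV is retained whichever callers happen to agree on it, which is the property the abstract credits with retaining rarer SV types.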

Keywords

Structural variant calling, Genetic variation, Comparative genomics

DOI

https://doi.org/10.1016/j.ymeth.2024.12.015
