Repository logo
 

Accelerating the Smith–Waterman Algorithm with the Actor Model

Date

2025-06-02

Journal Title

Journal ISSN

Volume Title

Publisher

ORCID

Type

Thesis

Degree Level

Masters

Abstract

The Smith--Waterman algorithm is a fundamental tool for local sequence alignment in computational biology, but its high computational complexity poses significant challenges for large-scale datasets. This thesis investigates the parallelization of the Smith--Waterman algorithm. It introduces inter- and intra-alignment parallelization along with a bi-level parallelization strategy combining inter- and intra-alignment tasks is introduced, leveraging the actor model, OpenMP, and MPI frameworks. The actor model demonstrated superior performance for inter-alignment tasks, achieving near-linear scalability, with speedups of up to 28× on 40 cores for the BRCA1 dataset, outperforming OpenMP (15.14×) and MPI (14×). OpenMP proved most efficient for intra-alignment tasks, completing sequence alignment in 4.91 seconds for sequences of 51,000 and 53,000 nucleotides, while the actor model followed closely at 6.42 seconds. The bi-level parallelization approach, particularly the Actor-Serial configuration, consistently outperformed state-of-the-art tools, including SeqAN, Parasail, and SWIPE, in terms of wall-clock time and scalability. For instance, Actor-Serial completed the BRCA1 task 1.83× faster than Parasail, 3.42× faster than SeqAN, and 4.32× faster than SWIPE. In distributed environments, Actor-Serial demonstrated exceptional scalability, achieving speedups of 77.56× and 78.05× for the BRCA1 and BRCA2 datasets at 160 CPUs, respectively, far exceeding SWIPE's 16.80× and 17.49× speedups under the same conditions. This research also highlights trade-offs between computational speed and resource usage, with the actor model requiring additional memory despite its minimal overhead in task management. The main contribution of this work is the demonstration of the Actor-Serial model as a scalable and efficient solution for parallelizing the Smith--Waterman algorithm, particularly for large datasets and distributed computing environments. Overall, the thesis underscores the effectiveness of the actor model for managing computational tasks due to its minimal overhead, the exceptional performance of OpenMP for intra-alignment parallelization, and the Optimal performance of the Actor-Serial configuration for bi-level parallelization in both single and distributed nodes.

Description

Keywords

Sequence Alignment, Smith--Waterman Algorithm, Actor Model, Parallel Computing

Citation

Degree

Master of Science (M.Sc.)

Department

Computer Science

Program

Computer Science

Part Of

item.page.relation.ispartofseries

DOI

item.page.identifier.pmid

item.page.identifier.pmcid