
Unraveling Acr-mediated Deactivation of CRISPR-Cas Systems: A Transformer Approach

Date

2023-09-26

ORCID

0009-0009-4117-4289

Type

Thesis

Degree Level

Masters

Abstract

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (Cas) proteins serve as a formidable defense mechanism for bacteria against foreign DNA. In response, some bacteriophages (phages) and mobile genetic elements have evolved anti-CRISPR (Acr) proteins that counteract CRISPR-Cas systems and ensure their own survival. Because Acr proteins give phages a fitness advantage over the bacteria they infect, accurately identifying the Acr proteins that inhibit CRISPR-Cas systems could significantly improve our ability to harness phages against antimicrobial resistance. At present, however, Acr identification is laborious and relies on costly experimental procedures. Existing computational tools for protein-protein interaction (PPI) prediction are not designed to predict complex inhibition, which may be the collective result of multiple complex PPIs. In this study, we developed a transformer-based deep neural network, AcrTransAct, to predict the likelihood of Acr-mediated CRISPR-Cas inhibition. Our model comprises two main components: (1) a feature extraction module that incorporates a pre-trained Evolutionary Scale Modeling (ESM) protein transformer and the NetSurfP-3.0 secondary structure prediction system, and (2) a classification module that consists of either a convolutional or a recurrent neural network. We created an inhibition dataset compiled from two Acr databases, AcrHub [69] and Anti-CRISPRdb [13], and several published works [21, 48, 45, 36]. The AcrTransAct model was trained and tested on this dataset. We achieved an accuracy of 95% and an F1 score of 0.95 in predicting the inhibition of I-C, I-E, and I-F CRISPR-Cas systems by Acrs. We evaluated our classifier's performance using four different feature sets: amino acid sequences, structural features, ESM features, and a combination of ESM and structural features.
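The four feature sets named above can be pictured as different per-residue input matrices handed to the classifier. The following is a minimal sketch only, not the thesis implementation: the function names, the one-hot encoding chosen for the sequence-only set, and the tiny placeholder vectors standing in for ESM embeddings and NetSurfP-3.0 outputs are all assumptions made for illustration.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

Matrix = List[List[float]]


def one_hot(sequence: str) -> Matrix:
    """One 20-dimensional indicator vector per residue (all zeros for unknown letters)."""
    return [[1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS] for residue in sequence]


def build_features(
    sequence: str,
    esm: Sequence[Sequence[float]],        # assumed: one embedding vector per residue
    structure: Sequence[Sequence[float]],  # assumed: one structural vector per residue
    feature_set: str,
) -> Matrix:
    """Assemble a per-residue feature matrix for one of four hypothetical feature-set names."""
    if len(esm) != len(sequence) or len(structure) != len(sequence):
        raise ValueError("every feature source must have one row per residue")
    if feature_set == "sequence":
        return one_hot(sequence)
    if feature_set == "structure":
        return [list(row) for row in structure]
    if feature_set == "esm":
        return [list(row) for row in esm]
    if feature_set == "esm+structure":
        # Concatenate the two vectors residue by residue.
        return [list(e) + list(s) for e, s in zip(esm, structure)]
    raise ValueError(f"unknown feature set: {feature_set!r}")


# Toy placeholders; real embeddings and structure predictions are much wider.
toy_sequence = "MKT"
toy_esm = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
toy_structure = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

combined = build_features(toy_sequence, toy_esm, toy_structure, "esm+structure")
print(len(combined), len(combined[0]))
```

In this sketch the combined set simply widens each residue's vector, so a convolutional or recurrent classifier would see the same sequence length with more channels per position.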
Our work provides a valuable tool for predicting interactions between Acrs and CRISPR-Cas systems and facilitates experimental validation of Acr activity by helping select the most likely Acr from many homologous candidate proteins. Furthermore, we provide insights into the capabilities of transformer networks in biological sequence analysis tasks, especially in the context of protein-protein interactions. A web application of AcrTransAct (https://acrtransact.usask.ca) is implemented with the best-performing models from this study to predict the probability that a putative Acr protein inhibits each of multiple CRISPR-Cas systems. Our code and data are available here: https://github.com/USask-BINFO/AcrTransAct.
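For readers unfamiliar with the two metrics quoted in the abstract, the sketch below shows how accuracy and F1 are computed for a binary inhibition classifier. It assumes the binary form of F1 with "inhibits" (label 1) as the positive class; the abstract does not say whether the reported score is binary or averaged across classes. The function name and the toy labels are hypothetical and are not data from the thesis.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, treating 1 as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # inhibition correctly predicted
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-inhibition correctly predicted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # inhibition predicted wrongly
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # inhibition missed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


# Toy labels for illustration only: 1 = Acr inhibits the system, 0 = it does not.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 0]
toy_accuracy, toy_f1 = accuracy_and_f1(toy_true, toy_pred)
print(toy_accuracy, toy_f1)
```

Accuracy and F1 can sit close together, as they do in the reported results, when the classifier's errors are split fairly evenly between false positives and false negatives.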

Description

Keywords

CRISPR-Cas, Anti-CRISPR, Transformers, Deep Learning, Large Language Models, Protein Inhibition

Citation

Degree

Master of Science (M.Sc.)

Department

Computer Science

Program

Computer Science

Advisor
