Unraveling Acr-mediated Deactivation of CRISPR-Cas Systems: A Transformer Approach
Date
2023-09-26
ORCID
0009-0009-4117-4289
Type
Thesis
Degree Level
Masters
Abstract
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (Cas) proteins
serve as a formidable defense mechanism for bacteria against foreign DNA. In turn, some bacteriophages (phages)
and mobile genetic elements have evolved anti-CRISPR (Acr) proteins that counteract CRISPR-Cas systems and
ensure their own survival. Because Acr proteins give phages a fitness advantage over the bacteria they infect,
accurately identifying Acr proteins that inhibit CRISPR-Cas systems could substantially improve our ability to
harness phages to fight antimicrobial resistance. However, Acr identification currently relies on laborious and
costly experimental procedures.
Existing computational tools for protein-protein interaction (PPI) prediction are not designed to predict the
inhibition of a protein complex, which may be the collective result of multiple PPIs. In this study, we developed
a transformer-based deep neural network, AcrTransAct, to predict the likelihood of Acr-mediated CRISPR-Cas
inhibition.
Our model comprises two main components: (1) a feature extraction module that incorporates a pre-trained
Evolutionary Scale Modeling (ESM) protein transformer and the NetSurfP-3.0 secondary structure prediction
system; and (2) a classification module consisting of either a convolutional or a recurrent neural network.
We compiled an inhibition dataset from two Acr databases, AcrHub [69] and Anti-CRISPRdb [13], and several
published works [21, 48, 45, 36], and trained and tested the AcrTransAct model on this dataset.
We achieved an accuracy of 95% and an F1 score of 0.95 in predicting the inhibition of type I-C, I-E, and I-F
CRISPR-Cas systems by Acrs. We evaluated the classifier's performance using four feature sets: amino acid
sequences, structural features, ESM features, and a combination of ESM and structural features.
Our work provides a tool for predicting interactions between Acrs and CRISPR-Cas systems and facilitates
experimental Acr activity assays by identifying the most likely Acr among many homologous candidate proteins.
Furthermore, we provide insights into the capabilities of transformer networks in biological sequence analysis
tasks, especially in the context of protein-protein interactions. A web application of AcrTransAct
(https://acrtransact.usask.ca), built on the best-performing models from this study, predicts the probability
that a putative Acr protein inhibits each of several CRISPR-Cas systems. Our code and data are available at
https://github.com/USask-BINFO/AcrTransAct.
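The abstract describes AcrTransAct as per-residue features (ESM embeddings, optionally combined with NetSurfP-3.0 structural features) fed to a convolutional or recurrent classifier. The Python sketch below (standard library only) illustrates the convolutional variant of that idea; it is not the AcrTransAct code, which is in the linked GitHub repository. Several details are assumptions made for illustration only: the features arrive as precomputed lists (random stand-ins here, with toy widths far smaller than real ESM embeddings), the structural features are reduced to three-state secondary-structure probabilities, the classifier is a single convolution followed by global max pooling, and the three sigmoid outputs (one per subtype I-C, I-E, I-F) form a hypothetical multi-label head. The weights are random and untrained. The `accuracy_and_f1` helper shows how the two reported metrics are defined.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows = residues, columns = features


def concat_features(esm: Matrix, structural: Matrix) -> Matrix:
    """Concatenate ESM and structural features residue by residue."""
    if len(esm) != len(structural):
        raise ValueError("feature sets must cover the same residues")
    return [e + s for e, s in zip(esm, structural)]


def conv1d_relu(x: Matrix, kernels: List[Matrix], bias: Sequence[float]) -> Matrix:
    """'Valid' 1-D convolution along the sequence, followed by ReLU.

    kernels[f][k][d] weights feature d at window offset k for filter f.
    """
    width = len(kernels[0])
    if len(x) < width:
        raise ValueError("sequence is shorter than the convolution window")
    out = []
    for i in range(len(x) - width + 1):
        row = []
        for f, kernel in enumerate(kernels):
            acc = bias[f]
            for k in range(width):
                acc += sum(w * v for w, v in zip(kernel[k], x[i + k]))
            row.append(max(0.0, acc))  # ReLU
        out.append(row)
    return out


def global_max_pool(h: Matrix) -> List[float]:
    """Max over sequence positions for each filter, giving a fixed-length vector."""
    return [max(column) for column in zip(*h)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x: Matrix, kernels: List[Matrix], conv_bias: Sequence[float],
            head_w: Matrix, head_b: Sequence[float]) -> List[float]:
    """One inhibition probability per output (hypothetically, one per Cas subtype)."""
    pooled = global_max_pool(conv1d_relu(x, kernels, conv_bias))
    return [sigmoid(b + sum(w * p for w, p in zip(ws, pooled)))
            for ws, b in zip(head_w, head_b)]


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and binary F1 = 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    return accuracy, (2 * tp / denom if denom else 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    length, esm_dim, n_filters, width = 12, 8, 4, 3  # toy sizes, not ESM's real width
    subtypes = ["I-C", "I-E", "I-F"]  # hypothetical multi-label head
    # Random stand-ins for precomputed ESM embeddings and 3-state SS probabilities.
    esm = [[rng.gauss(0, 1) for _ in range(esm_dim)] for _ in range(length)]
    ss = []
    for _ in range(length):
        raw = [rng.random() for _ in range(3)]
        total = sum(raw)
        ss.append([r / total for r in raw])
    x = concat_features(esm, ss)
    dim = len(x[0])
    kernels = [[[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(width)]
               for _ in range(n_filters)]
    head_w = [[rng.gauss(0, 0.1) for _ in range(n_filters)] for _ in subtypes]
    probs = predict(x, kernels, [0.0] * n_filters, head_w, [0.0] * len(subtypes))
    for name, p in zip(subtypes, probs):
        print(f"{name}: {p:.3f}")  # untrained weights, so the values carry no meaning
```

Global max pooling is one common way to turn a variable-length protein into a fixed-length vector before the output layer; a recurrent classifier would instead summarize the sequence with its final or pooled hidden state.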
Keywords
CRISPR-Cas, Anti-CRISPR, Transformers, Deep Learning, Large Language Models, Protein Inhibition
Degree
Master of Science (M.Sc.)
Department
Computer Science
Program
Computer Science