Epitope-TCR Interaction Prediction with Deep Learning based on Sequence and Physicochemical Properties

Date

2024-06-19

ORCID

0009-0004-9405-6052

Type

Thesis

Degree Level

Masters

Abstract

Immune cells can defend the body against a pathogen if they recognize it as a threat before it attacks. This recognition depends on T-cell receptors (TCRs) on immune cells binding processed fragments of the antigen (epitopes), so it can be anticipated by predicting epitope-TCR interaction. However, experimentally testing large numbers of epitope-TCR sequence pairs for interaction is very time- and resource-intensive. Predicting this interaction computationally before laboratory testing can support effective vaccination and personalized healthcare. In this study, I addressed the interaction prediction task in the unseen epitope setting by developing a pairwise combination-based model, and in the unseen TCR setting by developing an ensemble learning model with sequence-based calculations. In the pairwise combination-based model for unseen epitope-TCR interaction prediction, the paired epitope and TCR sequences are used together to generate image-like features from the absolute difference and the vector outer product of the physicochemical properties of their constituent amino acids. The best-performing physicochemical properties were selected and found to yield much higher performance than existing unseen epitope prediction models. The absolute difference-based model produced an AUC of 0.64 with only the two best-performing physicochemical properties, namely Hydrophobicity and Net Charge Index. The vector outer product-based model produced an AUC of 0.60 with the same two properties. Furthermore, the model achieved an AUC of 0.82 by combining both types of features, whereas the best competing model had an AUC of 0.55 for a similar setting and dataset. In the ensemble learning model for predicting unseen TCR-epitope interactions, the features were generated using a physicochemical property vector, a one-hot vector, and a ProtBERT embedding vector.
During model training, sequences were zero-padded to equal length, and a masking strategy was adopted to mitigate the noise that zero padding may have introduced. The best-performing models using the physicochemical property vector, one-hot vector, and ProtBERT embedding vector achieved AUC values of 0.74, 0.78, and 0.77, respectively. Moreover, the ensemble learning model based on the individually predicted posterior probabilities achieved an AUC of 0.79, which exceeds that of the best-performing existing methods.
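
The pairwise feature construction described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the Kyte-Doolittle hydropathy scale as a stand-in for the Hydrophobicity property, and a toy +1/-1/0 charge mapping in place of the Net Charge Index, whose values are not given here. The function names and the example sequences are hypothetical and chosen only for illustration.

```python
import numpy as np

# Kyte-Doolittle hydropathy values (assumed stand-in for "Hydrophobicity").
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Toy charge mapping: NOT the Net Charge Index, just a placeholder scale.
TOY_CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def property_vector(seq, scale, default=0.0):
    """Map each residue to its property value (unknown residues -> default)."""
    return np.array([scale.get(aa, default) for aa in seq], dtype=float)


def pairwise_maps(epitope, tcr, scale):
    """Return (absolute-difference map, outer-product map) for one property."""
    e = property_vector(epitope, scale)
    t = property_vector(tcr, scale)
    abs_diff = np.abs(e[:, None] - t[None, :])  # shape (len(epitope), len(tcr))
    outer = np.outer(e, t)                      # same shape
    return abs_diff, outer


def pair_features(epitope, tcr, scales):
    """Stack both maps for every property into an image-like array."""
    channels = []
    for scale in scales:
        channels.extend(pairwise_maps(epitope, tcr, scale))
    return np.stack(channels, axis=0)  # (2 * n_properties, len_e, len_t)


# Illustrative input pair only.
features = pair_features("GILGFVFTL", "CASSIRSSYEQYF", [KD_HYDROPATHY, TOY_CHARGE])
print(features.shape)
```

Each property contributes two channels, so two properties give a four-channel array whose height and width are the epitope and TCR lengths. An array of this kind could be fed to an image-style model, though the architecture actually used is not stated in this abstract.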

Keywords

Epitope-TCR Interaction, Absolute Difference, Vector Outer Product, Physicochemical Properties, One Hot Vector, ProtBERT Embedding, Bidirectional Long Short Term Memory (BiLSTM)

Degree

Master of Science (M.Sc.)

Department

Biomedical Engineering

Program

Biomedical Engineering
