
Epitope-TCR Interaction Prediction with Deep Learning based on Sequence and Physicochemical Properties

dc.contributor.advisor: Wu, FangXiang
dc.contributor.committeeMember: Liu, Qiang
dc.contributor.committeeMember: McQuillan, Ian
dc.creator: Raha, Rawshon
dc.creator.orcid: 0009-0004-9405-6052
dc.date.accessioned: 2024-06-19T22:19:52Z
dc.date.available: 2024-06-19T22:19:52Z
dc.date.copyright: 2024
dc.date.created: 2024-06
dc.date.issued: 2024-06-19
dc.date.submitted: June 2024
dc.date.updated: 2024-06-19T22:19:52Z
dc.description.abstract: Immune system cells can defend the body against a pathogen if they recognize it as a threat before it attacks. Whether immune system cells, via their T-cell receptors (TCRs), recognize processed fragments of the antigen (epitopes) can be framed as an epitope-TCR interaction prediction problem. However, experimentally testing large numbers of epitope-TCR sequence pairs for interaction is time- and resource-intensive. Predicting the interaction computationally before laboratory testing can support effective vaccination and personalized healthcare. In this study, I addressed the interaction prediction task in the unseen epitope setting by developing a pairwise-combination-based model, and in the unseen TCR setting by developing an ensemble learning model with sequence-based calculations. In the pairwise-combination-based model for unseen epitope-TCR interaction prediction, each epitope and TCR sequence pair is used jointly to generate image-like features from the absolute difference and the vector outer product of the constituent amino acids' physicochemical properties. The best-performing physicochemical properties were selected, and the resulting model performed substantially better than existing unseen-epitope prediction models. The absolute-difference-based model produced an AUC of 0.64 with only the two best-performing physicochemical properties, namely Hydrophobicity and Net Charge Index. The vector-outer-product-based model produced an AUC of 0.60 with the same two properties. Furthermore, the model achieved an AUC of 0.82 by combining both types of features, while the best competing model had an AUC of 0.55 for a similar setting and dataset. In the ensemble learning model for predicting unseen TCR-epitope interactions, the features were generated using physicochemical property vectors, one-hot vectors, and ProtBERT embedding vectors. During model training, sequences were brought to equal length by zero padding, and a masking strategy was adopted to mitigate the noise that the zero padding may have introduced. The best-performing models using physicochemical property vectors, one-hot vectors, and ProtBERT embedding vectors achieved AUC values of 0.74, 0.78, and 0.77, respectively. Moreover, the ensemble learning model based on the individually predicted posterior probabilities achieved an AUC of 0.79, outperforming the existing best-performing methods.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10388/15774
dc.language.iso: en
dc.subject: Epitope-TCR Interaction
dc.subject: Absolute Difference
dc.subject: Vector Outer Product
dc.subject: Physicochemical Properties
dc.subject: One Hot Vector
dc.subject: ProtBERT Embedding
dc.subject: Bidirectional Long Short Term Memory (BiLSTM)
dc.title: Epitope-TCR Interaction Prediction with Deep Learning based on Sequence and Physicochemical Properties
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Biomedical Engineering
thesis.degree.discipline: Biomedical Engineering
thesis.degree.grantor: University of Saskatchewan
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.Sc.)
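
Illustrative sketch (not part of the deposited record): the abstract describes image-like pairwise features built from the absolute difference and the vector outer product of per-residue physicochemical properties. The snippet below shows one plausible reading of that idea for a single property. The Kyte-Doolittle hydropathy scale is used only as a stand-in for "Hydrophobicity"; the scale, the function names, and the example sequences are assumptions, and the thesis's actual feature construction, normalization, and padding may differ.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in scale;
# the thesis may rely on a different hydrophobicity index (assumption).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_vector(seq, scale=KD):
    """Map each amino acid in seq to its property value."""
    return [scale[aa] for aa in seq]


def abs_difference_map(epitope, tcr, scale=KD):
    """Matrix whose (i, j) entry is |p(epitope[i]) - p(tcr[j])|."""
    e = property_vector(epitope, scale)
    t = property_vector(tcr, scale)
    return [[abs(x - y) for y in t] for x in e]


def outer_product_map(epitope, tcr, scale=KD):
    """Matrix whose (i, j) entry is p(epitope[i]) * p(tcr[j])."""
    e = property_vector(epitope, scale)
    t = property_vector(tcr, scale)
    return [[x * y for y in t] for x in e]


# Hypothetical inputs chosen only to illustrate the shapes involved.
epitope = "GILGFVFTL"
tcr = "CASSIRSSYEQYF"

diff_map = abs_difference_map(epitope, tcr)
outer_map = outer_product_map(epitope, tcr)

# Each map has one row per epitope residue and one column per TCR residue.
print(len(diff_map), len(diff_map[0]))
```

Under this reading, each property yields one matrix per feature type, and matrices for several properties could be stacked as channels of an image-like input; whether the thesis stacks them this way is not stated in the abstract.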

Files

Original bundle
Name: RAHA-THESIS-2024.pdf
Size: 3.65 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 2.27 KB
Format: Plain Text