University of Saskatchewan HARVEST

      Applications of Machine Learning for Predicting Selection Outcomes in Antibody Phage Display

      File
      HOGAN-THESIS-2016.pdf (6.881 MB)
      Date
      2016-09-22
      Author
      Hogan, Daniel
      Type
      Thesis
      Degree Level
      Masters
      Abstract
      Antibodies form an essential component of the adaptive immune system, but they also have important scientific and clinical applications. These applications exploit the proven ability of antibodies to bind strongly and specifically to nearly any biomolecular target (e.g. a protein) of interest. To produce antibodies for scientific and clinical applications, researchers can use a wet-lab technique called antibody phage display. Antibody phage display starts with a library of diverse antibody fragments and selects and amplifies those fragments that bind to the target. Antibody phage display combined with next-generation sequencing (NGS) technology has the potential to yield greater insight into the selection process. Machine learning is an area of artificial intelligence uniquely suited to recognizing patterns in large datasets, like those produced by NGS. The research goals of this thesis were to (1) train machine learning models to predict the selection of antibody fragments in antibody phage display using only the sequence of the fragment; (2) validate the ability of the trained models to generalize to different experiments; and (3) reverse engineer the trained models to gain greater insight into the learned patterns and the selection process.
      Antibody phage display data produced by the Geyer lab (University of Saskatchewan, SK) using two libraries called F and S was used to train a set of machine learning models: a naive Bayes network (NB), a linear model (LM), an artificial neural network (ANN), a support vector machine (SVM) with a radial basis function kernel (RBF-SVM), an SVM with a string kernel (SSK-SVM), and a random forest (RF). In addition, key parameters of the RBF- and SSK-SVM were tuned using a grid search. The trained models were then used to predict which antibody-displaying phage would be observed after the fifth round of panning, and their prediction accuracy on this data was used to help select models for subsequent analyses. The models selected were the RBF- and SSK-SVM. To achieve the second research goal, data originating from library F was used to train the two SVMs while library S data was used to test them. Finally, the two SVM models trained on library F were deconstructed to understand which features of the input correspond to negative predictions and which correspond to positive predictions.
      The ANN, SVM, and RF models had the best average classification accuracy (81.5%), but within this group no single classifier performed significantly better than the others. These classifiers could be used to help non-experts select clones from either library F or S for further wet-lab analyses. The SVMs trained on library F and tested on library S achieved an average classification accuracy of 66.7%, significantly better than chance. These two SVMs could be used to help non-experts select clones for further wet-lab analyses, provided the library being used is not too different from library S. Finally, deconstructing the SVMs trained on library F yielded insight into the basis for their predictions. The predictions of the RBF-SVM were found to be highly dependent on the molecular weight of the relevant binding region (i.e. CDRH3).
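      The molecular-weight dependence noted for the RBF-SVM can be made concrete with a minimal sketch of how such a value might be computed from a CDRH3 amino-acid sequence. This is an illustration only, not the feature-extraction code used in the thesis: the function name `molecular_weight`, the use of average residue masses, and the example sequence are all assumptions introduced here.

```python
# Average residue masses in daltons for the 20 standard amino acids
# (the mass of each free amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153  # added once for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Approximate average molecular weight (Da) of a linear peptide.

    Hypothetical helper: sums average residue masses and adds one water.
    """
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unknown residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


if __name__ == "__main__":
    cdrh3 = "ARDYYGSSYFDY"  # hypothetical CDRH3 sequence, not from the thesis data
    print(f"{cdrh3}: {molecular_weight(cdrh3):.1f} Da")
```

      A value computed this way could serve as one scalar feature per clone. How sequences were actually encoded for the classifiers is described in the thesis itself, not in this abstract.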
      Degree
      Master of Science (M.Sc.)
      Department
      Computer Science
      Program
      Computer Science
      Supervisor
      Kusalik, Anthony
      Committee
      McQuillan, Ian; Geyer, Clarence; Zhang, Chris
      Copyright Date
      September 2016
      URI
      http://hdl.handle.net/10388/7471
      Subject
      Bioinformatics
      Machine Learning
      Antibodies
      Antibody Phage Display
      Collections
      • Graduate Theses and Dissertations