University of Saskatchewan HARVEST

      A Modular Data Analytic Pipeline for Feature Selection in High Dimensional Microbial Data Sets

      Files
      REDLICK-THESIS-2020.pdf (2.174Mb)
      Ellen Redlick Thesis 2020-updated.docx (6.445Mb)
      Date
      2021-03-16
      Author
      Redlick, Ellen
      ORCID
      0000-0003-1431-5516
      Type
      Thesis
      Degree Level
      Masters
      Abstract
      The demand on the global food supply is ever increasing. With a finite amount of land on which to grow crops, soil health is crucial to ensuring a continued, reliable food supply. Understanding how soil microbiomes affect plant growth has proven difficult, in part because of the sheer number of microbes per gram of soil. This challenge is akin to the “large p, small n” problem in statistics, where the number of features (here, microbes) far exceeds the number of samples. We propose a pipeline that analyzes data of this nature with the help of network analysis. Networks, commonly referred to in computer science as graphs, are sets of nodes and edges. For the experiments in this thesis, nodes represent microbes and edges represent their relationships with one another. These relationships are determined by calculating pairwise correlations on the data set. The data used to test the pipeline is an Operational Taxonomic Unit (OTU) abundance table, in which columns are OTUs and rows are samples. Four types of network centrality have been implemented and are used to measure the “importance” of a microbe. Each centrality offers a different interpretation of how to quantify importance. A sensitivity analysis was performed using the pipeline on a smooth brome invasion data set. This analysis explored how varying the pipeline parameters affects performance and result consistency. The trade-offs among parameter settings are discussed, recognizing that different users may value different properties of the results. This pipeline has been used as part of an application that successfully detected microbes that responded to externalities regardless of their abundance.
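      The abstract describes a three-step idea: take an OTU table (rows are samples, columns are OTUs), correlate every pair of OTUs, and score each node of the resulting network with a centrality. It does not name the correlation measure, the edge rule, or the four centralities used. The following is therefore only a minimal sketch under stated assumptions. It uses Pearson correlation on raw abundances, adds an edge whenever |r| >= 0.6, and uses degree centrality as a single example importance score. The function names (pearson, build_network, degree_centrality) and the toy table are hypothetical and are not taken from the thesis.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def build_network(table, otus, threshold=0.6):
    """Build an undirected OTU network from an abundance table.

    Assumption (not from the thesis): an edge joins two OTUs whenever the
    absolute Pearson correlation of their abundance columns is >= threshold.
    Returns an adjacency dict mapping each OTU to its set of neighbours.
    """
    columns = {otu: [row[j] for row in table] for j, otu in enumerate(otus)}
    adjacency = {otu: set() for otu in otus}
    for a, b in combinations(otus, 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other nodes each node is connected to (one example centrality)."""
    n = len(adjacency)
    if n <= 1:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


# Hypothetical toy data: 4 samples (rows) x 4 OTUs (columns).
otus = ["OTU1", "OTU2", "OTU3", "OTU4"]
table = [
    [10, 20, 5, 1],
    [20, 40, 4, 3],
    [30, 60, 6, 2],
    [40, 80, 5, 4],
]

adjacency = build_network(table, otus)
centrality = degree_centrality(adjacency)
# OTU3 is isolated (centrality 0.0); OTU1, OTU2 and OTU4 each score 2/3.
for otu in otus:
    print(otu, round(centrality[otu], 3))
```

      Thresholding on |r| keeps both positive and negative associations as edges. The threshold is exactly the kind of pipeline parameter whose trade-offs a sensitivity analysis would vary.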
      Degree
      Master of Science (M.Sc.)
      Department
      Computer Science
      Program
      Computer Science
      Supervisor
      Stanley, Kevin
      Committee
      Kusalik, Anthony; Horsch, Michael; Siciliano, Steven; Arcand, Melissa
      Copyright Date
      October 2020
      URI
      http://hdl.handle.net/10388/13284
      Subject
      feature selection
      data analysis
      microbial data sets
      Collections
      • Graduate Theses and Dissertations
      The University of Saskatchewan's main campus is situated on Treaty 6 Territory and the Homeland of the Métis.

      © University of Saskatchewan