University of Saskatchewan HARVEST

      Influence of training dataset selection on the performance of a machine learning model

      View/Open
      MOULI-THESIS-2021.pdf (13.94Mb)
      Date
      2022-04-05
      Author
      Mouli, Srishti
      ORCID
      0000-0001-6049-0230
      Type
      Thesis
      Degree Level
      Masters
      Abstract
      To observe the growth dynamics of canola flowers during the blooming season and to forecast the harvest of canola crops, researchers at P2IRC, located at the University of Saskatchewan, have developed an application called ‘Flower Counter’. The model was built with a Deep Learning (DL) based Multi-column Convolutional Neural Network (MCNN) algorithm and the TensorFlow framework. It is an object counting model that counts the canola flowers in images based on what it learns from a given set of training images, called ‘ground-truths’. This work proposes composing a good training dataset, one that gives good accuracy with a robust object detection model, by using different training and testing combinations. Various evaluation techniques are used to check the impact of the training dataset on the model's testing results and on its generalizability. The primary goal of this research is to define a good, diverse training dataset composition. A good composition also includes different characteristics present in the dataset that can affect the testing results and help create a robust object counting model. Different characteristics of the training and testing datasets are used to identify the most prominent characteristics and features that affect the test results. A further objective is to evaluate the impact of training dataset selection on the accuracy of the testing results produced by the ML model. This work would help researchers and plant scientists learn about the diversity of characteristics needed to compose a training dataset. By identifying the characteristics that affect testing results, it can also give insights into reducing the manual effort required to create ground truth for training models.
      Because the training of the model depends entirely on datasets collected under diverse weather conditions, some factors could affect some of the experimental results. Training dataset selection has not been explored much as a research area, and this work gives insights into model generalization capability and the scope for using manual work to obtain a robust object counting model.
      Degree
      Master of Science (M.Sc.)
      Department
      Computer Science
      Program
      Computer Science
      Supervisor
      Makaroff, Dwight; Eager, Derek
      Committee
      Stavness, Ian; Keil, Mark; Nguyen, Ha
      Copyright Date
      November 2021
      URI
      https://hdl.handle.net/10388/13864
      Subject
      Machine Learning, Training Dataset selection
      Collections
      • Graduate Theses and Dissertations

      The University of Saskatchewan's main campus is situated on Treaty 6 Territory and the Homeland of the Métis.

      © University of Saskatchewan