Influence of training dataset selection on the performance of a machine learning model

Date

2022-04-05

ORCID

0000-0001-6049-0230

Type

Thesis

Degree Level

Masters

Abstract

Researchers at P2IRC, located at the University of Saskatchewan, have developed an application called ‘Flower Counter’ to observe the growth dynamics of canola flowers during the blooming season and to forecast the harvest of canola crops. The model is built with a Deep Learning (DL) based Multi-column Convolutional Neural Network (MCNN) algorithm and the TensorFlow framework. It is an object counting model: it counts the canola flowers in images based on what it has learned from a given set of training images, called ‘ground truths’. This work proposes to compose a training dataset that yields good accuracy and a robust object detection model by using different combinations of training and testing datasets. Various evaluation techniques are used to assess the impact of the training dataset on the model's testing results and on its generalizability. The primary goal of this research is to define a good, diverse training dataset composition. A good composition also covers the different characteristics present in the dataset that can affect the testing results and help create a robust object counting model. Different characteristics of the training and testing datasets are examined to identify the characteristics and features that most strongly affect the test results. A further objective is to evaluate the impact of training dataset selection on the accuracy of the testing results produced by the ML model. This work should help researchers and plant scientists understand the diversity of characteristics needed when composing a training dataset. By identifying the characteristics that affect testing results, it can also offer insights into reducing the manual effort required to create ground truth for training models.
Because training depends entirely on datasets collected under diverse weather conditions, some factors may have affected some of the experimental results. Training dataset selection has not been explored much as a research area, and this work offers insights into the model's generalization capability and into how manual effort can be used to obtain a robust object counting model.

Keywords

Machine Learning, Training Dataset selection

Degree

Master of Science (M.Sc.)

Department

Computer Science

Program

Computer Science
