Scaling a convolutional neural network based Flower counting application in a distributed GPU cluster
Chowdhury, Mohammed Rashid 1991-
MetadataShow full item record
Taking advantage of modern data acquisition techniques, the researchers of P2IRC located at the University of Saskatchewan developed an application to monitor the status of the flower growth during different phases of the blooming period and the yield prediction of canola crops. Though the application could predict the near accurate number of flowers in a few scenarios, its inability to function under challenging situations such as misinterpreting sun reflection or dust along the roadside as flowers have motivated the researchers to find an alternative approach of counting flowers. In addition to being a more accurate version, another goal is for the new application to be faster to infer the number of flowers and scalable in distributed environments. Putting these goals in mind, in this thesis, a Convolutional neural network (CNN) based flower counting application is developed and evaluated taking inspiration from two other previous works where CNN was used for counting heads in dense crowds and predicting the number of bacterial cells from medical imagery. In addition to that, the application addresses the performance and the accuracy goals previously mentioned. Two challenges of using the neural network are (a) the training needs a large volume of data to converge to a low error and (b) the training is computationally expensive and it takes longer time to complete. To address the first challenge, experiments were run with both "ground truth" estimated using a modified version of the previous flower counter, and ground truth from manual annotation. To address the problem of long training time, two distributed versions of the proposed application were created based on two different distributed architectures called Parameter Server and Ring-AllReduce. Moreover, a detailed explanation of the proposed CNN's architecture along with its memory footprints and GPU utilization is also organized as an in-depth case study to help trace the model's memory consumption during training. From different sets of experiments, the new flower counter application is observed more accurate than its previous version and both implementations of its distributed versions successfully reduced the total completion time as a result of being linearly scalable when more workers are added to run the training. The Ring-AllReduce version performed slightly better than the Parameter Server, but the differences were not substantial.
DegreeMaster of Science (M.Sc.)
SupervisorMakaroff, Dwight; Eager, Derek
CommitteeHorsch, Michael; Links, Matthew
Copyright DateJune 2019
Distributed GPU Computing, Convolutional Neural Network, Flower Counter, Distributed TensorFlow, Horovod, Parameter Server, Ring-AllReduce.