
Predicting Phenotypes From Novel Genomic Markers Using Deep Learning

dc.contributor.advisor: Jin, Lingling
dc.contributor.committeeMember: Kusalik, Tony
dc.contributor.committeeMember: Bett, Kirstin
dc.contributor.committeeMember: Liu, Juxin
dc.creator: Sehrawat, Shivani
dc.creator.orcid: 0000-0003-0224-1820
dc.date.accessioned: 2022-12-21T21:55:38Z
dc.date.available: 2022-12-21T21:55:38Z
dc.date.copyright: 2022
dc.date.created: 2022-12
dc.date.issued: 2022-12-21
dc.date.submitted: December 2022
dc.date.updated: 2022-12-21T21:55:39Z
dc.description.abstract: Genomic selection (GS) is a powerful method for predicting the phenotypes of individuals from genome-wide markers in order to select candidates for the next breeding cycle. Previous GS studies have used single nucleotide polymorphism (SNP) markers to predict phenotypes with conventional statistical or deep learning models. However, these predictive models face challenges due to the high dimensionality of genome-wide SNP marker data and interactions between alleles. Thanks to recent breakthroughs in DNA sequencing and decreased sequencing costs, the study of novel genomic variants such as structural variations (SVs) and transposable elements (TEs) has become increasingly prevalent. Here, we present a one-dimensional deep convolutional neural network, NovGMDeep, to predict phenotypes using novel genomic markers such as SVs and TEs. The model is designed to use these markers to mitigate the curse of dimensionality associated with SNP genotypic data in GS. The proposed model was trained and tested on samples of Arabidopsis thaliana and Oryza sativa using 3-fold cross-validation. Prediction accuracy was evaluated using Pearson’s Correlation Coefficient (PCC), Mean Absolute Error (MAE), and the Standard Deviation (SD) of MAE on the testing sets. The predictions showed a higher correlation when the model was trained with SVs and TEs than with SNPs. NovGMDeep also achieved higher prediction accuracy than conventional statistical models. We also include an extended study of sample size effects, in which the proposed model was trained on different numbers of samples for SVs. The results show better PCC values when the model was trained on more than 700 samples. This work sheds light on the unrecognized function of SVs and TEs in genotype-to-phenotype associations, as well as their extensive significance and value in crop development. Moreover, the predictions obtained here using SVs and TEs will be useful for investigating the evolution and trait architecture of A. thaliana and O. sativa.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10388/14390
dc.language.iso: en
dc.subject: Genomic Selection, Deep Learning, Structural Variants, Transposable Elements, Computational Genomics
dc.title: Predicting Phenotypes From Novel Genomic Markers Using Deep Learning
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Saskatchewan
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.Sc.)
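
Note on the evaluation metrics named in the abstract: the sketch below shows one way PCC, MAE, and the SD of MAE could be computed over 3 folds. It is an illustration only, not the thesis's code. The function names, the random fold assignment, and the reading of "SD of MAE" as the standard deviation of the per-fold MAE values are all assumptions, and `y_pred` is assumed to already hold held-out predictions for every sample.

```python
import numpy as np


def pearson_cc(y_true, y_pred):
    """Pearson's correlation coefficient between observed and predicted values."""
    yt = np.asarray(y_true, dtype=float)
    yp = np.asarray(y_pred, dtype=float)
    yt = yt - yt.mean()
    yp = yp - yp.mean()
    return float((yt @ yp) / np.sqrt((yt @ yt) * (yp @ yp)))


def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute differences between observed and predicted values."""
    yt = np.asarray(y_true, dtype=float)
    yp = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(yt - yp)))


def k_fold_indices(n_samples, k=3, seed=0):
    """Split shuffled sample indices into k roughly equal test folds (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


def summarize_folds(y_true, y_pred, folds):
    """Average PCC and MAE over test folds, plus the SD of the per-fold MAE.

    Assumption: y_pred[i] was produced by a model that did not see sample i
    during training (i.e. it is a held-out prediction).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    pccs = [pearson_cc(y_true[idx], y_pred[idx]) for idx in folds]
    maes = [mean_absolute_error(y_true[idx], y_pred[idx]) for idx in folds]
    return {
        "pcc": float(np.mean(pccs)),
        "mae": float(np.mean(maes)),
        "mae_sd": float(np.std(maes)),
    }


if __name__ == "__main__":
    # Synthetic phenotype values and noisy predictions, for illustration only.
    rng = np.random.default_rng(42)
    observed = rng.normal(size=90)
    predicted = observed + rng.normal(scale=0.5, size=90)
    print(summarize_folds(observed, predicted, k_fold_indices(90, k=3)))
```

In a real pipeline the predictions for each fold would come from a model trained on the other two folds; the sketch leaves that step out and only covers the scoring.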

Files

Original bundle
Name: SEHRAWAT-THESIS-2022.pdf
Size: 1.4 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 2.27 KB
Format: Plain Text