Sparse Bayesian learning for predicting phenotypes and identifying influential markers

dc.contributor.author: Ayat, Maryam
dc.contributor.examiningcommittee: Tremblay-Savard, Olivier (Computer Science)
dc.contributor.examiningcommittee: Acar, Elif (Statistics)
dc.contributor.examiningcommittee: Butz, Cory (University of Regina)
dc.contributor.supervisor: Domaratzki, Michael (Computer Science)
dc.date.accessioned: 2019-01-08T14:56:34Z
dc.date.available: 2019-01-08T14:56:34Z
dc.date.issued: 2018-12
dc.date.submitted: 2018-12-24T06:29:37Z
dc.degree.discipline: Computer Science
dc.degree.level: Doctor of Philosophy (Ph.D.)
dc.description.abstract: In bioinformatics, Genomic Selection (GS) and Genome-Wide Association Studies (GWASs) are two related problems with applications in the plant breeding industry. GS is a method for predicting phenotypes (i.e., traits) such as yield and disease resistance in crops from high-density markers positioned throughout the genome of the varieties. By contrast, a GWAS involves identifying the markers or genes that underlie the phenotypes of importance in breeding. The need to accelerate the development of improved varieties, together with challenges such as discovering the full range of genetic factors related to a trait, increasingly persuades researchers to apply state-of-the-art machine learning methods to GS and GWASs. The aim of this study is to employ sparse Bayesian learning as a technique for GS and GWASs. Sparse Bayesian learning uses Bayesian inference to obtain sparse solutions in regression or classification problems. This learning method is also called the Relevance Vector Machine (RVM), as it can be viewed as a kernel-based model of identical form to the renowned Support Vector Machine (SVM) method. The RVM has some advantages that the SVM lacks, such as probabilistic outputs, a much sparser model, and the ability to work with arbitrary kernel functions. Despite these advantages, there has been little research on the applicability of the RVM. In this thesis, we define and explore two different forms of sparse Bayesian learning, one for predicting phenotypes and one for identifying the most influential markers of a trait. In particular, we introduce a new framework, based on sparse Bayesian learning and an ensemble technique, for ranking the influential markers of a trait.
We apply our methods to three datasets, one simulated and two real-world (yeast and flax), and analyze our results with respect to existing related work, trait heritability, and the accuracies obtained with different kernel functions, including linear, Gaussian, and string kernels where applicable. We find that RVMs are not only as good as other successful machine learning methods in phenotype prediction, but are also capable of identifying the most important markers, from which biologists might gain insight.
dc.description.note: February 2019
dc.identifier.uri: http://hdl.handle.net/1993/33640
dc.language.iso: eng
dc.rights: open access
dc.subject: Sparse Bayesian Learning
dc.subject: Relevance Vector Machine
dc.subject: Phenotype prediction
dc.subject: Ranking features
dc.subject: Marker identification
dc.title: Sparse Bayesian learning for predicting phenotypes and identifying influential markers
dc.type: doctoral thesis
Files
Original bundle
Name: ayat_maryam.pdf
Size: 1.42 MB
Format: Adobe Portable Document Format