Sparse Bayesian learning for predicting phenotypes and identifying influential markers

Ayat, Maryam

dc.contributor.author	Ayat, Maryam
dc.contributor.examiningcommittee	Tremblay-Savard, Olivier (Computer Science)	en_US
dc.contributor.examiningcommittee	Acar, Elif (Statistics)	en_US
dc.contributor.examiningcommittee	Butz, Cory (University of Regina)	en_US
dc.contributor.supervisor	Domaratzki, Michael (Computer Science)	en_US
dc.date.accessioned	2019-01-08T14:56:34Z
dc.date.available	2019-01-08T14:56:34Z
dc.date.issued	2018-12	en_US
dc.date.submitted	2018-12-24T06:29:37Z	en
dc.degree.discipline	Computer Science	en_US
dc.degree.level	Doctor of Philosophy (Ph.D.)	en_US
dc.description.abstract	In bioinformatics, Genomic Selection (GS) and Genome-Wide Association Studies (GWASs) are two related problems with applications in the plant breeding industry. GS is a method for predicting phenotypes (i.e., traits), such as yield and disease resistance in crops, from high-density markers positioned throughout the genome of the varieties. By contrast, a GWAS involves identifying the markers or genes that underlie the phenotypes of importance in breeding. The need to accelerate the development of improved varieties, together with challenges such as discovering all the kinds of genetic factors related to a trait, increasingly persuades researchers to apply state-of-the-art machine learning methods to GS and GWASs. The aim of this study is to employ sparse Bayesian learning as a technique for GS and GWASs. Sparse Bayesian learning uses Bayesian inference to obtain sparse solutions in regression or classification problems. This learning method is also called the Relevance Vector Machine (RVM), as it can be viewed as a kernel-based model of identical form to the renowned Support Vector Machine (SVM) method. The RVM has some advantages that the SVM lacks, such as probabilistic outputs, a much sparser model, and the ability to work with arbitrary kernel functions. Despite these advantages, however, there is not enough research on the applicability of the RVM. In this thesis, we define and explore two different forms of sparse Bayesian learning, one for predicting phenotypes and one for identifying the most influential markers of a trait. In particular, we introduce a new framework, based on sparse Bayesian learning and an ensemble technique, for ranking the influential markers of a trait.
We apply our methods to three datasets, one simulated and two real-world (yeast and flax), and analyze our results with respect to existing related work, trait heritability, and the accuracies obtained with different kernel functions, including linear, Gaussian, and string kernels where applicable. We find that RVMs not only perform as well as other successful machine learning methods in phenotype prediction, but are also capable of identifying the most important markers, from which biologists might gain insight.	en_US
dc.description.note	February 2019	en_US
dc.identifier.uri	http://hdl.handle.net/1993/33640
dc.language.iso	eng	en_US
dc.rights	open access	en_US
dc.subject	Sparse Bayesian Learning	en_US
dc.subject	Relevance Vector Machine	en_US
dc.subject	Phenotype prediction	en_US
dc.subject	Ranking features	en_US
dc.subject	Marker identification	en_US
dc.title	Sparse Bayesian learning for predicting phenotypes and identifying influential markers	en_US
dc.type	doctoral thesis	en_US

Files

Original bundle

Name: ayat_maryam.pdf
Size: 1.42 MB
Format: Adobe Portable Document Format

Collections

FGS - Electronic Theses and Practica