Deep learning for genomic selection

Jubair, Sheikh
Feeding the world's population is a major global challenge: around 193 million people face severe hunger. Addressing it requires increasing the quality and quantity of food even as agricultural land shrinks or degrades. Genomic selection is a predictive technique for identifying the top genotypes and developing new cultivars. It uses a genotype's whole-genome molecular markers to predict crop traits before the crop is grown. Genomic selection falls into two main categories: i) single environment trials, which assume that the environment does not affect the crop's development, and ii) multi-environment trials, in which the environment influences crop development by interacting with the crop's genetic component, an effect known as genotype-by-environment interaction (GxE). Deep learning models can extract meaningful information from different data sources, such as weather and text data. However, they remain underdeveloped for genomic selection, especially for multi-environment trials. Here we devised one ensemble deep learning model and one transformer model for single environment trials, and three deep learning frameworks for multi-environment trials. While devising these models and frameworks, we introduced several new techniques for genomic selection: representing markers by genotype frequency, environment-specific markers, and global markers that are related to a specific trait rather than to any environment. The results demonstrate that our single environment models are comparable to existing methods, while our multi-environment frameworks outperform some existing methods. We anticipate that this research will help future work build more complex deep learning frameworks for both categories. For example, future multi-environment frameworks may incorporate other data types, such as soil data and images of agricultural land. One of the proposed frameworks can be extended to accommodate additional data and models.
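
As a rough illustration of what representing markers by genotype frequency could look like, the sketch below replaces each genotype code at a marker with the fraction of lines that carry that code at that marker. The 0/1/2 coding, the function name genotype_frequency_encode, and the toy data are assumptions made for this example; the abstract does not specify the encoding actually used in the thesis, which may differ.

```python
from collections import Counter


def genotype_frequency_encode(markers):
    """Replace each genotype code with its frequency at that marker.

    markers: list of rows, one row per line (individual), one column
    per marker. Codes are assumed (hypothetically) to be 0, 1, or 2.
    """
    n_lines = len(markers)
    # Transpose so that each entry holds all genotype codes at one marker.
    columns = list(zip(*markers))
    # For every marker, map each observed code to its relative frequency.
    freq_per_marker = [
        {code: count / n_lines for code, count in Counter(col).items()}
        for col in columns
    ]
    # Look up the frequency of each line's code at each marker.
    return [
        [freq_per_marker[j][code] for j, code in enumerate(row)]
        for row in markers
    ]


# Toy data: 4 lines x 3 markers (illustrative values only).
markers = [
    [0, 1, 2],
    [0, 1, 0],
    [2, 1, 0],
    [0, 2, 0],
]
encoded = genotype_frequency_encode(markers)
```

In this toy matrix, code 0 occurs in three of the four lines at the first marker, so those lines receive 0.75 there and the remaining line receives 0.25. The encoded matrix has the same shape as the input and could be fed to a downstream model in place of the raw codes.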
genomic selection, multi-environment trial, deep learning, GxE, enviromics