Deep learning-enhanced drug discovery: innovative molecule clustering and interaction prediction through graph analysis

dc.contributor.author: Hadipour, Hamid
dc.contributor.examiningcommittee: Akcora, Cuneyt (Computer Science)
dc.contributor.examiningcommittee: Cardona, Silvia (Microbiology)
dc.contributor.supervisor: Hu, Pingzhao
dc.contributor.supervisor: Leung, Carson
dc.date.accessioned: 2024-01-05T16:01:17Z
dc.date.available: 2024-01-05T16:01:17Z
dc.date.issued: 2023-12-21
dc.date.submitted: 2023-12-22T04:05:47Z (en_US)
dc.degree.discipline: Computer Science (en_US)
dc.degree.level: Master of Science (M.Sc.)
dc.description.abstract: Motivation: Efficient drug discovery requires an approach that integrates molecular feature analysis with accurate compound-protein interaction (CPI) prediction. This study introduces models that combine deep learning (DL) techniques for molecular feature engineering with new CPI prediction methods. This integration responds to the need for detailed analysis of molecular datasets and for predicting interactions between novel compounds and proteins.

Methods and Results: Chapter 3 (Molecular Clustering and Feature Analysis): The framework implements a feature engineering scheme focused on molecule-specific atomic and bonding information. It uses principal component analysis (PCA) to encode this information and a variational autoencoder (VAE)-based method to embed both global chemical properties and local features. This approach enabled the clustering of a large dataset of over 47,000 molecules. Applying the K-means method to VAE embeddings of size 32 identified 50 distinct molecular clusters. These clusters were visualized with t-distributed Stochastic Neighbor Embedding (t-SNE), showing that the framework groups molecules effectively based on their complex features.

Chapter 4 (CPI Prediction with GraphBAN): For CPI prediction, the study introduces GraphBAN, a novel inductive approach that uses graph knowledge distillation (KD). It incorporates a deep bilinear attention network (BAN) and a KD module for graph analysis, enabling the alignment of interaction features across different distributions. GraphBAN supports both transductive and inductive link prediction in a bipartite graph of CPIs. Tested on three benchmark datasets, GraphBAN outperformed six baseline models. The results show that it can predict interactions between unseen compounds and proteins, which is an important aspect of drug discovery.

Conclusion: This study presents two innovative models that address molecule-specific feature engineering and CPI prediction in parallel. Together, the models deepen the understanding of molecular characteristics and significantly improve the accuracy of CPI predictions. This advancement is important for streamlining drug discovery, reducing the number of compounds that need to be screened, and facilitating the development of more effective and targeted drugs.
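The distinction the abstract draws between transductive and inductive link prediction on a bipartite CPI graph can be illustrated with a small data-splitting sketch. This is not the thesis's evaluation protocol, which this record does not describe. The function name `split_cpi_pairs`, the holdout fractions, and the toy identifiers (`C0`, `P0`, ...) are all hypothetical, chosen only to show the idea: in the inductive setting, every test pair involves a compound and a protein that never appear in training.

```python
import random


def split_cpi_pairs(pairs, node_holdout=0.2, edge_holdout=0.25, seed=0):
    """Split (compound, protein, label) triples three ways (illustrative only).

    train             : pairs between seen compounds and seen proteins
    transductive_test : held-out pairs whose nodes both belong to the seen set
    inductive_test    : pairs whose compound AND protein are both unseen
    """
    rng = random.Random(seed)
    compounds = sorted({c for c, _, _ in pairs})
    proteins = sorted({p for _, p, _ in pairs})

    # Hold out whole nodes, so they never appear in any training pair.
    unseen_c = set(rng.sample(compounds, max(1, int(len(compounds) * node_holdout))))
    unseen_p = set(rng.sample(proteins, max(1, int(len(proteins) * node_holdout))))

    seen, inductive_test = [], []
    for c, p, y in pairs:
        if c in unseen_c and p in unseen_p:
            inductive_test.append((c, p, y))
        elif c not in unseen_c and p not in unseen_p:
            seen.append((c, p, y))
        # Pairs with exactly one unseen node are dropped in this sketch.

    # Hold out a fraction of the remaining edges for the transductive test.
    rng.shuffle(seen)
    n_test = int(len(seen) * edge_holdout)
    transductive_test, train = seen[:n_test], seen[n_test:]
    return train, transductive_test, inductive_test


# Toy bipartite graph: 10 compounds x 10 proteins with made-up labels.
pairs = [(f"C{i}", f"P{j}", (i + j) % 2) for i in range(10) for j in range(10)]
train, transductive_test, inductive_test = split_cpi_pairs(pairs)
print(len(train), len(transductive_test), len(inductive_test))  # 48 16 4
```

With 2 of 10 compounds and 2 of 10 proteins held out, 64 pairs remain between seen nodes (split 48/16) and 4 pairs connect only unseen nodes. A model evaluated on the last group must generalize to entities it has never encountered, which is the harder setting the abstract highlights.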
dc.description.note: February 2024
dc.identifier.uri: http://hdl.handle.net/1993/37931
dc.language.iso: eng
dc.rights: open access (en_US)
dc.subject: deep learning
dc.subject: drug discovery
dc.subject: graph neural network
dc.subject: inductive link prediction
dc.title: Deep learning-enhanced drug discovery: innovative molecule clustering and interaction prediction through graph analysis
dc.type: master thesis (en_US)
local.subject.manitoba: no
project.funder.identifier: https://doi.org/10.13039/501100000024
project.funder.name: Canadian Institutes of Health Research (CIHR)
Files
Original bundle
Name: Hadipour_Hamid.pdf
Size: 2.79 MB
Format: Adobe Portable Document Format
Description: The thesis contains confidential work, which is yet to be published in journals. Publications of manuscripts are in progress.
License bundle
Name: license.txt
Size: 770 B
Format: Item-specific license agreed to upon submission