Deep learning-enhanced drug discovery: innovative molecule clustering and interaction prediction through graph analysis
dc.contributor.author | Hadipour, Hamid | |
dc.contributor.examiningcommittee | Akcora, Cuneyt (Computer Science) | |
dc.contributor.examiningcommittee | Cardona, Silvia (Microbiology) | |
dc.contributor.supervisor | Hu, Pingzhao | |
dc.contributor.supervisor | Leung, Carson | |
dc.date.accessioned | 2024-01-05T16:01:17Z | |
dc.date.available | 2024-01-05T16:01:17Z | |
dc.date.issued | 2023-12-21 | |
dc.date.submitted | 2023-12-22T04:05:47Z | en_US |
dc.degree.discipline | Computer Science | en_US |
dc.degree.level | Master of Science (M.Sc.) | |
dc.description.abstract | Motivation: Efficient drug discovery requires an approach that integrates molecular feature analysis with accurate compound-protein interaction (CPI) prediction. This study introduces models that combine deep learning (DL) techniques for molecular feature engineering with new CPI prediction methods. The integration addresses two needs: detailed analysis of molecular datasets and prediction of interactions between novel compounds and proteins. Methods and Results: Chapter 3 - Molecular Clustering and Feature Analysis: The framework implements a feature engineering scheme based on molecule-specific atomic and bonding information. It uses principal component analysis (PCA) to encode this information and a variational autoencoder (VAE)-based method to embed both global chemical properties and local features. This approach was used to cluster a large dataset of over 47,000 molecules. Applying the K-means method to VAE embeddings with an embedding size of 32 identified 50 distinct molecular clusters. The clusters were visualized with t-distributed Stochastic Neighbor Embedding (t-SNE), showing that the framework groups molecules effectively based on their complex features. Chapter 4 - CPI Prediction with GraphBAN: For CPI prediction, the study introduces GraphBAN, a novel inductive approach based on graph knowledge distillation (KD). It incorporates a deep bilinear attention network (BAN) and a KD module for graph analysis, enabling the alignment of interaction features across different distributions. GraphBAN supports both transductive and inductive link prediction in a bipartite graph of CPIs. Evaluated on three benchmark datasets, GraphBAN outperformed six baseline models. The results show that it can predict interactions between unseen compounds and proteins, which is an important capability in drug discovery. Conclusion: This study presents two models that address, in parallel, molecule-specific feature engineering and CPI prediction. By integrating these two components, the models deepen the understanding of molecular characteristics and improve the accuracy of CPI predictions. This advancement is important for streamlining drug discovery, reducing the number of compounds that need to be screened, and facilitating the development of more effective and targeted drugs. | |
dc.description.note | February 2024 | |
dc.identifier.uri | http://hdl.handle.net/1993/37931 | |
dc.language.iso | eng | |
dc.rights | open access | en_US |
dc.subject | deep learning | |
dc.subject | drug discovery | |
dc.subject | graph neural network | |
dc.subject | inductive link prediction | |
dc.title | Deep learning-enhanced drug discovery: innovative molecule clustering and interaction prediction through graph analysis | |
dc.type | master thesis | en_US |
local.subject.manitoba | no | |
project.funder.identifier | https://doi.org/10.13039/501100000024 | |
project.funder.name | Canadian Institutes of Health Research (CIHR) |
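Illustrative sketch (not part of the deposited record or the thesis code): the abstract describes clustering molecules by applying K-means to VAE embeddings with an embedding size of 32. The block below is a minimal, standard-library-only version of that clustering step under stated assumptions. The function names `squared_distance` and `kmeans` are hypothetical, random Gaussian vectors stand in for the real VAE embeddings, and the demo uses 5 clusters to stay fast, whereas the abstract reports 50.

```python
import random

EMBEDDING_DIM = 32   # embedding size stated in the abstract
N_CLUSTERS = 5       # demo value only; the abstract reports 50 clusters
N_MOLECULES = 200    # assumption: small synthetic stand-in for the 47,000+ molecules


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=20, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update.

    Returns (labels, centroids). Stops early once assignments no longer
    change, or after max_iterations passes.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic placeholder embeddings (assumption: NOT real molecular data).
data_rng = random.Random(42)
embeddings = [
    [data_rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for _ in range(N_MOLECULES)
]

labels, centroids = kmeans(embeddings, N_CLUSTERS)
```

In practice a library implementation would likely be used for a dataset of this size; the sketch only shows the assignment and update steps that K-means alternates between.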
Files
Original bundle
- Name: Hadipour_Hamid.pdf
- Size: 2.79 MB
- Format: Adobe Portable Document Format
- Description: The thesis contains confidential work, which is yet to be published in journals. Publications of manuscripts are in progress.
License bundle
- Name: license.txt
- Size: 770 B
- Format: Item-specific license agreed to upon submission