Molecular representation modeling with graph neural networks for antibiotic discovery
Motivation: With the advent of large-scale compound screening enabled by high-throughput technologies, a variety of machine learning methods have been integrated into antibiotic discovery pipelines. Feature engineering, a data mining technique, is often one of the first steps used to mine patterns from large datasets and to optimize predictive models for the prediction task. Clustering analysis, in turn, is a key approach for gaining insight into the underlying biological relationships between gene products in high-dimensional chemical-genetic data. Converting molecules into computer-interpretable features that carry rich molecular information is a core problem for data-driven machine learning applications in chemical and drug-related tasks. Because small molecules can be treated as graph-structured (non-Euclidean) data, graph convolutional neural networks (GCNs), which learn and aggregate the local information of molecules, have been used to predict molecular properties with great success. Furthermore, given the merits and successful applications of transformers in multiple artificial intelligence (AI) domains, it is desirable to integrate the self-attention mechanism into GCNs to construct better molecular representations.
Methods and Results: Chapter 1 of this thesis reviews the different types of molecular representations and the deep learning architectures to which they are applicable, and Chapter 2 describes the research objectives. Chapter 3 first applies statistical and machine learning approaches to evaluate which features are important for predicting bacterial growth inhibitory activity, then applies a directed message passing neural network to a large-scale compound screen against Burkholderia cenocepacia to predict the bacterial growth inhibition of drugs. In Chapter 4, an analytic framework based on a directed message passing neural network is developed to model large-scale chemical-genetic interaction profiles against Mycobacterium tuberculosis and to predict drug mechanism of action. Finally, Chapter 5 proposes an atom and bond attention-based message passing neural network, ABT-MPNN, as an attempt to improve the molecular representation embedding process for antibiotic discovery.
Conclusion: This thesis provides analytical frameworks for both large-scale compound screening datasets and chemical-genetic interaction profiles, and generates hypotheses about the mechanism of action of novel drugs based on the predicted results. More importantly, by applying message passing neural networks across several studies and by designing a novel attention-based message passing neural network, this thesis highlights the importance of graph-based deep neural networks in drug discovery.
Keywords: Graph neural networks, Molecular representation, Bacterial growth inhibitory activity, Mechanism of action