Machine learning-driven integration of multimodal data for deciphering breast cancer heterogeneity

dc.contributor.author: Liu, Qian
dc.contributor.examiningcommittee: Wang, Yang (Computer Science) [en_US]
dc.contributor.examiningcommittee: Muthukumarana, Saman (Statistics) [en_US]
dc.contributor.examiningcommittee: Murphy, Leigh (Biochemistry and Medical Genetics) [en_US]
dc.contributor.examiningcommittee: Gao, Xin (York University) [en_US]
dc.contributor.supervisor: Hu, Pingzhao
dc.date.accessioned: 2023-02-08T01:21:22Z
dc.date.available: 2023-02-08T01:21:22Z
dc.date.copyright: 2023-02-04
dc.date.issued: 2023-01-25
dc.date.submitted: 2023-01-26T09:25:16Z [en_US]
dc.date.submitted: 2023-02-04T18:01:22Z [en_US]
dc.degree.discipline: Interdisciplinary Program [en_US]
dc.degree.level: Doctor of Philosophy (Ph.D.) [en_US]
dc.description.abstract: Breast cancer (BC) is a complex disease with a high degree of heterogeneity. This heterogeneity can be detected at different biological levels using a variety of modern molecular biology techniques. These techniques generate high-throughput, quantitative measurements such as gene expression, copy number variation (CNV), DNA methylation, and proteomic profiles. Tumor morphology information obtained from medical images is also worth considering when evaluating BC heterogeneity. Many machine learning algorithms have been developed to explore cancer heterogeneity from these high-dimensional measurements. However, existing computational techniques face several challenges in characterizing BC heterogeneity from multimodal biodata. The first is how to effectively combine multimodal biodata and derive comprehensive, interpretable representations from them. The second is how to overcome the analytical infeasibility caused by unpaired data: publicly available multi-omics, medical imaging, and clinical outcome data are unmatched across datasets. In addition, model interpretability and privacy must be carefully considered in machine learning-based BC research. This thesis explores BC heterogeneity using state-of-the-art machine learning algorithms at data resolutions ranging from single genomics and multi-genomics to proteomics and radiogenomics.
We have four major objectives: 1) stratification of human epidermal growth factor receptor 2-positive/estrogen receptor-positive (HER2+/ER+) BC and identification of a prognostic gene signature using single genomic data; 2) BC subtyping using multiple genomic data types; 3) mapping of hierarchical biological systems in BC by applying graph neural networks (GNNs) to graph-structured proteomics data; and 4) identification of prognostic radiogenomic biomarkers for BC. This thesis demonstrates promising applications of machine learning for deciphering BC heterogeneity at different biological levels. Moreover, the resulting 15-gene HER2+/ER+ BC gene expression signature, multi-omics-based BC subtypes, hierarchical biological systems/protein communities, and prognostic radiogenomic biomarkers have the potential to benefit clinical practice for BC. [en_US]
dc.description.note: May 2023 [en_US]
dc.identifier.uri: http://hdl.handle.net/1993/37166
dc.language.iso: eng [en_US]
dc.rights: open access [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Breast cancer [en_US]
dc.subject: Medical imaging [en_US]
dc.subject: Biomarker [en_US]
dc.subject: Subtyping [en_US]
dc.subject: Multi-omics [en_US]
dc.title: Machine learning-driven integration of multimodal data for deciphering breast cancer heterogeneity [en_US]
dc.type: doctoral thesis [en_US]
local.subject.manitoba: no [en_US]
Files
Original bundle
Name: Liu_Qian.pdf
Size: 16.98 MB
Format: Adobe Portable Document Format
Description: Thesis
License bundle
Name: license.txt
Size: 2.2 KB
Format: Item-specific license agreed to upon submission