Deconvolution of bulk gene expression profiles to characterize the tumour immune landscape of early onset breast cancer
Jin, Yong Won
Introduction: Young age at diagnosis (age < 40) is considered an independent risk factor for poor clinical outcomes in breast cancer patients. Patterns of tumour-infiltrating lymphocytes (TILs) may provide insight into the biology underlying this disparity, which remains poorly understood. Deconvolution algorithms can characterize TILs from the gene expression profile of a bulk tumour tissue sample. In this work, both existing and novel deconvolution models were used to identify distinct patterns of tumour-infiltrating immune cells in early onset breast cancer that are significantly associated with clinical outcomes and other molecular signatures.
Methods: The tumour immune landscape was characterized by computational deconvolution of bulk tissue transcriptomes, using an existing tool (TIMER) and a novel deep learning tool, the neural network immune contexture estimator (NNICE). Pseudo-bulk gene expression profiles with known cell type compositions were simulated from single-cell RNA-sequencing (scRNA-seq) data of immune and breast cancer cells. These pseudo-bulk profiles were used to train the deep learning model with deep quantile regression so that it estimates cell fractions from bulk transcriptomes. The characterized tumour immune landscape was then tested for association with clinical outcomes in early onset breast cancer using large-scale breast cancer datasets.
Results: Immune cell abundance estimates from TIMER revealed that the abundance of CD8+ cytotoxic T cells was more strongly associated with clinical outcomes in early onset breast cancer patients than in older patients. The NNICE deconvolution model produced more accurate predictions of cell type composition from bulk transcriptomes on the training dataset. However, its performance was not robust across datasets, and its estimates on breast cancer datasets were inconsistent across cohorts.
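A minimal sketch of two ingredients named in the Methods, assuming NumPy: building a pseudo-bulk profile with known cell type fractions by pooling single cells, and the pinball (quantile) loss that quantile regression minimizes. This is not the NNICE implementation; the function names `simulate_pseudo_bulk` and `pinball_loss`, the flat Dirichlet prior on proportions, the pool size, the counts-per-million normalization and the toy Poisson "single-cell" matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_pseudo_bulk(expr, labels, n_cells=500, rng=None):
    """Pool randomly chosen single cells into one pseudo-bulk profile.

    expr:   (cells x genes) count matrix from a single-cell dataset
    labels: cell type label for each row of `expr`
    Returns (profile, fractions); `fractions` are the true proportions
    of each cell type (in sorted label order) in the pooled sample.
    """
    rng = np.random.default_rng(rng)
    cell_types = np.unique(labels)
    # Assumption: target proportions drawn from a flat Dirichlet prior
    target = rng.dirichlet(np.ones(len(cell_types)))
    # Convert proportions into integer cell counts that sum to n_cells
    counts = rng.multinomial(n_cells, target)
    profile = np.zeros(expr.shape[1])
    for cell_type, k in zip(cell_types, counts):
        pool = np.flatnonzero(labels == cell_type)
        # Sample with replacement so k may exceed the cells available
        chosen = rng.choice(pool, size=k, replace=True)
        profile += expr[chosen].sum(axis=0)
    fractions = counts / n_cells
    # Library-size normalization to counts per million (CPM)
    profile = profile / profile.sum() * 1e6
    return profile, fractions


def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss at quantile level q in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q)
    return np.mean(np.maximum(q * diff, (q - 1) * diff))


# Toy example: 60 synthetic "cells" x 20 genes, three cell types
rng = np.random.default_rng(0)
expr = rng.poisson(2.0, size=(60, 20)).astype(float)
labels = np.repeat(["B", "CD8_T", "tumour"], 20)
profile, fractions = simulate_pseudo_bulk(expr, labels, n_cells=300, rng=1)
print(fractions, fractions.sum())
print(pinball_loss(fractions, np.full(3, 1 / 3), q=0.5))
```

Training a network on many such (profile, fractions) pairs while minimizing the pinball loss at, for example, q = 0.1, 0.5 and 0.9 would yield lower, median and upper estimates for each cell fraction; which quantile levels NNICE actually uses is not stated in the abstract.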
Conclusion: The well-established survival disparity of early onset breast cancer patients was observed, but the TIL landscapes identified by current deconvolution algorithms were not consistent. Novel deconvolution models built on state-of-the-art deep learning frameworks and scRNA-seq data have the potential to produce accurate estimates of cell type proportions; however, further research is needed to make these algorithms robust to differences between transcriptomic datasets.