The integration of transcriptional data with other layers of information, such as the post-transcriptional regulation mediated by microRNAs, can be crucial to identify the driver genes and the subtypes of complex and heterogeneous diseases such as cancer. This paper presents an approach based on topic modeling to accomplish this integration task. More specifically, we show how an algorithm based on a hierarchical version of stochastic block modeling can be naturally extended to integrate any combination of ’omics data. We test this approach on breast cancer samples from the TCGA database, integrating data on messenger RNA, microRNAs, and copy number variations. We show that the inclusion of the microRNA layer significantly improves the accuracy of subtype classification. Moreover, some of the hidden structures or “topics” that the algorithm extracts actually correspond to genes and microRNAs involved in breast cancer development and are associated to the survival probability.

Multiomics Topic Modeling for Breast Cancer Classification

Valle F.
First
;
Osella M.;Caselle M.
2022-01-01

Abstract

The integration of transcriptional data with other layers of information, such as the post-transcriptional regulation mediated by microRNAs, can be crucial to identify the driver genes and the subtypes of complex and heterogeneous diseases such as cancer. This paper presents an approach based on topic modeling to accomplish this integration task. More specifically, we show how an algorithm based on a hierarchical version of stochastic block modeling can be naturally extended to integrate any combination of ’omics data. We test this approach on breast cancer samples from the TCGA database, integrating data on messenger RNA, microRNAs, and copy number variations. We show that the inclusion of the microRNA layer significantly improves the accuracy of subtype classification. Moreover, some of the hidden structures or “topics” that the algorithm extracts actually correspond to genes and microRNAs involved in breast cancer development and are associated to the survival probability.
2022
14
5
1150
1150
Chr14q32; MiRNA expression regulation; MiRNAs; Multiomics; Stochastic block modeling; Topic modeling
Valle F.; Osella M.; Caselle M.
File in questo prodotto:
File Dimensione Formato  
cancers-14-01150.pdf

Accesso aperto

Tipo di file: PDF EDITORIALE
Dimensione 875.78 kB
Formato Adobe PDF
875.78 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/1850916
Citazioni
  • ???jsp.display-item.citation.pmc??? 4
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 4
social impact