CINECA IRIS Institutional Research Information System

Machine learning for RNA sequencing-based intrinsic subtyping of breast cancer

Cascianelli S.; Molineris I.; Isella C.; Masseroli M.; Medico E.

2020-01-01

Abstract

Stratification of breast cancer (BC) into molecular subtypes by multigene expression assays is of demonstrated clinical utility. In principle, global RNA-sequencing (RNA-seq) should enable reconstructing existing transcriptional classifications of BC samples. Yet, it is not clear whether adapting to RNA-seq the classifiers originally developed using PCR or microarrays, or reconstructing them through machine learning (ML), is preferable. Hence, we focused on the robustness and portability of PAM50, a nearest-centroid classifier developed on microarray data to identify five BC "intrinsic subtypes". We found that standard PAM50 is profoundly affected by the composition of the sample cohort used for reference construction, and we propose a strategy, named AWCA, to mitigate this issue, improving classification robustness (over 90% concordance) and prognostic ability; we also show that AWCA-based PAM50 can even be applied as a single-sample method. Furthermore, we explored five supervised learners to build robust, single-sample intrinsic subtype callers via RNA-seq. In our ML-based survey, regularized multiclass logistic regression (mLR) displayed the best performance, which was further increased by ad hoc gene selection on the global transcriptome. On external test sets, mLR classifications reached 90% concordance with PAM50-based calls, without the need for a reference sample; the proven robustness and prognostic ability of mLR make it an equally valuable single-sample method to strengthen BC subtyping.
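The cohort-composition effect described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two centroids (`A-like`, `B-like`), the four-gene profiles, and the helper names are hypothetical, and the code reproduces neither the published PAM50 centroids nor the AWCA strategy. It only shows why a nearest-centroid call that relies on gene-wise median centering can change when the reference cohort changes.

```python
import numpy as np

def ordinal_rank(v):
    # Ordinal ranks via a stable sort (sufficient for tie-free toy data).
    order = np.argsort(v, kind="stable")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(len(v))
    return ranks

def spearman(a, b):
    # Spearman correlation = Pearson correlation of the ranks.
    return np.corrcoef(ordinal_rank(a), ordinal_rank(b))[0, 1]

def nearest_centroid(sample, centroids):
    # Assign the label of the centroid with the highest correlation.
    return max(centroids, key=lambda label: spearman(sample, centroids[label]))

def center_on_cohort(cohort):
    # Gene-wise median centering: the reference depends on the cohort itself.
    return cohort - np.median(cohort, axis=0)

# Hypothetical two-class centroids over four genes (not the real PAM50 ones).
centroids = {
    "A-like": np.array([1.0, 0.5, -0.5, -1.0]),
    "B-like": np.array([-1.0, -0.5, 0.5, 1.0]),
}

baseline = np.array([10.0, 8.0, 6.0, 4.0])                # per-gene baseline level
strong_a = baseline + np.array([2.0, 1.0, -1.0, -2.0])
medium_a = baseline + np.array([1.5, 0.75, -0.75, -1.5])
weak_a = baseline + np.array([1.0, 0.5, -0.5, -1.0])      # the sample we track
strong_b = baseline + np.array([-2.0, -1.0, 1.0, 2.0])
weak_b = baseline + np.array([-1.0, -0.5, 0.5, 1.0])

# Balanced cohort: two A-like and two B-like samples; weak_a is row 1.
balanced = np.vstack([strong_a, weak_a, strong_b, weak_b])
call_balanced = nearest_centroid(center_on_cohort(balanced)[1], centroids)

# Skewed cohort: only A-like samples; weak_a is row 2.
skewed = np.vstack([strong_a, medium_a, weak_a])
call_skewed = nearest_centroid(center_on_cohort(skewed)[2], centroids)

print("balanced cohort:", call_balanced)
print("skewed cohort:  ", call_skewed)
```

In this toy setting the same sample is called `A-like` when centered on the balanced cohort and `B-like` when centered on the cohort that contains only A-like samples, because the gene-wise medians shift with cohort composition.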

Year: 2020
Publication language: English
ISI WoS ID: WOS:000567105200007
PubMed ID: 32826944
Scopus ID: 2-s2.0-85089697001
Referees: Anonymous experts
Journal title: SCIENTIFIC REPORTS
Volume: 10
Issue: 1
Pages (from): 14071
Pages (to): 14075
Total number of pages: 5
DOI: https://dx.doi.org/10.1038/s41598-020-70832-2
Keywords: Biomarkers, Tumor; Breast Neoplasms; Carcinoma; Datasets as Topic; Estrogens; Female; Humans; Logistic Models; Neoplasms, Hormone-Dependent; Prognosis; Receptors, Estrogen; Recurrence; Machine Learning; Sequence Analysis, RNA
Co-authors affiliated with foreign institutions: no
Item compliant with the University Regulation on open access?: 1 – item with an Open Access file (I will attach the file at step 6 - Upload)
Faculty site type: 262
Number of authors: 5
All authors: Cascianelli S.; Molineris I.; Isella C.; Masseroli M.; Medico E.
Type: info:eu-repo/semantics/article
Fulltext: open
Type: 03 - Journal contribution :: 03A - Journal article
Appears in types: 03A - Journal article

Files in this item:

File	Size	Format
s41598-020-70832-2.pdf (open access; file type: publisher's PDF)	1.8 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1934570
