Identifying and predicting amyotrophic lateral sclerosis clinical subgroups: a population-based machine-learning study

Rosario Vasta; Antonio Canosa; Cristina Moglia; Andrea Calvo; Adriano Chiò; Umberto Manera; Francesca Palumbo; Alessandro Bombaci; Maurizio Grassano; Maura Brunetti; Federico Casale; Giuseppe Fuda; Paolina Salamone; Barbara Iazzolino; Laura Peotta; Giovanni De Marco; Maria Claudia Torrieri; Salvatore Gallone; Marco Barberis; Luca Sbaiz; Alessandro Mauro; Antonio Bertolotto; Cristoforo Comi; Fabio Poglio; Lucia Testa; Eugenia Rota; Paolo Ghiglione; Pietro Cortelli
2022-01-01

Abstract

Background Amyotrophic lateral sclerosis (ALS) is known to represent a collection of overlapping syndromes. Various classification systems based on empirical observations have been proposed, but it is unclear to what extent they reflect ALS population substructures. We aimed to use machine-learning techniques to identify the number and nature of ALS subtypes to obtain a better understanding of this heterogeneity, enhance our understanding of the disease, and improve clinical care.

Methods In this retrospective study, we applied unsupervised (Uniform Manifold Approximation and Projection [UMAP]) modelling, semi-supervised (neural network UMAP) modelling, and supervised (ensemble learning based on LightGBM) modelling to a population-based discovery cohort of patients who were diagnosed with ALS while living in the Piedmont and Valle d'Aosta regions of Italy, for whom detailed clinical data, such as age at symptom onset, were available. We excluded patients with missing Revised ALS Functional Rating Scale (ALSFRS-R) feature values from the unsupervised and semi-supervised steps. We replicated our findings in an independent population-based cohort of patients who were diagnosed with ALS while living in the Emilia Romagna region of Italy.

Findings Between Jan 1, 1995, and Dec 31, 2015, 2858 patients were entered in the discovery cohort. After excluding 497 (17%) patients with missing ALSFRS-R feature values, data for 42 clinical features across 2361 (83%) patients were available for the unsupervised and semi-supervised analysis. We found that semi-supervised machine learning produced the optimum clustering of the patients with ALS. These clusters roughly corresponded to the six clinical subtypes defined by the Chiò classification system (ie, bulbar, respiratory, flail arm, classical, pyramidal, and flail leg ALS). Between Jan 1, 2009, and March 1, 2018, 1097 patients were entered in the replication cohort. After excluding 108 (10%) patients with missing ALSFRS-R feature values, data for 42 clinical features across 989 patients were available for the unsupervised and semi-supervised analysis. All 1097 patients were included in the supervised analysis. The same clusters were identified in the replication cohort. By contrast, other ALS classification schemes, such as the El Escorial categories, Milano-Torino clinical staging, and King's clinical stages, did not adequately label the clusters. Supervised learning identified 11 clinical parameters that predicted ALS clinical subtypes with high accuracy (area under the curve 0.982 [95% CI 0.980-0.983]).

Interpretation Our data-driven study provides insight into the ALS population substructure and confirms that the Chiò classification system successfully identifies ALS subtypes. Additional validation is required to determine the accuracy and clinical use of these algorithms in assigning clinical subtypes. Nevertheless, our algorithms offer a broad insight into the clinical heterogeneity of ALS and help to determine the actual subtypes of disease that exist within this fatal neurodegenerative syndrome. The systematic identification of ALS subtypes will improve clinical care and clinical trial design.

Copyright © 2022 The Author(s). Published by Elsevier Ltd.
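
The cohort counts reported in the Findings can be checked with a short calculation. The following is a minimal Python sketch, not part of the study's code: the function name `cohort_split` is hypothetical, and the replication cohort's retained percentage is derived here because the abstract does not state it.

```python
def cohort_split(enrolled, excluded):
    """Return (retained, excluded_pct, retained_pct).

    Percentages are relative to the enrolled count and rounded to whole numbers.
    """
    retained = enrolled - excluded
    excluded_pct = round(100 * excluded / enrolled)
    retained_pct = round(100 * retained / enrolled)
    return retained, excluded_pct, retained_pct


# Discovery cohort (Piedmont and Valle d'Aosta): 2858 entered, 497 excluded.
discovery = cohort_split(2858, 497)

# Replication cohort (Emilia Romagna): 1097 entered, 108 excluded.
replication = cohort_split(1097, 108)

print("discovery:", discovery)
print("replication:", replication)
```

The retained counts apply only to the unsupervised and semi-supervised steps; the abstract states that all 1097 replication patients were included in the supervised analysis.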
Year: 2022
Volume: 4
Issue: 5
Pages: e359-e369
Faraz Faghri; Fabian Brunn; Anant Dadu; Elisabetta Zucchi; Ilaria Martinelli; Letizia Mazzini; Rosario Vasta; Antonio Canosa; Cristina Moglia; Andrea Calvo; Michael A Nalls; Roy H Campbell; Jessica Mandrioli; Bryan J Traynor; Adriano Chiò; Umberto Manera; Francesca Palumbo; Alessandro Bombaci; Maurizio Grassano; Maura Brunetti; Federico Casale; Giuseppe Fuda; Paolina Salamone; Barbara Iazzolino; Laura Peotta; Paolo Cugnasco; Giovanni De Marco; Maria Claudia Torrieri; Salvatore Gallone; Marco Barberis; Luca Sbaiz; Salvatore Gentile; Alessandro Mauro; Fabiola De Marchi; Lucia Corrado; Sandra D'Alfonso; Antonio Bertolotto; Daniele Imperiale; Marco De Mattei; Salvatore Amarù; Cristoforo Comi; Carmelo Labate; Fabio Poglio; Luigi Ruiz; Lucia Testa; Eugenia Rota; Paolo Ghiglione; Nicola Launaro; Alessia Di Sapio; Nicola Fini; Giulia Gianferrari; Cecilia Simonini; Stefano Meletti; Rocco Liguori; Veria Vacchiano; Fabrizio Salvi; Ilaria Bartolomei; Roberto Michelucci; Pietro Cortelli; Rita Rinaldi; Anna Maria Borghi; Andrea Zini; Elisabetta Sette; Valeria Tugnoli; Maura Pugliatti; Elena Canali; Luca Codeluppi; Franco Valzania; Lucia Zinno; Giovanni Pavesi; Doriana Medici; Giovanna Pilurzi; Emilio Terlizzi; Donata Guidetti; Silvia De Pasqua; Mario Santangelo; Patrizia De Massis; Martina Bracaglia; Mario Casmiro; Pietro Querzani; Simonetta Morresi; Marco Longoni; Alberto Patuelli; Susanna Malagù; Marco Currò Dossi; Simone Vidale; Salvatore Ferro
Files in this record:
Faghri et al, 2022, Lancet Digital health.pdf (open access; publisher's PDF; 1.34 MB, Adobe PDF)

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1880141
Citations
  • PMC: ND
  • Scopus: 16
  • ISI: 14