Multi-Modal Analysis and Federated Learning Approach for Classification and Personalized Prognostic Assessment in Myeloid Neoplasms

D'Amico, Saverio; Dall'Olio, Lorenzo; Rollo, Cesare; Alonso, Patricia; Prada-Luengo, Iñigo; Dall'Olio, Daniele; Sala, Claudia; Bersanelli, Matteo; Sauta, Elisabetta; Bicchieri, Marilena; Morandini, Pierandrea; Tommasini, Tobia; Savevski, Victor; Zhao, Lin-Pierre; Platzbecker, Uwe; Diez-Campelo, Maria; Santini, Valeria; Fenaux, Pierre; Haferlach, Torsten; Krogh, Anders; Zazo, Santiago; Fariselli, Piero; Sanavia, Tiziana; Della Porta, Matteo G.; Castellani, Gastone

doi:10.1182/blood-2022-166802

Background: Myeloid neoplasms (MN) are clinically and molecularly heterogeneous, so a risk-adapted treatment strategy is mandatory. In MN, classification and prognostic tools based on clinical and morphologic criteria are being complemented by genomic features. The clinical implementation of next-generation classification and prognostic systems requires a robust methodological framework, together with a solution that gives clinicians access to these technologies.

Aims: Machine learning (ML) and deep learning (DL) approaches produce powerful predictive models and offer explainable solutions that support model interpretability in clinical settings. Here we provide a comprehensive assessment of explainable ML/DL-based methods for classification and prognostic assessment of MN, and we develop a solution to apply these methods across different clinical Centres through a Federated Learning (FL) approach.

Methods: We analysed two cohorts of patients with myelodysplastic syndrome (MDS) from the GenoMed4All consortium (n=2,043 and n=2,384), with available clinical and molecular features, to train and validate the models. The methods were then applied to other MN, i.e. acute myeloid leukemia (AML, n=1,154) and chronic myelomonocytic leukemia (CMML, n=1,037). We stratified patients with two clustering approaches, one based on the Hierarchical Dirichlet Process (HDP) and one based on HDBSCAN combined with UMAP dimensionality reduction. We trained a Random Forest (RF) classifier to assign new patients to the existing clusters, using Balanced Accuracy (BA) and Cohen's kappa (CK) as performance metrics. We then compared different survival prediction methods: the CoxPH model (and its penalized version), Random Survival Forests, DeepCox, Gradient Boosting and XGBoost survival methods. Model explainability was assessed with the SHapley Additive exPlanations (SHAP) approach. The concordance index (C-index) was used to evaluate model performance.
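As an illustration of the C-index used above to compare survival models, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the authors' implementation: the function name, the example values and the simplification of skipping pairs with tied follow-up times are assumptions made for illustration only.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored survival data.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means a higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Pairs with tied times are skipped (a simplification of this sketch).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the earlier event received the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
example_times = [5, 10, 15, 20]
example_events = [1, 1, 0, 1]
example_risks = [0.9, 0.6, 0.7, 0.1]
example_c = concordance_index(example_times, example_events, example_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is how the C-indices reported in this abstract should be read.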
Finally, we developed a Federated Learning (FL) environment, together with an imputation approach that handles missing values through a deep decoder model.

Results: In the MDS training cohort, we identified 18 and 8 clusters using HDBSCAN and HDP, respectively (Figure 1). We measured the average Silhouette Coefficient in the data space, and the classification task yielded the following performance: HDBSCAN (BA: 92.7±1.3%, CK: 92.1±1.4%) and HDP (BA: 85.8±0.8%, CK: 83.3±0.9%). Similar distributions were observed in the validation cohort. Model explainability analysis (SHAP) showed that similar features drive patient classification in both populations. The comparison of survival prediction methods for MDS is displayed in Figure 2, which shows model performance in the two cohorts considering demographic, clinical, cytogenetic and genomic features. Non-linear ML/DL-based methods outperformed classical CoxPH-based approaches without requiring extensive data pre-processing. Moreover, all the models showed higher C-indices than the conventional IPSS-R score. SHAP analysis showed a similar feature importance ranking in the training and validation cohorts. The models were then applied to the AML and CMML cohorts, showing consistent results across different types of MN.

Finally, we aimed to develop a federated learning (FL) solution (FedAvg, with a deep decoder model for missing data imputation) to favour a wide clinical implementation of the models. In this setting, data are not pooled on a single server: each node trains locally, and only model parameters are aggregated into a global model. Training on information from all nodes was expected to improve model performance, while the data in each node remain compliant with data privacy policies. We implemented a CoxPH model in a setting of 3 nodes (Centres) contributing 60%, 30% and 10% of the training data, respectively. We observed that the data-poor node (i.e., the node contributing 10% of the data) benefited from FedAvg compared with training in isolation (C-index 0.63 vs. 0.54). The centralized model trained on the whole dataset achieved the highest performance (C-index 0.74).

Conclusion: Machine Learning/Deep Learning approaches produce explainable and robust solutions to optimize classification and prognostic assessment in MN, as a basis for personalized medicine programs in these disorders. Federated learning algorithms allow a wide clinical implementation of the models by ensuring high performance and data protection.
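The aggregation step of FedAvg mentioned in the Results can be sketched as a sample-size-weighted average of the parameters trained at each node. The sketch below is an assumption-laden illustration, not the authors' code. The function name, the three-coefficient CoxPH vectors and the node sizes (chosen to mirror the 60%/30%/10% split) are hypothetical, and a full FedAvg run would repeat local training and aggregation over several rounds.

```python
def fedavg(node_params, node_sizes):
    """One FedAvg aggregation step.

    node_params -- list of parameter vectors, one per node (equal lengths)
    node_sizes  -- number of training samples held by each node
    Returns the average of the vectors, weighted by node sample size.
    """
    if len(node_params) != len(node_sizes) or not node_params:
        raise ValueError("need one sample count per node")
    total = sum(node_sizes)
    n_params = len(node_params[0])
    return [
        sum(params[k] * size for params, size in zip(node_params, node_sizes)) / total
        for k in range(n_params)
    ]


# Hypothetical locally fitted CoxPH coefficients for three covariates.
local_betas = [
    [0.50, -0.20, 0.10],  # node holding 60% of the data
    [0.40, -0.10, 0.30],  # node holding 30% of the data
    [1.00, 0.50, -0.40],  # node holding 10% of the data
]
node_sizes = [600, 300, 100]  # hypothetical sample counts
global_beta = fedavg(local_betas, node_sizes)
```

Only the coefficient vectors leave each node in this scheme; patient-level data stay local. The smallest node contributes little to the global coefficients but receives a model shaped mostly by the larger nodes, which is consistent with the gain reported for the 10% node.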