Predicting protein stability changes upon single-point mutation: a thorough comparison of the available tools on a new dataset

Pancotti, Corrado;Benevenuta, Silvia;Birolo, Giovanni;Repetto, Valeria;Sanavia, Tiziana;Fariselli, Piero
2022-01-01

Abstract

Predicting the difference in thermodynamic stability between protein variants is crucial for protein design and for understanding genotype-phenotype relationships. Several computational tools have been developed for this task. However, most of them were trained or optimized on the same data, often all of the data available, which makes a fair comparison infeasible. Here, we introduce a novel dataset, collected and manually cleaned from the latest version of the ThermoMutDB database, consisting of 669 variants not included in the most widely used training datasets. We evaluated 21 different tools for prediction performance and, by considering both direct and reverse variants, for their ability to satisfy the antisymmetry property. The Pearson correlations of the tested tools were in the ranges 0.21-0.5 and 0-0.45 for the direct and reverse variants, respectively. When both direct and reverse variants are considered, the antisymmetric methods perform better, achieving Pearson correlations in the range 0.51-0.62. The tested methods seem relatively insensitive to the physiological conditions, also performing well on variants measured at more extreme pH and temperature values. A common issue with all the tested methods is the compression of the $\Delta \Delta G$ predictions toward zero. Furthermore, the thermodynamic stability of the most strongly stabilizing variants was found to be more challenging to predict. This study is the most extensive comparison of prediction methods on an entirely novel set of variants never tested before.
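The quantities named in the abstract (Pearson correlation on direct and reverse variants, and the antisymmetry property) can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that antisymmetry means the predicted $\Delta \Delta G$ of a reverse variant should equal the negative of the prediction for the corresponding direct variant, and that the experimental value of a reverse variant is the negated direct value. The function names, the `bias` summary and the toy numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def antisymmetry_report(exp_direct, pred_direct, pred_reverse):
    """Summarize direct/reverse performance and antisymmetry (illustrative only)."""
    # Assumption: the experimental value of a reverse variant is the negated direct value.
    exp_reverse = [-v for v in exp_direct]
    n = len(pred_direct)
    return {
        "r_direct": pearson(exp_direct, pred_direct),
        "r_reverse": pearson(exp_reverse, pred_reverse),
        # An antisymmetric predictor gives -1 here ...
        "r_direct_vs_reverse": pearson(pred_direct, pred_reverse),
        # ... and 0 here (mean of direct + reverse predictions, halved).
        "bias": sum(d + r for d, r in zip(pred_direct, pred_reverse)) / (2 * n),
    }


# Toy data (made up): an antisymmetric predictor whose outputs are
# compressed toward zero by a factor of two.
exp_direct = [1.0, -2.0, 0.5, -1.5]
pred_direct = [0.5 * v for v in exp_direct]
pred_reverse = [-p for p in pred_direct]

report = antisymmetry_report(exp_direct, pred_direct, pred_reverse)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

On this toy input both correlations are close to 1 even though every prediction is half the experimental value. Pearson correlation is invariant to rescaling, so compression toward zero has to be checked separately, for example by comparing the spread of the predicted values with that of the experimental values.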
Year: 2022
Volume: 23
Issue: 2
First page: 1
Last page: 12
URL: https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbab555/6502552
Keywords: antisymmetry; machine learning; protein stability; single-point mutation; stability change
Authors: Pancotti, Corrado; Benevenuta, Silvia; Birolo, Giovanni; Alberini, Virginia; Repetto, Valeria; Sanavia, Tiziana; Capriotti, Emidio; Fariselli, Piero
Files in this record:
bbab555.pdf (open access; file type: publisher's PDF; size: 1.33 MB; format: Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1832154
Citations
  • PMC 27
  • Scopus 39
  • ISI 36