On the biases in predictions of protein stability changes upon variations: the INPS test case

Montanucci, Ludovica; Savojardo, Castrense; Martelli, Pier Luigi; Casadio, Rita; Fariselli, Piero

doi:10.1093/bioinformatics/bty979

Two recent articles (Pucci et al., 2018; Usmanova et al., 2018) proposed two ad hoc datasets and evaluation procedures to quantify an important bias in the predictors of protein stability changes (ΔΔG) upon single-point variations. Current datasets are dominated by destabilizing variations, so predictors trained on those datasets tend to perform better on destabilizing than on stabilizing variations. However, as highlighted before (Capriotti et al., 2008; Christensen and Kepp, 2012; Pucci et al., 2015, 2018; Thiltgen and Goldstein, 2012; Usmanova et al., 2018), there must be an intrinsic anti-symmetric property in the physics of the ΔΔG. Two proteins A and B differing by one residue in position P (residue a in A and residue b in B) are each a variant of the other: the variation aPb in A gives protein B, and vice versa (Protein A = Protein B with variation bPa). Thus, the following relation must hold:

$$\Delta\Delta G_{AB} = -\Delta\Delta G_{BA} \qquad (1)$$

where $\Delta\Delta G_{XY}$ is the free energy change upon the single-point variation of protein X that gives rise to protein Y. From this, a predictor of free energy changes upon variations has to fulfil the property $\Delta\Delta G_{AB} + \Delta\Delta G_{BA} \cong 0$.

The authors of the articles mentioned above (Pucci et al., 2018; Usmanova et al., 2018) exploited Eq. 1 to measure the biases of different predictors, and showed that most of them depart strongly from anti-symmetry. However, in 2015 we developed a sequence-based predictor, INPS, that was designed to be at least partially anti-symmetric (Fariselli et al., 2015). INPS was not included in those studies (Pucci et al., 2018; Usmanova et al., 2018), and in this letter we add an evaluation of the INPS bias for comparison. We show that INPS is very balanced and robust to this kind of bias.
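As an illustration of how the anti-symmetry requirement (forward and reverse predictions summing to approximately zero) could be checked for any predictor, the following is a minimal sketch and not the evaluation procedure of the cited articles. It assumes two equal-length lists of predicted ΔΔG values, one for each variation A→B and one for the matching reverse variation B→A. The function name `antisymmetry_report` and the two summary statistics (a mean offset and a Pearson correlation) are choices made here for illustration.

```python
from math import sqrt


def antisymmetry_report(forward, reverse):
    """Summarize how far paired predictions are from anti-symmetry.

    forward[i] is the predicted ddG of a variation A->B and reverse[i]
    the predicted ddG of the matching reverse variation B->A.
    Returns (bias, r): bias is the mean of (forward + reverse) / 2 and
    r is the Pearson correlation between forward and reverse values.
    """
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse must have the same length")
    n = len(forward)
    if n < 2:
        raise ValueError("at least two variation pairs are required")

    mean_f = sum(forward) / n
    mean_r = sum(reverse) / n

    # Mean offset from zero of the forward/reverse sum (halved).
    bias = (mean_f + mean_r) / 2.0

    # Pearson correlation computed from centered sums.
    cov = sum((f - mean_f) * (r - mean_r) for f, r in zip(forward, reverse))
    var_f = sum((f - mean_f) ** 2 for f in forward)
    var_r = sum((r - mean_r) ** 2 for r in reverse)
    if var_f == 0.0 or var_r == 0.0:
        raise ValueError("correlation is undefined for constant predictions")
    r = cov / sqrt(var_f * var_r)

    return bias, r


if __name__ == "__main__":
    # Hypothetical predictions, for illustration only (not real data).
    fwd = [1.0, -0.5, 2.0]
    rev = [-1.0, 0.5, -2.0]
    print(antisymmetry_report(fwd, rev))
```

Under these assumptions, a perfectly anti-symmetric predictor gives a bias of 0 and a correlation of -1, while a predictor that returns the same value for both directions gives a correlation of +1.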