On the upper bound of the prediction accuracy of residue contacts in proteins with correlated mutations: the case study of the similarity matrices

Di Lena, P.; Fariselli, P.; Margara, L.; Vassura, M.; Casadio, R.

doi:10.1007/978-3-642-04241-6_6

Correlated mutations in proteins are believed to occur in order to preserve the functional fold of the protein through evolution. Their values can be deduced from sequence and/or structural alignments and are indicative of residue contacts in the protein three-dimensional structure. The correlation between a pair of residue positions is routinely evaluated with the Pearson correlation coefficient and the MCLACHLAN similarity matrix. In this paper, we describe an optimization procedure that maximizes the correlation between the Pearson coefficient and the protein residue contacts with respect to different similarity matrices, including random ones. Our results indicate that a large number of equivalent matrices perform similarly to MCLACHLAN. We also find that the upper limit to the accuracy achievable in the prediction of protein residue contacts is independent of the optimized similarity matrix. This suggests that the poor scoring may be due to the choice of a linear correlation function for evaluating correlated mutations.
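To make the scoring scheme concrete, the following is a minimal sketch of how a correlated-mutation score between two alignment columns might be computed: each column is turned into a vector of similarity scores over all sequence pairs, and the two vectors are compared with the Pearson coefficient. It is an unweighted simplification, not the procedure used in the paper. The function names, the three-letter alphabet, and the values in `toy_sim` are assumptions made for illustration; they are not the MCLACHLAN matrix.

```python
from itertools import combinations
from math import sqrt


def pair_similarities(column, sim):
    """Similarity score for every pair of sequences at one alignment column."""
    return [sim[(a, b)] for a, b in combinations(column, 2)]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Fully conserved column: the coefficient is undefined.
        # Returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_mutation_score(col_i, col_j, sim):
    """Pearson correlation between the pairwise-similarity vectors of two columns."""
    return pearson(pair_similarities(col_i, sim), pair_similarities(col_j, sim))


# Hypothetical symmetric similarity matrix over a toy alphabet (NOT MCLACHLAN).
toy_values = {
    ("A", "A"): 8, ("V", "V"): 8, ("D", "D"): 8,
    ("A", "V"): 5, ("A", "D"): 1, ("V", "D"): 0,
}
toy_sim = {}
for (a, b), value in toy_values.items():
    toy_sim[(a, b)] = value
    toy_sim[(b, a)] = value

# Each list holds the residue found at one position across four aligned sequences.
col_i = list("AAVD")
col_j = list("VVAD")
col_conserved = list("AAAA")

score_ij = correlated_mutation_score(col_i, col_j, toy_sim)
score_conserved = correlated_mutation_score(col_i, col_conserved, toy_sim)
```

In this sketch the similarity matrix enters only through `pair_similarities`, so swapping `toy_sim` for another symmetric matrix (optimized or random) changes the scores without touching the correlation function itself.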