Correlated mutations in proteins are believed to occur in order to preserve the protein functional folding through evolution. Their values can be deduced from sequence and/or structural alignments and are indicative of residue contacts in the protein three-dimensional structure. A correlation among pairs of residues is routinely evaluated with the Pearson correlation coefficient and the MCLACHLAN similarity matrix. In this paper, we describe an optimization procedure that maximizes the correlation between the Pearson coefficient and the protein residue contacts with respect to different similarity matrices, including random. Our results indicate that there is a large number of equivalent matrices that perform similarly to MCLACHLAN. We also obtain that the upper limit to the accuracy achievable in the prediction of the protein residue contacts is independent of the optimized similarity matrix. This suggests that poor scoring may be due to the choice of the linear correlation function in evaluating correlated mutations.

On the upper bound of the prediction accuracy of residue contacts in proteins with correlated mutations: the case study of the similarity matrices

P. Fariselli;
2009-01-01

Abstract

Correlated mutations in proteins are believed to occur in order to preserve the protein functional folding through evolution. Their values can be deduced from sequence and/or structural alignments and are indicative of residue contacts in the protein three-dimensional structure. A correlation among pairs of residues is routinely evaluated with the Pearson correlation coefficient and the MCLACHLAN similarity matrix. In this paper, we describe an optimization procedure that maximizes the correlation between the Pearson coefficient and the protein residue contacts with respect to different similarity matrices, including random. Our results indicate that there is a large number of equivalent matrices that perform similarly to MCLACHLAN. We also obtain that the upper limit to the accuracy achievable in the prediction of the protein residue contacts is independent of the optimized similarity matrix. This suggests that poor scoring may be due to the choice of the linear correlation function in evaluating correlated mutations.
2009
Algorithms in Bioinformatics
Springer
62
72
978-3-642-04241-6
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/1687508
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 0
social impact