CCHMM_PROF: a HMM-based coiled-coil predictor with evolutionary information
Fariselli P.
2009-01-01
Abstract
Motivation: The widespread coiled-coil structural motif in proteins is known to mediate a variety of biological interactions. Recognizing a coiled-coil-containing sequence and locating its coiled-coil domains are key steps towards determining protein structure and function. Different tools are available for predicting coiled-coil domains in protein sequences, including those based on position-specific score matrices and machine learning methods.

Results: In this article, we introduce a hidden Markov model (CCHMM_PROF) that exploits the information contained in multiple sequence alignments (profiles) to predict coiled-coil regions. The new method discriminates coiled-coil sequences with an accuracy of 97% and achieves a true positive rate of 79% with a false positive rate of only 1%. Furthermore, when predicting the location of coiled-coil segments in protein sequences, the method reaches an accuracy of 80% at the residue level and a best per-segment and per-protein efficiency of 81% and 80%, respectively. The results indicate that CCHMM_PROF outperforms all the existing tools and can be adopted for large-scale genome annotation.

Availability: The dataset is available at http://www.biocomp.unibo.it/~lisa/coiled-coils. The predictor is freely available at http://gpcr.biocomp.unibo.it/cgi/predictors/cchmmprof/pred_cchmmprof.cgi.
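
The discrimination figures quoted in the abstract (accuracy, true positive rate, false positive rate) can be made concrete with a minimal sketch. The abstract does not define these quantities, so the code assumes the standard confusion-matrix definitions. The function name `discrimination_scores`, the `DiscriminationScores` container, and the toy labels are hypothetical and are not taken from the paper or its dataset.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DiscriminationScores:
    """Per-protein discrimination scores (standard definitions, assumed here)."""
    accuracy: float
    true_positive_rate: float
    false_positive_rate: float


def discrimination_scores(truth: Sequence[bool],
                          predicted: Sequence[bool]) -> DiscriminationScores:
    """Score coiled-coil vs. non-coiled-coil calls, one boolean per protein."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    # Confusion-matrix counts.
    tp = sum(t and p for t, p in zip(truth, predicted))
    fn = sum(t and not p for t, p in zip(truth, predicted))
    fp = sum(not t and p for t, p in zip(truth, predicted))
    tn = sum(not t and not p for t, p in zip(truth, predicted))

    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative protein")

    return DiscriminationScores(
        accuracy=(tp + tn) / len(truth),
        true_positive_rate=tp / (tp + fn),   # fraction of coiled-coil proteins found
        false_positive_rate=fp / (fp + tn),  # fraction of other proteins mislabeled
    )


# Toy labels, invented purely for illustration (not the paper's data):
# 4 coiled-coil proteins followed by 6 non-coiled-coil proteins.
toy_truth = [True] * 4 + [False] * 6
toy_predicted = [True, True, True, False,
                 False, False, False, False, False, True]
toy_scores = discrimination_scores(toy_truth, toy_predicted)
```

Under these definitions, a true positive rate of 79% at a 1% false positive rate would mean that 79 of every 100 coiled-coil proteins are recognized while 1 of every 100 other proteins is mislabeled.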