Human Protein Cluster Analysis Using Amino Acid Frequencies

VERNONE, Annamaria;BERCHIALLA, Paola;PESCARMONA, Gianpiero
2013-01-01

Abstract

This paper describes the development of a software tool that clusters proteins according to their amino acid content. All known human proteins in the UniProtKB/Swiss-Prot reference database were clustered by the relative frequencies of their amino acids using hierarchical cluster analysis, and the results were compared with clustering based on sequence similarity. Results: Proteins display different clustering patterns according to type. Many extracellular proteins with highly specific, repetitive sequences (keratins, collagens, etc.) cluster clearly, confirming the accuracy of the clustering method; for these proteins, clustering by sequence and by amino acid content overlaps. Proteins with a more complex, multi-domain structure (catalytic, extracellular, transmembrane domains, etc.) clustered differently by amino acid content, even when sequence similarity and function classify them as very similar (aquaporins, cadherins, steroid 5-alpha reductase, etc.). The availability of essential amino acids under local conditions (starvation, low or high oxygen, cell cycle phase, etc.) may limit protein synthesis regardless of the mRNA level. This type of protein clustering may therefore prove a valuable tool for identifying previously unknown metabolic connections and constraints.
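The approach summarized in the abstract (relative amino acid frequencies followed by hierarchical cluster analysis) can be illustrated with a minimal sketch. This is not the authors' tool: the abstract does not state the distance measure or linkage rule, so Euclidean distance and average linkage are assumptions here, and the function names and toy sequences are hypothetical and not taken from UniProtKB/Swiss-Prot.

```python
from collections import Counter
from itertools import combinations
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_frequencies(sequence):
    """Return the relative frequency of each standard amino acid in a sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def average_linkage(cluster_a, cluster_b, vectors):
    """Mean Euclidean distance over all cross-cluster pairs (assumed linkage)."""
    pairs = [(i, j) for i in cluster_a for j in cluster_b]
    return sum(dist(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def hierarchical_clusters(vectors, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy sequences (hypothetical, not real database entries)
toy_proteins = {
    "gly_pro_rich_1": "GPPGPAGPPGPKGPPGAPGPP",
    "gly_pro_rich_2": "GPAGPPGPPGEKGPPGPAGPQ",
    "hydrophobic_1": "LLVIAFLLGVLAVIFALLGVI",
    "hydrophobic_2": "VLLAIFGLLVVAILFLAGLIV",
}
names = list(toy_proteins)
vectors = [aa_frequencies(toy_proteins[n]) for n in names]
groups = [[names[i] for i in c] for c in hierarchical_clusters(vectors, 2)]
print(groups)
```

In this toy input the two hydrophobic sequences differ in residue order but have identical composition, so their frequency vectors coincide and they merge first. This shows how clustering by amino acid content can group proteins independently of sequence alignment.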
2013
8
1
5
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3617222/
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0060220
Files in this item:

journal.pone.0060220.pdf
Access: Open access
File type: Publisher's PDF
Size: 1 MB
Format: Adobe PDF

human_protein_FASTCLUSTER_file_excel_orizzontale_11062012.xls
Access: Open access
Description: Experimental data table
File type: Dataset
Size: 7.86 MB
Format: Microsoft Excel

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/129285
Citations
  • PubMed Central 5
  • Scopus 5
  • Web of Science (ISI) 4