BAR-PIG: a database of the pig proteome with structural and functional statistically validated annotation

P. Fariselli;
2012-01-01

Abstract

We describe a database of pig proteins comprising 35,381 sequences, obtained by merging 19,576 chains retrieved from UniProtKB, one of the major freely available protein sequence resources, with 23,118 chains retrieved from Ensembl, the genome database for eukaryotic species. About 90% of these chains are poorly annotated, and their existence is inferred automatically by sequence alignment against the entire protein universe database. Given the relevance of the pig proteome to many studies, including those on complex human diseases, statistical validation of the annotation is required to better understand the role of specific genes and proteins in the complex networks underlying biological processes in the animal. We introduce BAR-PIG, a database in which 21,793 sequences are endowed with a statistically validated annotation. Validation relies on a cluster-centric annotation procedure that supports different types of annotation: structural, functional and, when possible, both. Each sequence in the database can be associated with a set of statistically validated Gene Ontology (GO) terms from the three main roots (Molecular Function, Biological Process, Cellular Component), with Pfam functional domains and, when possible, with a cluster HMM model that allows the three-dimensional structure of the protein to be built. Statistics derived from database searches show an enrichment in both GO and Pfam terms for the pig proteins compared with their UniProtKB annotation. Conclusion: The database presented here is based on protein sequence annotation after statistical validation of clusters. Searching BAR-PIG retrieves the annotation of a pig protein for further analysis.
The database can also be searched by specific GO terms, which retrieves all the pig sequences that, after annotation with our system, participate in a given biological process. Alternatively, it can be searched by structural information, which retrieves all the pig sequences sharing the same structural characteristics.
56th national meeting of the Italian Society of Biochemistry and Molecular Biology, Chieti, 26-29 September 2012, p. 208
http://www.biochimica.it/congressiecorsi.html

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1687503