A new approach for the identification of processed pseudogenes

MOLINERIS, Ivan;BIANCHI, Federico Tommaso;DI CUNTO, Ferdinando;CASELLE, Michele
2010-01-01

Abstract

Processed pseudogenes are DNA sequences generated through reverse transcription (RT) and retrotransposition of mature mRNAs. These sequences are usually considered junk DNA, since in most cases they lack a suitable promoter and are no longer transcribed. Nonetheless, because of their origin, they are a valuable source of information on the transcriptome, which is particularly useful for organisms lacking large expressed sequence tag (EST) collections. Here, we describe REtrotransposed Gene EXPlorer (REGEXP), a new method for the systematic identification of retrotransposition events that, unlike existing approaches, does not rely on a priori knowledge of mRNA sequences. Using our pipeline, we identified 2288 processed pseudogenes in the human genome, which show a good overlap with the ENSEMBL, VEGA, and pseudogene.org datasets. These pseudogenes could be traced back to 987 genes, most of which correspond to already known genes. In many cases, we recovered the signature of additional exons, likely due to alternative splicing. Interestingly, some of our predictions did not match previously known or predicted genes, and we were able to validate most of them by RT-polymerase chain reaction (PCR). Similar results were obtained with the mouse genome. Our data show that the REGEXP method is capable of identifying processed pseudogenes and of predicting most of the corresponding genes with high specificity. Therefore, it may represent a valuable addition to current genome annotation pipelines.
2010, 17(5), 755-765
http://www.liebertonline.com/doi/abs/10.1089/cmb.2009.0027
Molineris I; Sales G; Bianchi F; Di Cunto F; Caselle M

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/82859