Gelsius: A Literature-Based Workflow for Determining Quantitative Associations Between Genes and Biological Processes.

PIVA, Roberto;
2013-01-01

Abstract

An effective methodology for extracting and quantifying knowledge from biomedical literature would allow researchers to organize and analyze the results of high-throughput experiments based on microarrays and next-generation sequencing technologies. Despite the large amount of raw information available on the Web, a tool able to extract a measure of the correlation between a list of genes and biological processes is not yet available. In this paper we present Gelsius, a workflow that uses biomedical literature to quantify the correlation between genes and terms describing biological processes. To this end, we build dedicated modules for query expansion and document canonicalization. These modules improve the measurement of correlation, which is performed using a latent semantic analysis approach. To the best of our knowledge, this is the first complete tool able to extract a measure of gene-biological process correlation from the literature. We demonstrate the effectiveness of the proposed workflow on six biological processes and a set of genes, showing that the correlation results for known relationships agree with the definitions of gene functions provided by the NCI Thesaurus. In addition, the tool is able to propose new candidate relationships for later experimental validation. The tool is available at the following web site: http://bioeda1.polito.it:8080/medSearchServlet/
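To make the latent semantic analysis step mentioned in the abstract more concrete, the following is a minimal sketch of how a gene-term association score could be computed from a small set of documents. It is not the Gelsius implementation: the toy documents, the function names (`term_document_matrix`, `top_eigenpairs`, `lsa_term_vectors`, `cosine`), the use of raw term counts, and the rank `k=2` are all assumptions made for illustration. The sketch builds a term-document matrix, obtains a rank-k latent space from the top eigenpairs of the term-term Gram matrix (equivalent to a truncated SVD), and scores a gene against a process term by cosine similarity.

```python
import math
import random

# Invented toy corpus (NOT data from the paper); each string is one "document".
TOY_DOCS = [
    "tp53 caspase",
    "apoptosis caspase",
    "vegfa angiogenesis",
    "vegfa angiogenesis endothelial",
]

def term_document_matrix(docs):
    """Return (vocab, matrix): rows are terms, columns are documents, entries are raw counts."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for tok in doc.split():
            matrix[index[tok]][j] += 1.0
    return vocab, matrix

def top_eigenpairs(sym, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration with deflation.

    Assumes k does not exceed the rank of the matrix.
    """
    n = len(sym)
    work = [row[:] for row in sym]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        x = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            y = [sum(work[i][j] * x[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(v * v for v in y))
            x = [v / norm for v in y]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector.
        eigval = sum(x[i] * sum(work[i][j] * x[j] for j in range(n)) for i in range(n))
        pairs.append((eigval, x))
        # Deflate: remove the component just found before searching for the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= eigval * x[i] * x[j]
    return pairs

def lsa_term_vectors(docs, k=2):
    """Map each term to its k-dimensional latent vector (rows of U_k * S_k)."""
    vocab, a = term_document_matrix(docs)
    n, m = len(vocab), len(docs)
    # Gram matrix A A^T: its eigenvectors are the left singular vectors of A,
    # and its eigenvalues are the squared singular values.
    gram = [[sum(a[i][d] * a[j][d] for d in range(m)) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return {
        term: [vec[i] * math.sqrt(max(val, 0.0)) for val, vec in pairs]
        for i, term in enumerate(vocab)
    }

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(p * q for p, q in zip(u, v))
    nu = math.sqrt(sum(p * p for p in u))
    nv = math.sqrt(sum(q * q for q in v))
    return dot / (nu * nv) if nu and nv else 0.0

if __name__ == "__main__":
    vectors = lsa_term_vectors(TOY_DOCS, k=2)
    for gene, term in [("tp53", "apoptosis"), ("tp53", "angiogenesis")]:
        print(gene, term, round(cosine(vectors[gene], vectors[term]), 3))
```

In this toy corpus "tp53" and "apoptosis" never appear in the same document, yet both co-occur with "caspase", so their latent vectors should end up nearly parallel, while "tp53" and "angiogenesis" should score near zero. A real pipeline would typically apply term weighting (for example TF-IDF) and a library SVD routine rather than the hand-written power iteration used here to keep the sketch dependency-free.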
2013
1
13
Abate F;Acquaviva A;Ficarra E;Piva R;Macii E

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/136221