A Comparison of Machine Learning Techniques for Survival Prediction in Breast Cancer

PROVERO, Paolo (co-last author); GIACOBINI, Mario Dante Lucio (co-last author)
2011-01-01

Abstract

Background. The ability to accurately classify cancer patients into risk classes, i.e. to predict the outcome of the pathology on an individual basis, is a key ingredient in making therapeutic decisions. In recent years gene expression data have been successfully used to complement the clinical and histological criteria traditionally used in such prediction. Many "gene expression signatures" have been developed, i.e. sets of genes whose expression values in a tumor can be used to predict the outcome of the pathology. Here we investigate the use of several machine learning techniques to classify breast cancer patients using one such signature, the well-established 70-gene signature. Results. We show that Genetic Programming performs significantly better than Support Vector Machines, Multilayered Perceptrons and Random Forests in classifying patients from the NKI breast cancer dataset, and slightly better than the scoring-based method originally proposed by the authors of the 70-gene signature. Furthermore, Genetic Programming is able to perform automatic feature selection. Conclusions. Since the performance of Genetic Programming can likely be improved beyond that of the out-of-the-box approach used here, and given the biological insight potentially provided by the Genetic Programming solutions, we conclude that Genetic Programming methods are worth further investigation as a tool for cancer patient classification based on gene expression data.
Year: 2011
Volume: 4:12
Pages: 1-13
URL: http://www.biodatamining.org/content/4/1/12/abstract
Keywords: machine learning; breast cancer; survival prediction; genetic programming; evolutionary computation
Authors: Leonardo Vanneschi; Antonella Farinaccio; Giancarlo Mauri; Mauro Antoniotti; Paolo Provero; Mario Giacobini

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/86482