CINECA IRIS Institutional Research Information System

Peculiar genes selection: A new features selection method to improve classification performances in imbalanced data sets

Martina, Federica; Beccuti, Marco; Balbo, Gianfranco; Cordero, Francesca

2017-01-01

Abstract

High-throughput technologies provide genomic and transcriptomic data that are suitable for biomarker detection for classification purposes. However, the high dimensionality of the output of such technologies and the characteristics of the data sets analysed make the classification task difficult. Here we present a new feature selection method based on three steps to detect class-specific biomarkers in high-dimensional data sets. The first step detects the genes that are differentially expressed across the conditions tested in the experimental design, the second step filters out the features with low discriminative power, and the third step detects the class-specific features and defines the final biomarker as the union of the class-specific features. The proposed procedure is tested on two microarray data sets, one characterized by a strong imbalance between the sizes of the classes and the other with perfectly balanced class sizes. We show that, using the proposed feature selection procedure, the classification performance of a Support Vector Machine on the imbalanced data set reaches 82%, whereas other methods do not exceed 73%. Furthermore, on the perfectly balanced data set, the classification performance is comparable with that of other methods. Finally, the Gene Ontology enrichments performed on the signatures selected with the proposed pipeline confirm the biological relevance of our methodology. An R package implementing Peculiar Genes Selection, ‘PGS’, is available at: http://github.com/mbeccuti/PGS.
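The three steps described in the abstract can be illustrated with a small sketch. This is not the PGS implementation (that is the R package linked above), and the abstract does not specify the statistics used at each step. The scoring choices here are assumptions made only for illustration: an F-like variance ratio stands in for the differential expression test, single-feature nearest-class-mean accuracy stands in for discriminative power, and a minimum gap between class means stands in for class specificity. All function names, thresholds, and the toy data are hypothetical.

```python
from statistics import mean, pvariance

def by_class(column, labels):
    """Group the values of one feature by class label."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    return groups

def f_like_score(groups):
    """Assumed step-1 score: variance of class means over mean within-class variance."""
    between = pvariance([mean(v) for v in groups.values()])
    within = mean(pvariance(v) for v in groups.values())
    return between / (within + 1e-9)

def nearest_mean_accuracy(column, labels, means):
    """Assumed step-2 score: share of samples closest to their own class mean."""
    hits = sum(min(means, key=lambda c: abs(x - means[c])) == lab
               for x, lab in zip(column, labels))
    return hits / len(labels)

def select_signature(X, y, min_f=2.0, min_acc=0.7, min_gap=1.0):
    """Return (class -> set of class-specific feature indices, their sorted union)."""
    per_class = {c: set() for c in set(y)}
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        groups = by_class(column, y)
        means = {c: mean(v) for c, v in groups.items()}
        # Step 1: keep features that differ across the experimental classes.
        if f_like_score(groups) < min_f:
            continue
        # Step 2: drop features with low discriminative power.
        if nearest_mean_accuracy(column, y, means) < min_acc:
            continue
        # Step 3: a feature is specific to class c if the mean of c is far
        # from the mean of every other class.
        for c, m in means.items():
            if all(abs(m - o) >= min_gap for k, o in means.items() if k != c):
                per_class[c].add(j)
    # The final signature is the union of the class-specific feature sets.
    signature = sorted(set().union(*per_class.values()))
    return per_class, signature

# Toy, imbalanced data: 4 samples of class A, 2 of B, 2 of C; 4 features.
y = ["A", "A", "A", "A", "B", "B", "C", "C"]
X = [
    [5.0, 2.0, 3.0, 1.0],
    [5.2, 2.1, 1.0, 1.0],
    [4.8, 1.9, 2.0, 1.1],
    [5.1, 2.0, 4.0, 0.9],
    [1.0, 2.2, 2.0, 1.6],
    [1.2, 2.4, 3.0, 1.6],
    [0.7, 6.0, 1.0, 2.2],
    [0.9, 6.2, 4.0, 2.2],
]
per_class, signature = select_signature(X, y)
print(per_class)
print(signature)  # [0, 1]
```

In this toy run, feature 0 is retained as specific to class A and feature 1 as specific to class C. Feature 2 is dropped at step 1 because its class means coincide, and feature 3 passes the first two steps but is not specific to any single class, so it is left out of the union.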

Year: 2017
Publication language: English
ISI WoS code: WOS:000407548800001
PubMed ID: 28806759
Scopus ID: 2-s2.0-85027313670
Peer review: Anonymous experts
Journal title: PLOS ONE
Volume: 12
Issue: 8
Pages (from): e0177475
Pages (to): e0177475
Total number of pages: 18
DOI: https://dx.doi.org/10.1371/journal.pone.0177475
Product URL (open-access archives, full text on publisher's site, etc.): http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0177475&type=printable
Keywords: Computational Biology; Gene Expression Profiling; Vaccination; Algorithms; Genetics and Molecular Biology (all); Agricultural and Biological Sciences (all)
Co-authors affiliated with foreign institutions: no
Product compliant with the University Regulation on open access?: 1 - product with a file in Open Access version (I will attach the file at step 6 - Upload)
Type (sito docente): 262
Number of authors: 4
All authors: Martina, Federica; Beccuti, Marco; Balbo, Gianfranco; Cordero, Francesca
Type: info:eu-repo/semantics/article
Fulltext: open
Type: 03-CONTRIBUTO IN RIVISTA::03A-Articolo su Rivista (journal contribution: journal article)
Appears in types: 03A-Articolo su Rivista (journal article)

Files in this product:

File	Size	Format
Peculiar Genes Selection- A new features selection method to improve classification performances in imbalanced data sets.pdf (open access; file type: publisher's PDF)	2.12 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1652379
