CINECA IRIS Institutional Research Information System

Positive and unlabeled learning in categorical data

IENCO, Dino;PENSA, Ruggero Gaetano

2016-01-01

Abstract

In common binary classification scenarios, both positive and negative examples must be present in the training data to build an efficient classifier. Unfortunately, in many domains this requirement is not satisfied and examples of only one class are available. To cope with this setting, classification algorithms have been introduced that learn from Positive and Unlabeled (PU) data. Originally, these approaches were applied to document classification. Only a few works address the PU problem for categorical datasets, and the available algorithms are mainly based on Naive Bayes classifiers. In this work we present a new distance-based PU learning approach for categorical data: Pulce. Our framework takes advantage of the intrinsic relationships between attribute values and goes beyond the independence assumption made by Naive Bayes. Specifically, Pulce leverages the statistical properties of the data to learn a distance metric that is then used during classification. We extensively validate our approach on real-world datasets and show that it obtains statistically significant improvements over state-of-the-art competitors.
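
The abstract describes distance-based PU learning only at a high level. As a rough illustration of the general idea, and not of Pulce itself, the sketch below uses a plain Hamming (mismatch-count) distance over categorical attributes. This is an assumption made for brevity: Pulce learns its distance metric from the data, which this sketch does not attempt. The function names, the `fraction` parameter, and the toy rows are all hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of attributes on which two categorical rows disagree."""
    return sum(x != y for x, y in zip(a, b))


def mean_distance(row: Row, group: List[Row]) -> float:
    """Average Hamming distance from `row` to every row in `group`."""
    return sum(hamming(row, g) for g in group) / len(group)


def reliable_negatives(
    positives: List[Row], unlabeled: List[Row], fraction: float = 0.5
) -> List[Row]:
    """Keep the unlabeled rows that lie farthest from the positive set.

    These rows act as stand-in negatives, since no true negatives exist.
    """
    ranked = sorted(
        unlabeled, key=lambda r: mean_distance(r, positives), reverse=True
    )
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def classify(row: Row, positives: List[Row], negatives: List[Row]) -> bool:
    """Return True if `row` is at least as close to the positives as to the negatives."""
    return mean_distance(row, positives) <= mean_distance(row, negatives)


# Toy data (hypothetical): attributes are colour, shape, size.
positives = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("red", "oval", "small"),
]
unlabeled = [
    ("red", "round", "small"),
    ("green", "long", "large"),
    ("green", "long", "small"),
    ("red", "oval", "large"),
]

negatives = reliable_negatives(positives, unlabeled)
print(negatives)
print(classify(("red", "round", "large"), positives, negatives))
print(classify(("green", "long", "large"), positives, negatives))
```

The two-step shape (first extract likely negatives from the unlabeled pool, then classify by distance) is one common way to approach PU learning. A learned metric that accounts for relationships between attribute values would replace `hamming` in a more faithful version.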

Year: 2016
Journal: NEUROCOMPUTING
Volume: 196
Pages: 113-124
DOI: https://dx.doi.org/10.1016/j.neucom.2016.01.089
Product URL (open-access archives, full text on publisher site, etc.): http://www.sciencedirect.com/science/article/pii/S0925231216003118
Keywords: Positive unlabeled learning, Partially supervised learning, Distance learning, Categorical data
All authors: Ienco, Dino; Pensa, Ruggero G.
Appears in categories: 03A - Journal article

Files in this product:

File	Access	Description	File type	Size	Format
neurocom2016_online.pdf	Restricted	Online version	Publisher's PDF	1.24 MB	Adobe PDF
neurocom2016_printed.pdf	Restricted	Printed version	Publisher's PDF	1.13 MB	Adobe PDF
neurocom2016_draft_4aperto.pdf	Open	-	Preprint (first draft)	916.3 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1558958
