CINECA IRIS Institutional Research Information System

Learning from Categorical Attribute Relationships for Positive-Unlabeled Classification

IENCO, Dino; PENSA, Ruggero Gaetano

2014-01-01

Abstract

In common binary classification scenarios, both positive and negative examples must be present in the training data to build an efficient classifier. Unfortunately, in many domains this requirement is not satisfied and only one class of examples is available. To cope with this setting, classification algorithms have been introduced that learn from Positive and Unlabeled (PU) data. Originally, these approaches were applied to document classification. Only a few works address the PU problem for categorical datasets, and the available algorithms are mainly based on Naive Bayes classifiers. In this work we present Pulce, a new distance-based PU learning approach for categorical data. Our framework takes advantage of the intrinsic relationships between attribute values and goes beyond the independence assumption made by Naive Bayes. Specifically, Pulce leverages the statistical properties of the data to learn a distance metric that is then employed during the classification task. We extensively validate our approach on real-world datasets and demonstrate that our strategy obtains statistically significant improvements with respect to state-of-the-art competitors.

Year: 2014
Event title: International Workshop on Representation Learning (RL 2014)
Event location: Nancy, France
Event date: September 15, 2014
Volume title: Proceedings of the International Workshop on Representation Learning (RL 2014), co-located with ECML/PKDD 2014
Publisher: Beijing University of Posts and Telecommunications
Pages: 1-12
Product URL (open-access archives, full text on the publisher's site, etc.): http://conference.bupt.edu.cn/rl2014/
Keywords: metric learning; partially supervised learning; categorical data
All authors: D. Ienco; R.G. Pensa
Appears in types: 04A-Conference paper in volume

Files in this product:

File	Size	Format
rl2014_4aperto_1386590.pdf (open access; file type: postprint, author's final version)	159.06 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/149390
