CINECA IRIS Institutional Research Information System

Context-Based Distance Learning for Categorical Data Clustering

IENCO, Dino; PENSA, Ruggero Gaetano; MEO, Rosa

2009-01-01

Abstract

Clustering data described by categorical attributes is a challenging task in data mining applications. Unlike numerical attributes, categorical attributes have unordered values, so it is difficult to define a distance between pairs of values of the same attribute. In this paper, we propose a method to learn a context-based distance for categorical attributes. The key intuition is that the distance between two values of a categorical attribute Ai can be determined by how the values of the other attributes Aj are distributed in the dataset objects: if they are similarly distributed in the groups of objects corresponding to the two values of Ai, the distance between those values is low. We also propose a solution to the critical problem of choosing the attributes Aj. We validate our approach on various real-world and synthetic datasets by embedding our distance learning method in both a partitional and a hierarchical clustering algorithm. Experimental results show that our method is competitive with state-of-the-art categorical data clustering approaches.
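
The intuition in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the formula, so the sketch assumes a normalized Euclidean distance between the conditional distributions of each context attribute Aj given a value of Ai. The function names (`conditional_distribution`, `context_distance`), the toy dataset, and the hand-picked context are all hypothetical, and the selection of the attributes Aj that the paper addresses is not covered here.

```python
from collections import Counter
from math import sqrt


def conditional_distribution(rows, target, value, context_attr):
    """Relative frequencies of context_attr among rows where target == value."""
    group = [r[context_attr] for r in rows if r[target] == value]
    total = len(group)
    if total == 0:
        return {}
    return {v: c / total for v, c in Counter(group).items()}


def context_distance(rows, target, v1, v2, context):
    """Assumed distance between two values of `target`: compares how the
    context attributes are distributed in the two groups of objects."""
    squared, n_terms = 0.0, 0
    for attr in context:
        p = conditional_distribution(rows, target, v1, attr)
        q = conditional_distribution(rows, target, v2, attr)
        domain = {r[attr] for r in rows}  # all observed values of attr
        squared += sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in domain)
        n_terms += len(domain)
    return sqrt(squared / n_terms) if n_terms else 0.0


# Toy dataset (invented for illustration only).
rows = [
    {"city": "Turin", "sport": "ski", "drink": "wine"},
    {"city": "Turin", "sport": "ski", "drink": "wine"},
    {"city": "Milan", "sport": "ski", "drink": "wine"},
    {"city": "Milan", "sport": "ski", "drink": "beer"},
    {"city": "Palermo", "sport": "swim", "drink": "beer"},
    {"city": "Palermo", "sport": "swim", "drink": "beer"},
]

context = ["sport", "drink"]  # context chosen by hand in this sketch
d_turin_milan = context_distance(rows, "city", "Turin", "Milan", context)
d_turin_palermo = context_distance(rows, "city", "Turin", "Palermo", context)
print(d_turin_milan, d_turin_palermo)
```

In this toy data, "Turin" and "Milan" co-occur with similar values of the context attributes, so their distance comes out smaller than the distance between "Turin" and "Palermo". A matrix of such pairwise distances could then be passed to a standard partitional or hierarchical clustering algorithm, which is the kind of embedding the abstract describes.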

Year: 2009
Publication language: English
Invited or contributed: contributed
Event type: 1 - Conference
Event title: 8th International Symposium on Intelligent Data Analysis, IDA 2009, Lyon
Event location: Lyon, France
Event date: August 31 - September 2, 2009
Event relevance: International
Volume title: Advances in Intelligent Data Analysis VIII, 8th International Symposium on Intelligent Data Analysis, IDA 2009, Lyon, France, August 31 - September 2, 2009. Proceedings
Referees: Anonymous experts
Publisher: SPRINGER-VERLAG
Publisher city: Berlin
Publisher country: Germany
Volume no.: 5772/2009
Pages (from): 83
Pages (to): 94
Number of pages: 12
Series title (if ISSN present): LECTURE NOTES IN COMPUTER SCIENCE
ISBN: 9783642039140
ISI WoS code: WOS:000272279100008
Scopus code: 2-s2.0-70349858139
DOI: https://dx.doi.org/10.1007/978-3-642-03915-7_8
Product URL (open access archives, full text on publisher site, etc.): http://ida09.liris.cnrs.fr/
Number of authors: 3
Type: info:eu-repo/semantics/conferenceObject
Type: 04-CONTRIBUTION IN CONFERENCE PROCEEDINGS::04A-Conference paper in volume
All authors: D. Ienco; R. G. Pensa; R. Meo
Faculty site type: 273
Fulltext: reserved
Appears in types: 04A-Conference paper in volume

Files in this item:

File	Access	File type	Size	Format
ida2009_dilca.pdf	Restricted access	Preprint (first draft)	189.84 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/66894
