CINECA IRIS Institutional Research Information System


Parameter-Free Hierarchical Co-clustering by n-Ary Splits

IENCO, Dino; PENSA, Ruggero Gaetano; MEO, Rosa

2009-01-01

Abstract

Clustering high-dimensional data is challenging. Classic metrics fail to identify real similarities between objects. Moreover, the huge number of features makes cluster interpretation hard. To tackle these problems, several co-clustering approaches have been proposed, which try to compute a partition of objects and a partition of features simultaneously. Unfortunately, these approaches identify only a predefined number of flat co-clusters. Instead, it is useful if the clusters are arranged in a hierarchical fashion, because the hierarchy provides insights into the clusters. In this paper we propose a novel hierarchical co-clustering approach, which builds two coupled hierarchies, one on the objects and one on the features, thus providing insights into both of them. Our approach does not require a pre-specified number of clusters, and produces compact hierarchies because it makes n-ary splits, where n is automatically determined. We validate our approach on several high-dimensional datasets against state-of-the-art competitors.


Year: 2009
Event title: 20th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ECML PKDD 2009
Event location: Bled, Slovenia
Event date: September 7-11, 2009
Volume title: Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part I
Publisher: SPRINGER-VERLAG
Volume no.: 5781/2009
Pages (from): 580
Pages (to): 595
ISBN: 9783642041792
DOI: https://dx.doi.org/10.1007/978-3-642-04180-8_55
Product URL (open-access archives, full text on publisher site, etc.): http://www.ecmlpkdd2009.org/
All authors: D. Ienco; R. G. Pensa; R. Meo
Appears in types: 04A-Conference paper in volume

Files in this item:

File	Access	File type	Size	Format
ecml-pkdd09.pdf	Restricted access	Postprint (author's final version)	273.41 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/66893
