Clustering high-dimensional data is challenging. Classic metrics fail in identifying real similarities between objects. Moreover, the huge number of features makes the cluster interpretation hard. To tackle these problems, several co-clustering approaches have been proposed which try to compute a partition of objects and a partition of features simultaneously. Unfortunately, these approaches identify only a predefined number of flat co-clusters. Instead, it is useful if the clusters are arranged in a hierarchical fashion because the hierarchy provides insides on the clusters. In this paper we propose a novel hierarchical co-clustering, which builds two coupled hierarchies, one on the objects and one on features thus providing insights on both them. Our approach does not require a pre-specified number of clusters, and produces compact hierarchies because it makes n −ary splits, where n is automatically determined. We validate our approach on several high-dimensional datasets with state of the art competitors.

Parameter-Free Hierarchical Co-clustering by n-Ary Splits

IENCO, Dino;PENSA, Ruggero Gaetano;MEO, Rosa
2009-01-01

Abstract

Clustering high-dimensional data is challenging. Classic metrics fail in identifying real similarities between objects. Moreover, the huge number of features makes the cluster interpretation hard. To tackle these problems, several co-clustering approaches have been proposed which try to compute a partition of objects and a partition of features simultaneously. Unfortunately, these approaches identify only a predefined number of flat co-clusters. Instead, it is useful if the clusters are arranged in a hierarchical fashion because the hierarchy provides insides on the clusters. In this paper we propose a novel hierarchical co-clustering, which builds two coupled hierarchies, one on the objects and one on features thus providing insights on both them. Our approach does not require a pre-specified number of clusters, and produces compact hierarchies because it makes n −ary splits, where n is automatically determined. We validate our approach on several high-dimensional datasets with state of the art competitors.
2009
20th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ECML PKDD 2009
Bled, Slovenia
September 7-11, 2009
Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part I
SPRINGER-VERLAG
5781/2009
580
595
9783642041792
http://www.ecmlpkdd2009.org/
D. Ienco; R. G. Pensa; R. Meo
File in questo prodotto:
File Dimensione Formato  
ecml-pkdd09.pdf

Accesso riservato

Tipo di file: POSTPRINT (VERSIONE FINALE DELL’AUTORE)
Dimensione 273.41 kB
Formato Adobe PDF
273.41 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/66893
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 16
  • ???jsp.display-item.citation.isi??? 11
social impact