In the generic setting of objects x attributes matrix data analysis, co-clustering appears as an interesting unsupervised data mining method. A co-clustering task provides a bi-partition made of co-clusters: each co-cluster is a group of objects associated to a group of attributes and these associations can support expert interpretations. Many constrained clustering algorithms have been proposed to exploit the domain knowledge and to improve partition relevancy in the mono-dimensional clustering case (e.g. using the must-link and cannot-link constraints on one of the two dimensions). Here, we consider constrained co-clustering not only for extended must-link and cannot-link constraints (i.e. both objects and attributes can be involved), but also for interval constraints that enforce properties of co-clusters when considering ordered domains. We describe an iterative co-clustering algorithm which exploits user-defined constraints while minimizing a given objective function. Thanks to a generic setting, we emphasize that different objective functions can be used. The added value of our approach is demonstrated on both synthetic and real data. Among others, several experiments illustrate the practical impact of this original co-clustering setting in the context of gene expression data analysis, and in an original application to a protein motif discovery problem.

Co-clustering Numerical Data under User-defined Constraints

PENSA, Ruggero Gaetano;CORDERO, Francesca;
2010-01-01

Abstract

In the generic setting of objects x attributes matrix data analysis, co-clustering appears as an interesting unsupervised data mining method. A co-clustering task provides a bi-partition made of co-clusters: each co-cluster is a group of objects associated to a group of attributes and these associations can support expert interpretations. Many constrained clustering algorithms have been proposed to exploit the domain knowledge and to improve partition relevancy in the mono-dimensional clustering case (e.g. using the must-link and cannot-link constraints on one of the two dimensions). Here, we consider constrained co-clustering not only for extended must-link and cannot-link constraints (i.e. both objects and attributes can be involved), but also for interval constraints that enforce properties of co-clusters when considering ordered domains. We describe an iterative co-clustering algorithm which exploits user-defined constraints while minimizing a given objective function. Thanks to a generic setting, we emphasize that different objective functions can be used. The added value of our approach is demonstrated on both synthetic and real data. Among others, several experiments illustrate the practical impact of this original co-clustering setting in the context of gene expression data analysis, and in an original application to a protein motif discovery problem.
2010
3
1
38
55
http://www3.interscience.wiley.com/journal/112701062/home
semi-supervised clustering; co-clustering; microarray analysis
R. G. Pensa; J-F. Boulicaut; F. Cordero; M. Atzori
File in questo prodotto:
File Dimensione Formato  
sam_special_issue_selected_sdm08_R2.pdf

Accesso riservato

Tipo di file: PREPRINT (PRIMA BOZZA)
Dimensione 899.27 kB
Formato Adobe PDF
899.27 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
sam_special_issue_selected_sdm08_R2_4aperto.pdf

Accesso aperto

Tipo di file: POSTPRINT (VERSIONE FINALE DELL’AUTORE)
Dimensione 919.63 kB
Formato Adobe PDF
919.63 kB Adobe PDF Visualizza/Apri
sam2010_printed.pdf

Accesso riservato

Descrizione: PDF versione a stampa
Tipo di file: PDF EDITORIALE
Dimensione 782.17 kB
Formato Adobe PDF
782.17 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/72757
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 14
  • ???jsp.display-item.citation.isi??? ND
social impact