CINECA IRIS Institutional Research Information System

Co-clustering Numerical Data under User-defined Constraints

PENSA, Ruggero Gaetano; J. F. Boulicaut; CORDERO, Francesca; M. Atzori

2010-01-01

Abstract

In the generic setting of objects × attributes matrix data analysis, co-clustering is an interesting unsupervised data mining method. A co-clustering task produces a bi-partition made of co-clusters: each co-cluster is a group of objects associated with a group of attributes, and these associations can support expert interpretation. Many constrained clustering algorithms have been proposed to exploit domain knowledge and improve partition relevancy in the mono-dimensional clustering case (e.g., using must-link and cannot-link constraints on one of the two dimensions). Here, we consider constrained co-clustering not only for extended must-link and cannot-link constraints (i.e., both objects and attributes can be involved), but also for interval constraints that enforce properties of co-clusters over ordered domains. We describe an iterative co-clustering algorithm that exploits user-defined constraints while minimizing a given objective function. Because the setting is generic, different objective functions can be used. The added value of our approach is demonstrated on both synthetic and real data. In particular, several experiments illustrate the practical impact of this co-clustering setting in gene expression data analysis and in an original application to a protein motif discovery problem.
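To make the idea of constrained co-clustering concrete, here is a minimal sketch of an alternating row/column assignment under must-link and cannot-link constraints. It is not the algorithm from the paper. It assumes a sum-of-squared-residuals objective against co-cluster block means, merges must-linked items into groups that move together, and enforces cannot-link greedily in the style of COP-KMeans. It omits interval constraints. All function and parameter names (`constrained_cocluster`, `row_ml`, `row_cl`, and so on) are hypothetical.

```python
import numpy as np

def must_link_groups(n, must_link):
    """Merge indices 0..n-1 into must-link components with union-find."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for a, b in must_link:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def block_means(X, rows, cols, k, l):
    """Mean of each (row cluster, column cluster) block; empty blocks use the global mean."""
    M = np.full((k, l), X.mean())
    for r in range(k):
        for c in range(l):
            block = X[np.ix_(rows == r, cols == c)]
            if block.size:
                M[r, c] = block.mean()
    return M

def assign(X, other, M, groups, cannot_link):
    """Give each must-link group of rows of X its cheapest cluster that breaks no cannot-link."""
    labels = np.full(X.shape[0], -1)
    # cost[i, r]: squared residual of row i if it were placed in cluster r
    cost = ((X[:, None, :] - M[:, other][None, :, :]) ** 2).sum(axis=2)
    for g in groups:
        forbidden = set()
        for a, b in cannot_link:
            if a in g and labels[b] >= 0:
                forbidden.add(int(labels[b]))
            if b in g and labels[a] >= 0:
                forbidden.add(int(labels[a]))
        order = [int(r) for r in np.argsort(cost[g].sum(axis=0))]
        feasible = [r for r in order if r not in forbidden]
        # Assumption: if no cluster is feasible, fall back to the cheapest one.
        labels[g] = feasible[0] if feasible else order[0]
    return labels

def constrained_cocluster(X, k, l, row_ml=(), row_cl=(), col_ml=(), col_cl=(),
                          n_iter=20, seed=0):
    """Alternate row and column reassignment until the labels stop changing."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    row_groups = must_link_groups(n, row_ml)
    col_groups = must_link_groups(m, col_ml)
    rows = np.empty(n, dtype=int)
    cols = np.empty(m, dtype=int)
    for g in row_groups:
        rows[g] = rng.integers(k)
    for g in col_groups:
        cols[g] = rng.integers(l)
    for _ in range(n_iter):
        M = block_means(X, rows, cols, k, l)
        new_rows = assign(X, cols, M, row_groups, row_cl)
        M = block_means(X, new_rows, cols, k, l)
        # Columns are handled as rows of the transposed matrix.
        new_cols = assign(X.T, new_rows, M.T, col_groups, col_cl)
        if np.array_equal(new_rows, rows) and np.array_equal(new_cols, cols):
            break
        rows, cols = new_rows, new_cols
    return rows, cols
```

Calling `constrained_cocluster(X, 2, 2, row_ml=[(0, 1)], row_cl=[(0, 5)])` on a 6 × 4 matrix returns one label per row and one per column. Rows 0 and 1 share a label, and rows 0 and 5 do not. Like other alternating schemes, the sketch can stop in a local minimum, so the result depends on `seed`.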

Year	2010
Journal title	STATISTICAL ANALYSIS AND DATA MINING
Volume	3
Issue	1
Pages (from)	38
Pages (to)	55
DOI	https://dx.doi.org/10.1002/sam.10064
Product URL (open access archives, full text on publisher's site, etc.)	http://www3.interscience.wiley.com/journal/112701062/home
Keywords	semi-supervised clustering; co-clustering; microarray analysis
All authors	R. G. Pensa; J-F. Boulicaut; F. Cordero; M. Atzori
Appears in types:	03A-Journal Article

Files in this product:

File	Access	File type	Size	Format
sam_special_issue_selected_sdm08_R2.pdf	Restricted access	Preprint (first draft)	899.27 kB	Adobe PDF
sam_special_issue_selected_sdm08_R2_4aperto.pdf	Open access	Postprint (author's final version)	919.63 kB	Adobe PDF
sam2010_printed.pdf	Restricted access	Publisher's PDF (print version)	782.17 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/72757
