From local pattern mining to relevant bi-cluster characterization

PENSA, Ruggero Gaetano;
2005-01-01

Abstract

Clustering and bi-clustering techniques have proved quite useful in many application domains. A weakness of these techniques remains the poor support for characterizing the resulting groupings. We consider possibly large Boolean data sets which record properties of objects, and we assume that a bi-partition is available. We introduce a generic cluster characterization technique based on collections of bi-sets (i.e., sets of objects associated with sets of properties) that satisfy user-defined constraints, together with a measure of the accuracy of a given bi-set as a bi-cluster characterization pattern. The method is illustrated on both formal concepts (i.e., “maximal rectangles of true values”) and the new type of δ-bi-sets (i.e., “rectangles of true values with a bounded number of exceptions per column”). Its added value is shown on benchmark data and on two intrinsically noisy real data sets: a medical data set about meningitis and Plasmodium falciparum gene expression data.
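The two pattern types named in the abstract can be checked on a small Boolean data set with the minimal sketch below. It assumes rows are objects and columns are properties, and it represents each object by the set of properties that are true for it; all function and variable names are hypothetical. The formal-concept test uses the standard Galois-connection definition (a bi-set equal to the extension of its properties and the intension of its objects). The δ test checks only the property stated in the abstract, at most δ false values per column inside the rectangle; the paper's full δ-bi-set definition may impose further conditions, and its accuracy measure is not reproduced here.

```python
from typing import Dict, Set

# Hypothetical representation: object -> set of properties that are true for it.
Data = Dict[str, Set[str]]


def extension(data: Data, props: Set[str]) -> Set[str]:
    """Objects for which every property in props is true."""
    return {o for o, row in data.items() if props <= row}


def intension(data: Data, objs: Set[str], all_props: Set[str]) -> Set[str]:
    """Properties that are true for every object in objs."""
    result = set(all_props)
    for o in objs:
        result &= data[o]
    return result


def is_formal_concept(data: Data, objs: Set[str], props: Set[str],
                      all_props: Set[str]) -> bool:
    """True iff (objs, props) is a maximal rectangle of true values."""
    return extension(data, props) == objs and intension(data, objs, all_props) == props


def has_bounded_column_exceptions(data: Data, objs: Set[str], props: Set[str],
                                  delta: int) -> bool:
    """True iff each property (column) is false for at most delta objects of objs.

    Assumption: this checks only the bounded-exceptions property from the
    abstract, not any additional constraint of the paper's delta-bi-sets.
    """
    for p in props:
        exceptions = sum(1 for o in objs if p not in data[o])
        if exceptions > delta:
            return False
    return True


if __name__ == "__main__":
    data = {
        "o1": {"a", "b", "c"},
        "o2": {"a", "b"},
        "o3": {"b", "c"},
        "o4": {"a", "c"},
    }
    P = {"a", "b", "c"}
    print(is_formal_concept(data, {"o1", "o2"}, {"a", "b"}, P))  # True
    print(is_formal_concept(data, {"o1"}, {"a", "b"}, P))        # False
    print(has_bounded_column_exceptions(data, set(data), P, 1))  # True
    print(has_bounded_column_exceptions(data, set(data), P, 0))  # False
```

In the toy data, the whole table is not a rectangle of true values, since each column has exactly one false entry, yet it is accepted once one exception per column is tolerated. This illustrates why exception-tolerant bi-sets can be more robust on noisy data.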
2005
English
contribution
1 - Conference
6th International Symposium on Intelligent Data Analysis IDA 2005
Madrid, Spain
September 8-10, 2005
International
Advances in Intelligent Data Analysis VI. IDA 2005
Anonymous experts
Springer
Berlin, Heidelberg
GERMANY
3646
293
304
12
978-3-540-28795-7
978-3-540-31926-9
https://link.springer.com/chapter/10.1007%2F11552253_27
co-clustering, pattern mining
FRANCE
2
info:eu-repo/semantics/conferenceObject
04-CONTRIBUTION IN CONFERENCE PROCEEDINGS::04A-Conference paper in volume
R. G. Pensa; J-F. Boulicaut
273
reserved
Files in this product:
ida05.pdf (restricted access)
Description: publisher's PDF
File type: PUBLISHER'S PDF
Size: 227.2 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/68055
Citations
  • PubMed Central: N/A
  • Scopus: 4
  • Web of Science (ISI): 4