Towards fault-tolerant formal concept analysis

PENSA, Ruggero Gaetano
2005-01-01

Abstract

Given Boolean data sets which record properties of objects, Formal Concept Analysis is a well-known approach for knowledge discovery. Recent application domains, e.g., those involving very large data sets, have motivated new algorithms which can perform constraint-based mining of formal concepts (i.e., closed sets on both dimensions which are associated by the Galois connection and satisfy some user-defined constraints). In this paper, we consider a major limitation of these approaches when they are applied to noisy data sets. This is the case in Boolean gene expression data analysis, where objects denote biological experiments and attributes denote gene expression properties. In this type of intrinsically noisy data, the Galois association is so strong that the number of extracted formal concepts explodes. We formalize the computation of the so-called δ-bi-sets as an alternative for capturing strong associations between sets of objects and sets of properties. Building on previous work on approximate condensed representations of frequent sets by means of δ-free itemsets, we obtain an efficient technique which can be applied to large data sets. An experimental validation on both synthetic and real data is given. It confirms the added value of our approach w.r.t. formal concept discovery, i.e., the extraction of smaller collections of relevant associations.
Year: 2005
Conference: 9th Congress of the Italian Association for Artificial Intelligence AI*IA'05
Location: Milano, Italy
Dates: September 21-23, 2005
Book title: AI*IA 2005: Advances in Artificial Intelligence
Publisher: Springer
Volume: 3673
Pages: 212-223
ISBN: 978-3-540-29041-4; 978-3-540-31733-3
URL: https://link.springer.com/chapter/10.1007%2F11558590_22
Keywords: fault-tolerant pattern mining
Authors: R. G. Pensa; J-F. Boulicaut
Use this identifier to cite or link to this document: https://hdl.handle.net/2318/67664