DepMiner 1.0

MEO, Rosa;
2009

Abstract

DepMiner is a software prototype implementing a simple but effective model for evaluating itemsets and, more generally, the dependencies between variables defined on a domain of finite values. The method is based on Delta, the departure of the observed probability of a set of valued variables in a database from a reference probability estimated under the condition of maximum entropy. It distinguishes between dependencies intrinsic to an itemset and dependencies “inherited” from its subsets; it is therefore suitable for directly comparing the utility of an itemset with that of its subsets, and for reducing the volume of non-significant itemsets in the result of a frequent itemset mining request. The method detects significant positive dependencies as well as negative ones, which occur when the association among the variables is rarer than expected. The software system returns itemsets ranked by a normalized version of Delta, together with histograms of the values of Delta. We provide a method for setting a threshold for Delta based on a statistical test. We show that it is anti-monotonic and can be embedded efficiently in algorithms. Finally, we employ Delta to characterize the volume of the existing dependencies among database variables and show a method to quantify it.
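To make the idea of Delta concrete, the following is a minimal sketch for the simplest case only, an itemset of two items. It assumes that the only known constraints are the single-item frequencies, in which case the maximum-entropy reference probability is the product of those frequencies. The function name `delta_pair` and the toy transactions are hypothetical and are not taken from DepMiner; the abstract does not specify the general estimation procedure or the normalization, so neither is reproduced here.

```python
from fractions import Fraction


def delta_pair(transactions, a, b):
    """Observed minus reference probability for the pair {a, b}.

    Assumption (not from DepMiner): with only the single-item
    frequencies as constraints, the maximum-entropy estimate of
    P(a, b) is P(a) * P(b).
    """
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")
    count_a = sum(1 for t in transactions if a in t)
    count_b = sum(1 for t in transactions if b in t)
    count_ab = sum(1 for t in transactions if a in t and b in t)
    observed = Fraction(count_ab, n)
    reference = Fraction(count_a, n) * Fraction(count_b, n)
    return observed - reference


# Hypothetical toy database: each transaction is a set of items.
transactions = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread", "butter"},
    {"tea"},
    {"coffee"},
    {"tea", "bread"},
]

# Positive dependency: the pair occurs more often than the reference predicts.
print(delta_pair(transactions, "bread", "butter"))  # 1/6

# Negative dependency: the pair occurs less often than the reference predicts.
print(delta_pair(transactions, "tea", "coffee"))  # -1/18
```

Exact fractions are used so that the sign of the result, which separates positive from negative dependencies, is not blurred by floating-point rounding.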
1.0
Leonardo D'Ambrosi home page: http://www.leodambrosi.it/depminer/
dependencies between variables; randomization
R. Meo; L. D'Ambrosi
Files in this item:
ECML-PKDD-demo.pdf

Open access

File type: PREPRINT (FIRST DRAFT)
Size: 109.5 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/2318/75802