DepMiner: A method and a system for the extraction of significant dependencies

MEO, Rosa;
2012-01-01

Abstract

We propose DepMiner, a method implementing a simple but effective model for the evaluation of itemsets and, more generally, of the dependencies among the values taken by a set of variables over finite domains. The method is based on ∆, the departure of the observed probability of an event from a referential probability of the same event. The observed probability is the probability, measured in the database, that the variables take the given values; the referential probability is the probability of the same event estimated under the condition of maximum entropy. DepMiner distinguishes dependencies intrinsic to the itemset from dependencies “inherited” from its subsets, so it is suitable for evaluating the utility of an itemset with respect to its subsets. The method detects both significant positive dependencies and negative ones, and the latter are suitable for identifying rare itemsets. Since ∆ is anti-monotonic, it can be embedded efficiently in mining algorithms. The system returns itemsets ranked by ∆ and presents a histogram of the ∆ distribution. The parameters that govern the method, such as the minimum support for itemsets and the thresholds on ∆, are determined automatically by the system, which uses the ∆ thresholds to identify the statistically significant itemsets. In this way it reduces the volume of results more than competing methods do.
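
The following is a minimal sketch of the ∆ score described in the abstract, written under several assumptions. Items are treated as binary variables. The observed probability is taken to be an itemset's support. The referential probability is taken to be the probability of the whole itemset under the maximum-entropy joint distribution that matches the observed supports of all its proper subsets, so that ∆(I) = P_obs(I) - P_ref(I). That distribution is fitted here with iterative proportional scaling, a standard technique chosen only for illustration; the chapter may compute the reference differently. For two items the reference reduces to the independence estimate P(a)P(b). The helper names (`support`, `reference_probability`, `delta`, `rank_by_delta`) and the caller-supplied cut-off `min_abs_delta` are hypothetical. DepMiner determines its thresholds automatically, and this sketch does not reproduce that step.

```python
from itertools import combinations, product


def support(transactions, itemset):
    """Observed probability: fraction of transactions containing every item."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def reference_probability(transactions, itemset, max_sweeps=1000, tol=1e-12):
    """Probability of the whole itemset under the maximum-entropy joint
    distribution that reproduces the observed support of every proper,
    non-empty subset. Fitted by iterative proportional scaling, which is an
    assumption of this sketch, not necessarily DepMiner's own procedure."""
    items = list(itemset)
    k = len(items)
    if k < 2:
        raise ValueError("the reference needs an itemset of at least two items")
    cells = list(product((0, 1), repeat=k))   # all 2**k presence/absence patterns
    p = {c: 1.0 / len(cells) for c in cells}  # uniform start: maximum entropy
    constraints = [
        (idx, support(transactions, [items[i] for i in idx]))
        for r in range(1, k)
        for idx in combinations(range(k), r)
    ]
    for _ in range(max_sweeps):
        worst = 0.0
        for idx, target in constraints:
            # Cells in which every item of this subset is present.
            inside = {c: all(c[i] for i in idx) for c in cells}
            cur = sum(p[c] for c in cells if inside[c])
            worst = max(worst, abs(cur - target))
            # Rescale the event and its complement so the event gets `target`.
            f_in = target / cur if cur > 0 else 0.0
            f_out = (1 - target) / (1 - cur) if cur < 1 else 0.0
            for c in cells:
                p[c] *= f_in if inside[c] else f_out
        if worst < tol:  # every subset constraint was already satisfied
            break
    return p[(1,) * k]


def delta(transactions, itemset):
    """Observed minus referential probability: > 0 signals a positive
    dependency, < 0 a negative one."""
    return support(transactions, itemset) - reference_probability(transactions, itemset)


def rank_by_delta(transactions, itemsets, min_abs_delta):
    """Keep itemsets with |delta| >= min_abs_delta, ranked by decreasing delta."""
    scored = [(frozenset(s), delta(transactions, s)) for s in itemsets]
    kept = [(s, d) for s, d in scored if abs(d) >= min_abs_delta]
    return sorted(kept, key=lambda sd: sd[1], reverse=True)


if __name__ == "__main__":
    base = [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}, set()]
    transactions = [t | extra for t in base for extra in (set(), {"c"})]
    print(round(delta(transactions, ("a", "b")), 6))  # 0.04
    for s, d in rank_by_delta(transactions, [("a", "b"), ("a", "c"), ("a", "b", "c")], 0.01):
        print(sorted(s), round(d, 4))
```

In this toy data `c` is independent of the pair `{a, b}`. Consequently ∆ is close to zero for `{a, c}` and for `{a, b, c}`: the three-item dependency is fully "inherited" from `{a, b}`, and only `{a, b}` passes the cut-off. Enumerating all 2^k cells keeps the sketch simple, but it is practical only for short itemsets.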
Year: 2012
In: Data Mining: Foundations and Intelligent Paradigms
Publisher: Springer Verlag
Series: Intelligent Systems Reference Library, vol. 24
Pages: 209-222
ISBN: 9783642232404
URL: http://www.springer.com/engineering/computational+intelligence+and+complexity/book/978-3-642-23240-4
Keywords: dependencies; itemsets; delta value; closed itemsets; positive and negative dependencies; rare events; ranking; DepMiner
Authors: Meo, Rosa; D'Ambrosi, L.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/88840