DepMiner: A method and a system for the extraction of significant dependencies

Meo, Rosa; D'Ambrosi, L.

We propose DepMiner, a method implementing a simple but effective model for the evaluation of itemsets and, more generally, for the evaluation of the dependencies between the values assumed by a set of variables on a domain of finite values. The method is based on ∆, the departure of the probability of an observed event from a referential probability of the same event. The observed probability is the probability that the variables assume given values in the database; the referential probability is the probability of the same event estimated under the condition of maximum entropy. DepMiner is able to distinguish between dependencies among the variables that are intrinsic to the itemset and dependencies "inherited" from its subsets; it is therefore suitable for evaluating the utility of an itemset w.r.t. its subsets. The method detects significant positive dependencies and, at the same time, negative ones, which are suitable for identifying rare itemsets. Since ∆ is anti-monotonic, it can be embedded efficiently in algorithms. The system returns itemsets ranked by ∆ and presents a histogram of the distribution of ∆. The parameters that govern the method, such as the minimum support for itemsets and the thresholds on ∆, are determined automatically by the system. The system uses the thresholds on ∆ to identify the statistically significant itemsets. It thus reduces the volume of results more than competing methods.
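As a rough illustration of the idea of ∆ as "observed minus referential probability", the following Python sketch computes such a departure on a toy transaction database. It is a simplification under stated assumptions, not the DepMiner implementation: the abstract does not say which constraints the maximum-entropy estimate uses, so the sketch assumes the simplest case, in which only the single-item frequencies are fixed and the maximum-entropy distribution is therefore the independence model (the product of the item probabilities). The function names `support` and `delta_independence` and the toy data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def delta_independence(transactions, itemset):
    """Observed probability of `itemset` minus a referential probability.

    ASSUMPTION: the referential probability is the maximum-entropy estimate
    constrained only by single-item frequencies, i.e. the product of the
    individual item supports. The paper's estimate may use richer constraints
    (for example the supports of the subsets).
    """
    observed = support(transactions, itemset)
    referential = 1.0
    for item in itemset:
        referential *= support(transactions, [item])
    return observed - referential


# Hypothetical toy database: each transaction is a set of items.
transactions = [frozenset(t) for t in (
    {"a", "b"}, {"a", "b"}, {"a", "b"}, {"c"},
    {"c"}, {"a", "c"}, {"b"}, {"c"},
)]

# Rank candidate pairs by the magnitude of their departure.
pairs = [("a", "b"), ("a", "c"), ("b", "c")]
ranked = sorted(pairs, key=lambda p: abs(delta_independence(transactions, p)),
                reverse=True)
```

In this toy database each item has support 0.5, so every pair has a referential probability of 0.25 under the independence assumption. The pair (a, b) occurs more often than that (a positive departure), while (b, c) never occurs (a negative departure), which is the kind of rare itemset the abstract refers to.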