A new model to evaluate dependencies in data mining problems is presented and discussed. The well-known concept of the association rule is replaced by the new definition of dependence value, which is a single real number uniquely associated with a given itemset. Knowledge of dependence values is sufficient to describe all the dependencies characterizing a given data mining problem. The dependence value of an itemset is the difference between the occurrence probability of the itemset and a corresponding “maximum independence estimate.” This can be determined as a function of joint probabilities of the subsets of the itemset being considered by maximizing a suitable entropy function. So it is possible to separate in an itemset of cardinaltiy k the dependence inherited from its subsets of cardinality (k − 1) and the specific inherent dependence of that itemset. The absolute value of the difference between the probability p(i) of the event i that indicates the prescence of the itemset {a,b,... } and its maximum independence estimate is constant for any combination of values of Q &angl0; a,b,... &angr0; Q. In1p addition, the Boolean function specifying the combination of values for which the dependence is positive is a parity function. So the determination of such combinations is immediate. The model appears to be simple and powerful.
Theory of Dependence Values
MEO, Rosa
2000-01-01
Abstract
A new model to evaluate dependencies in data mining problems is presented and discussed. The well-known concept of the association rule is replaced by the new definition of dependence value, which is a single real number uniquely associated with a given itemset. Knowledge of dependence values is sufficient to describe all the dependencies characterizing a given data mining problem. The dependence value of an itemset is the difference between the occurrence probability of the itemset and a corresponding “maximum independence estimate.” This can be determined as a function of joint probabilities of the subsets of the itemset being considered by maximizing a suitable entropy function. So it is possible to separate in an itemset of cardinaltiy k the dependence inherited from its subsets of cardinality (k − 1) and the specific inherent dependence of that itemset. The absolute value of the difference between the probability p(i) of the event i that indicates the prescence of the itemset {a,b,... } and its maximum independence estimate is constant for any combination of values of Q &angl0; a,b,... &angr0; Q. In1p addition, the Boolean function specifying the combination of values for which the dependence is positive is a parity function. So the determination of such combinations is immediate. The model appears to be simple and powerful.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.