Replacing Support in Association Rule Mining
MEO, Rosa; IENCO, Dino
2009-01-01
Abstract
Association rules are an intuitive descriptive paradigm that has been used extensively in recent years, across different application domains, to identify regularities and correlations in a set of observed objects. Recently, however, the statistical measures of association rules (support and confidence) have been criticized because in some cases they fail in their primary goal, which is to select the most relevant and significant association rules. In this paper we propose a new model that replaces the support measure. Like support, the new model is a tool for identifying reliable rules and is also used to reduce the traversal of the itemset search space. The proposed model adopts new criteria to establish the reliability of the information extracted from the database. These criteria are based on Bayes’ Theorem and on an estimate of the probability density function of each itemset. According to our criteria, the information obtained from the database on an itemset is reliable if and only if the confidence interval of the estimated probability is narrow compared with its most likely value. We show how this criterion can be computed in an approximate but satisfactory way, with computational time comparable to that of the support threshold test.
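The reliability criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything it fixes is an assumption made here for illustration: a uniform prior (so that Bayes' Theorem gives a Beta(k + 1, n - k + 1) posterior for an itemset observed in k of n transactions), a normal approximation of that posterior around its mode, and the hypothetical names `posterior_summary`, `is_reliable`, `max_relative_width` and `z`.

```python
import math


def posterior_summary(k, n, z=1.96):
    """Return (mode, interval_width) for the probability of an itemset.

    Assumption (not from the paper): a uniform prior, so the posterior of
    the itemset probability is Beta(k + 1, n - k + 1), whose mode is k / n.
    The interval is a normal (Laplace) approximation around that mode.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    mode = k / n
    # Curvature of the log-posterior at the mode gives variance p(1 - p) / n.
    std = math.sqrt(mode * (1.0 - mode) / n)
    return mode, 2.0 * z * std


def is_reliable(k, n, max_relative_width=0.5, z=1.96):
    """Hypothetical test: interval width must be small relative to the mode."""
    mode, width = posterior_summary(k, n, z)
    if mode == 0.0:
        return False  # an itemset that was never observed carries no evidence
    return width / mode <= max_relative_width
```

In this sketch, an itemset seen in 500 of 10000 transactions passes the test, while one seen in 5 of 100 does not, although both have the same relative frequency. The cost per itemset is a constant number of arithmetic operations, comparable to checking a support threshold.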