Instance-based classifiers that compute similarity between instances suffer from noise in the training set and from overfitting. In this paper we propose a new type of distance-based classifier that, instead of computing distances between instances, computes the distance between each test instance and the classes. Both the test instance and the classes are represented by patterns in the space of the frequent itemsets. We rank the itemsets by metrics of itemset significance and keep only the top portion of the ranking that leads the classifier to its maximum accuracy. We experiment on a large collection of datasets from the UCI archive with different proximity measures and different itemset ranking metrics. We show that our method has several benefits: it reduces the number of distance computations, it improves on the classification accuracy of state-of-the-art classifiers such as decision trees, SVM, k-NN, Naive Bayes, rule-based classifiers, and association rule-based classifiers, and it outperforms these competitors especially on noisy data.
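The general idea can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the brute-force itemset miner, the significance score (spread of per-class support), the fixed `top_k` cutoff, the Euclidean distance, and all function names are assumptions chosen for brevity. The abstract states only that several ranking metrics and proximity measures were tried and that the cutoff is chosen to maximize accuracy.

```python
from itertools import combinations
from math import sqrt


def frequent_itemsets(transactions, min_support=0.3, max_size=2):
    """Brute-force enumeration of itemsets whose support is >= min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                found.append(itemset)
    return found


def class_supports(itemsets, transactions, labels):
    """Represent each class by the support of every itemset within that class."""
    profiles = {}
    for c in sorted(set(labels)):
        rows = [t for t, y in zip(transactions, labels) if y == c]
        profiles[c] = [sum(s <= t for t in rows) / len(rows) for s in itemsets]
    return profiles


def fit(transactions, labels, top_k=4, min_support=0.3):
    """Mine itemsets, rank them, and keep the top_k as the feature space."""
    itemsets = frequent_itemsets(transactions, min_support)
    profiles = class_supports(itemsets, transactions, labels)

    # Assumed significance score: how much the support differs across classes.
    def spread(i):
        values = [profiles[c][i] for c in profiles]
        return max(values) - min(values)

    order = sorted(range(len(itemsets)), key=spread, reverse=True)[:top_k]
    kept = [itemsets[i] for i in order]
    return kept, {c: [p[i] for i in order] for c, p in profiles.items()}


def predict(instance, kept, profiles):
    """Assign the class whose profile is nearest to the test instance."""
    x = [1.0 if s <= instance else 0.0 for s in kept]

    def dist(profile):
        return sqrt(sum((a - b) ** 2 for a, b in zip(x, profile)))

    return min(profiles, key=lambda c: dist(profiles[c]))


# Toy data (hypothetical): each instance is a set of items.
train = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"},
         {"c", "d"}, {"a", "c", "d"}, {"d"}]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

kept, profiles = fit(train, labels)
print(predict({"a", "b"}, kept, profiles))  # pos
print(predict({"c", "d"}, kept, profiles))  # neg
```

In this sketch each prediction requires one distance computation per class rather than one per training instance, which is the source of the reduction in distance computations mentioned above.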