
Exploration and Reduction of the Feature Space by Hierarchical Clustering

IENCO, Dino; MEO, Rosa
2008-01-01

Abstract

In this paper we propose and test the use of hierarchical clustering for feature selection. The clustering method is Ward's, with a distance measure based on the Goodman-Kruskal tau. We motivate the choice of this measure and compare it with alternatives. Our hierarchical clustering is applied to over 40 datasets from the UCI archive. The proposed approach is interesting from many viewpoints. First, it produces a dendrogram of feature subsets that serves as a valuable tool for studying relevance relationships among features. Second, the dendrogram is used in a feature selection algorithm that selects the best features by a wrapper method. Experiments were run with three different families of classifiers: naive Bayes, decision trees, and k-nearest neighbours. Our method enables all three classifiers to generally outperform their counterparts trained without feature selection. We compare our feature selection with other state-of-the-art methods, obtaining on average better classification accuracy, though a smaller reduction in the number of features. Moreover, unlike other approaches to feature selection, our method requires no parameter tuning.
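The abstract describes clustering features with Ward's method under a Goodman-Kruskal tau-based distance. As a rough illustration (not the paper's exact formulation), the sketch below computes the tau association between two categorical features and symmetrizes it into a dissimilarity matrix for hierarchical clustering; the symmetrization `1 - (tau(i→j) + tau(j→i)) / 2` and the toy features are assumptions for illustration only.

```python
# Sketch, assuming a symmetrized tau dissimilarity; the paper defines its
# own tau-based distance, which may differ.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def goodman_kruskal_tau(x, y):
    """Goodman-Kruskal tau(x -> y): proportional reduction in the Gini
    prediction error of y when x is known. Asymmetric in its arguments."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(y)
    _, counts_y = np.unique(y, return_counts=True)
    e_y = 1.0 - np.sum((counts_y / n) ** 2)  # marginal Gini error of y
    if e_y == 0.0:
        return 0.0  # y is constant: nothing left to predict
    e_y_given_x = 0.0
    for value in np.unique(x):
        mask = x == value
        _, counts_yx = np.unique(y[mask], return_counts=True)
        p_x = mask.sum() / n
        e_y_given_x += p_x * (1.0 - np.sum((counts_yx / mask.sum()) ** 2))
    return (e_y - e_y_given_x) / e_y


def tau_dissimilarity(features):
    """Symmetric dissimilarity matrix over a list of categorical features:
    d(i, j) = 1 - (tau(i->j) + tau(j->i)) / 2 (an illustrative choice)."""
    k = len(features)
    d = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            t = 0.5 * (goodman_kruskal_tau(features[i], features[j])
                       + goodman_kruskal_tau(features[j], features[i]))
            d[i, j] = d[j, i] = 1.0 - t
    return d


# Toy categorical features: f0 and f1 are perfectly redundant, f2 is not.
f0 = [0, 0, 1, 1, 0, 1, 0, 1]
f1 = [1, 1, 0, 0, 1, 0, 1, 0]
f2 = [0, 1, 0, 1, 1, 0, 1, 0]
d = tau_dissimilarity([f0, f1, f2])
# Ward linkage on the condensed matrix yields the feature dendrogram;
# note that SciPy's Ward method formally assumes Euclidean distances.
Z = linkage(squareform(d), method="ward")
```

Redundant features (such as `f0` and `f1` above, where one fully predicts the other) get dissimilarity near zero and merge early in the dendrogram, which is what makes the tree useful for spotting relevance relationships among features.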
2008
SIAM Conference on Data Mining
Atlanta, Georgia, USA
April, 24-26, 2008
Proceedings of the 2008 SIAM Conference on Data Mining
Society for Industrial and Applied Mathematics
577
587
9780898716542
http://www.siam.org/meetings/sdm08/
Goodman-Kruskal tau; feature selection; hierarchical clustering; Ward
Ienco, Dino; Meo, Rosa
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/35174
Citations
  • Scopus 21