Towards Automatic Construction of Conceptual Taxonomies
MEO, Rosa; IENCO, Dino
2008-01-01
Abstract
In this paper we investigate whether conceptual taxonomies can be constructed automatically and evaluate the results that can be achieved. The hierarchy is built with Ward's algorithm, using the Goodman-Kruskal τ as the proximity measure. We then give a concise description of each cluster through a representative keyword selected by PageRank. The resulting hierarchy has the same advantages, both descriptive and operative, as keyword indices that partition a set of documents by content. We ran experiments on a real case: the abstracts of the papers published in ACM TODS, which have been manually classified into the ACM Computing Taxonomy (CT). We evaluated the generated hierarchy objectively with two methods, the Jaccard measure and entropy, and obtained good results with both. Finally, we evaluated how easily documents can be classified into the categories of the two taxonomies, showing that the generated keyword hierarchy (KH) makes classification easier than CT.
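The measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and it assumes that each keyword is represented as a categorical vector over documents (for example, present or absent), that the Jaccard measure compares two sets of documents, and that entropy is computed over the manual category labels falling in one cluster. How the paper turns τ into a proximity for Ward's algorithm is not stated in the abstract and is not reproduced here.

```python
from collections import Counter
from math import log2


def goodman_kruskal_tau(x, y):
    """Goodman-Kruskal tau(y | x): the proportional reduction in the
    error of predicting y once x is known (0 = no help, 1 = perfect)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    # Error of predicting y from its marginal distribution alone.
    err_y = 1.0 - sum((c / n) ** 2 for c in count_y.values())
    if err_y == 0.0:
        return 0.0  # y is constant, so there is no error to reduce
    # Expected error of predicting y from its distribution within each x value.
    err_y_given_x = 1.0 - sum(
        c * c / (n * count_x[xv]) for (xv, _), c in count_xy.items()
    )
    return (err_y - err_y_given_x) / err_y


def jaccard(a, b):
    """Jaccard measure between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here for two empty sets
    return len(a & b) / len(a | b)


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the reference labels in one cluster;
    0 means every document in the cluster shares the same label."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())
```

Note that τ is asymmetric: goodman_kruskal_tau(x, y) and goodman_kruskal_tau(y, x) generally differ, so a clustering procedure that needs a symmetric proximity has to combine the two directions in some way.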