Concept-Enhanced Multi-view Co-clustering of Document Data
RHO, Valentina (First); PENSA, Ruggero Gaetano (Last)
2017-01-01
Abstract
The maturity of structured knowledge bases and semantic resources has contributed to the enhancement of document clustering algorithms, which can take advantage of conceptual representations as an alternative to classic bag-of-words models. However, operating in the semantic space is not always the best choice in domains where the choice of terms also matters. Moreover, users are usually required to provide a valid number of clusters as input, but this parameter is often hard to guess because of the exploratory nature of the clustering process. To address these limitations, we propose a multi-view co-clustering approach that simultaneously processes the classic document-term matrix and an enhanced document-concept representation of the same collection of documents. Our algorithm has multiple key features: it finds an arbitrary number of clusters and provides clusters of terms and concepts as easy-to-interpret summaries. We show the effectiveness of our approach in an extensive experimental study involving several corpora with different levels of complexity.

File | Description | File type | Access | Size | Format |
---|---|---|---|---|---|
mimosa_ismis17_4aperto.pdf | pdf open | Postprint (author's final version) | Open access | 478.28 kB | Adobe PDF |
ismis2017_printed.pdf | Publisher's PDF | Publisher's PDF | Restricted access | 470.63 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.