Concept-Enhanced Multi-view Co-clustering of Document Data

Rho, Valentina (first author); Pensa, Ruggero Gaetano (last author)
2017-01-01

Abstract

The maturity of structured knowledge bases and semantic resources has contributed to the enhancement of document clustering algorithms, which can take advantage of conceptual representations as an alternative to classic bag-of-words models. However, operating in the semantic space is not always the best choice in domains where the choice of terms also matters. Moreover, users are usually required to provide a valid number of clusters as input, but this parameter is often hard to guess because of the exploratory nature of the clustering process. To address these limitations, we propose a multi-view co-clustering approach that simultaneously processes the classic document-term matrix and an enhanced document-concept representation of the same collection of documents. Our algorithm has multiple key features: it finds an arbitrary number of clusters and provides clusters of terms and concepts as easy-to-interpret summaries. We show the effectiveness of our approach in an extensive experimental study involving several corpora with different levels of complexity.
Year: 2017
Conference: 23rd International Symposium on Methodologies for Intelligent Systems - ISMIS 2017
Location: Warsaw, Poland
Dates: 26-29 June 2017
Published in: Foundations of Intelligent Systems. ISMIS 2017.
Publisher: Springer International Publishing
Volume: 10352
Pages: 457-467
ISBN: 978-3-319-60437-4; 978-3-319-60438-1
URL: https://link.springer.com/chapter/10.1007/978-3-319-60438-1_45
Keywords: co-clustering, semantic enrichment, multi-view clustering
Authors: Rho, Valentina; Pensa, Ruggero G.
Files in this item:

mimosa_ismis17_4aperto.pdf
Access: Open access
Description: pdf open
File type: POSTPRINT (AUTHOR'S FINAL VERSION)
Size: 478.28 kB
Format: Adobe PDF

ismis2017_printed.pdf
Access: Restricted access
Description: publisher's pdf
File type: PUBLISHER'S PDF
Size: 470.63 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1641888
Citations
  • PMC: ND
  • Scopus: 2
  • ISI: 1