Co-clustering: a Survey of the Main Methods, Recent Trends and Open Problems

Elena Battaglia (co-first author; Member of the Collaboration Group);
Federico Peiretti (co-first author; Member of the Collaboration Group);
Ruggero Gaetano Pensa (last author; Member of the Collaboration Group)
2024-01-01

Abstract

Since its early formulations, co-clustering has gained popularity and interest both within and outside the machine learning community as a powerful learning paradigm for clustering high-dimensional data with good explainability properties. The simultaneous partitioning of all the modes of the input data tensor (rows and columns in a data matrix) is both a method for improving clustering on one mode while performing dimensionality reduction on the other modes, and a tool for providing an actionable interpretation of the clusters in the main mode as summaries of the features in each of the other modes. Hence, it is useful in many complex decision systems and data science applications. In this paper, we survey the co-clustering literature by reviewing the main co-clustering methods, with a special focus on the work done in the last twenty-five years. We identify, describe and compare the main algorithmic categories, and provide a practical characterization with respect to similar unsupervised techniques. We also try to explain why co-clustering is still a powerful tool despite the apparent recent decline in interest shown by the machine learning community. To this end, we review the most recent trends in co-clustering research and outline the open problems and promising future research perspectives.
Year: 2024
Volume: 57
Issue: 2
Pages: 1-33
URL: https://dl.acm.org/doi/10.1145/3698875
Keywords: cluster analysis, surveys and overviews, clustering
Authors: Elena Battaglia; Federico Peiretti; Ruggero Gaetano Pensa
Files in this record:
File: csur2024_printed.pdf
Access: Open access
Description: PDF open access
File type: Publisher's PDF
Size: 869.7 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2019731