Co-clustering: a Survey of the Main Methods, Recent Trends and Open Problems
Elena Battaglia (co-first author); Federico Peiretti (co-first author); Ruggero Gaetano Pensa (last author)
2024-01-01
Abstract
Since its early formulations, co-clustering has gained popularity and interest both within and outside the machine learning community as a powerful learning paradigm for clustering high-dimensional data with good explainability properties. The simultaneous partitioning of all the modes of the input data tensor (rows and columns in a data matrix) is both a method for improving clustering on one mode while performing dimensionality reduction on the other mode(s), and a tool for providing an actionable interpretation of the clusters in the main mode as summaries of the features in the other mode(s). Hence, it is useful in many complex decision systems and data science applications. In this paper, we survey the co-clustering literature by reviewing the main co-clustering methods, with a special focus on the work done in the last twenty-five years. We identify, describe and compare the main algorithmic categories, and provide a practical characterization with respect to similar unsupervised techniques. We also try to explain why co-clustering is still a powerful tool despite the apparent recent decline in interest shown by the machine learning community. To this end, we review the most recent trends in co-clustering research and outline the open problems and promising future research perspectives.

File | Size | Format
---|---|---
csur2024_printed.pdf (open access; publisher's PDF) | 869.7 kB | Adobe PDF