Fair Associative Co-clustering

Peiretti, Federico (first author); Pensa, Ruggero G. (last author)
2025-01-01

Abstract

Co-clustering is a data mining technique that summarizes a data matrix by simultaneously computing row and column clusters, yielding a compact representation of the data. However, if the matrix contains data about individuals, the co-clustering results may be influenced by societal biases reproduced in the data. Downstream tasks such as recommendation may then inherit these biases, compromising the fairness and integrity of the overall knowledge discovery or machine learning process. Despite extensive research on fairness in clustering, this issue has not been addressed in the context of co-clustering algorithms. To fill this gap, this paper proposes a novel fair co-clustering algorithm. The algorithm is based on an associative measure derived from Goodman-Kruskal's tau, which has demonstrated good convergence properties. It targets both clustering quality and fairness by means of an in-process rebalancing mechanism inspired by the fair assignment problem. An extensive experimental validation demonstrates the efficacy of the approach, including a comparison with a state-of-the-art method that uses co-clustering for fair recommendation.
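For readers unfamiliar with the statistic named in the abstract, the following is a minimal sketch of the textbook Goodman-Kruskal tau computed on a contingency table, for example one that counts matrix entries by row cluster and column cluster. It is not the paper's derived measure, its optimization procedure, or its fairness rebalancing step; the function name `goodman_kruskal_tau` and the toy tables are illustrative assumptions.

```python
import numpy as np

def goodman_kruskal_tau(table):
    """Textbook Goodman-Kruskal tau for predicting the column category
    from the row category of a contingency table (illustrative helper,
    not the measure used in the paper)."""
    t = np.asarray(table, dtype=float)
    p = t / t.sum()              # joint relative frequencies
    row = p.sum(axis=1)          # row marginals
    col = p.sum(axis=0)          # column marginals

    # Expected error when guessing the column category from the marginals alone.
    # Note: this is zero (and tau undefined) if all mass sits in one column.
    e_marg = 1.0 - np.sum(col ** 2)

    # Expected error when the row category is known (skip empty rows).
    nz = row > 0
    e_cond = 1.0 - np.sum(p[nz] ** 2 / row[nz][:, None])

    # Proportional reduction in prediction error.
    return (e_marg - e_cond) / e_marg

# Toy tables (hypothetical data): perfect association and independence.
perfect = [[5, 0], [0, 5]]
independent = [[1, 1], [1, 1]]
tau_perfect = goodman_kruskal_tau(perfect)
tau_independent = goodman_kruskal_tau(independent)
```

The statistic ranges from 0 (knowing the row category does not help predict the column category) to 1 (the row category determines the column category). It is asymmetric, so swapping the roles of rows and columns generally gives a different value.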
Year: 2025
Conference: The 2025 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2025)
Location: Porto, Portugal
Dates: September 15-19, 2025
Book title: Machine Learning and Knowledge Discovery in Databases. Research Track.
Publisher: Springer
Volume: 16013
Pages: 282-300
ISBN: 9783032059611, 9783032059628
URL: https://link.springer.com/chapter/10.1007/978-3-032-05962-8_17
Keywords: Clustering, Fairness, High-dimensional data
Authors: Peiretti, Federico; Pensa, Ruggero G.
Files in this record:
ecmlpkdd2025_preprint.pdf: open access; preprint (first draft); 1.41 MB; Adobe PDF
ecmlpkdd2025_printed.pdf: restricted access; publisher's PDF; 1.87 MB; Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2096930