Co-clustering is a useful tool that extracts summary information from a data matrix in terms of row and column clusters, and gives a succinct representation of the data. However, if the matrix contains data about individuals, such representations could leak their privacy-sensitive information. In terms of privacy disclosure, co-clustering is even more harmful than clustering, because of the additional information carried by the column partition. However, to the best of our knowledge, the problem of privacy-preserving co-clustering has never been studied. To fill this gap, we consider a recent co-clustering algorithm, based on a de-normalized version of the Goodman-Kruskal’s τ association measure, which has a good property from a differential privacy perspective, and is supposed not to consume an excessive amount of privacy budget. This leads to a privacy-preserving co-clustering algorithm that satisfies the definition of differential privacy while providing good partitioning solutions. Our algorithm is based on a prototype-based optimization strategy that makes it fast and actionable in realistic privacy-preserving data management and analysis scenarios, as shown by our extensive experimental validation.

Differentially Private Associative Co-clustering

Battaglia, Elena
Co-first
;
Pensa, Ruggero G.
Co-first
2025-01-01

Abstract

Co-clustering is a useful tool that extracts summary information from a data matrix in terms of row and column clusters, and gives a succinct representation of the data. However, if the matrix contains data about individuals, such representations could leak their privacy-sensitive information. In terms of privacy disclosure, co-clustering is even more harmful than clustering, because of the additional information carried by the column partition. However, to the best of our knowledge, the problem of privacy-preserving co-clustering has never been studied. To fill this gap, we consider a recent co-clustering algorithm, based on a de-normalized version of the Goodman-Kruskal’s τ association measure, which has a good property from a differential privacy perspective, and is supposed not to consume an excessive amount of privacy budget. This leads to a privacy-preserving co-clustering algorithm that satisfies the definition of differential privacy while providing good partitioning solutions. Our algorithm is based on a prototype-based optimization strategy that makes it fast and actionable in realistic privacy-preserving data management and analysis scenarios, as shown by our extensive experimental validation.
2025
SIAM International Conference on Data Mining
Alexandria, VA, USA
May 1-3, 2025
Proceedings of the 2025 SIAM International Conference on Data Mining (SDM)
Society for Industrial and Applied Mathematics
233
241
9781611978520
https://epubs.siam.org/doi/10.1137/1.9781611978520.22
clustering, privacy, unsupervised learning, high-dimensional data
Battaglia, Elena; Pensa, Ruggero G.
File in questo prodotto:
File Dimensione Formato  
sdm2025_printed.pdf

Accesso riservato

Descrizione: PDF editoriale
Tipo di file: PDF EDITORIALE
Dimensione 1.74 MB
Formato Adobe PDF
1.74 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
sdm2025_author.pdf

Accesso aperto

Descrizione: PDF autore
Tipo di file: POSTPRINT (VERSIONE FINALE DELL’AUTORE)
Dimensione 1.6 MB
Formato Adobe PDF
1.6 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/2071490
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact