CINECA IRIS Institutional Research Information System

Differentially Private Associative Co-clustering

Battaglia, Elena (co-first author); Pensa, Ruggero G. (co-first author)

2025-01-01

Abstract

Co-clustering extracts summary information from a data matrix in the form of row and column clusters, giving a succinct representation of the data. However, if the matrix contains data about individuals, such representations could leak privacy-sensitive information about them. In terms of privacy disclosure, co-clustering is even more harmful than clustering because of the additional information carried by the column partition. Yet, to the best of our knowledge, the problem of privacy-preserving co-clustering has never been studied. To fill this gap, we consider a recent co-clustering algorithm based on a de-normalized version of the Goodman-Kruskal τ association measure, which has a favorable property from a differential privacy perspective and is expected not to consume an excessive amount of privacy budget. This leads to a privacy-preserving co-clustering algorithm that satisfies the definition of differential privacy while providing good partitioning solutions. Our algorithm relies on a prototype-based optimization strategy that makes it fast and actionable in realistic privacy-preserving data management and analysis scenarios, as shown by our extensive experimental validation.
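For readers unfamiliar with the association measure named in the abstract, the following is a minimal sketch of the standard (normalized) Goodman-Kruskal τ computed on a contingency table, for example the table of co-occurrence counts between row clusters and column clusters. It is an illustration only: the function name `goodman_kruskal_tau` is hypothetical, and the de-normalized variant and the differentially private mechanism used in the paper are not specified in this record, so neither is reproduced here.

```python
def goodman_kruskal_tau(table):
    """Standard Goodman-Kruskal tau for predicting the column variable
    from the row variable.

    `table` is a list of rows of non-negative counts (a contingency table).
    This is the textbook normalized measure, NOT the de-normalized variant
    referred to in the abstract.
    """
    total = sum(sum(row) for row in table)
    if total <= 0:
        raise ValueError("table must contain at least one positive count")

    n_cols = len(table[0])
    # Marginal probabilities of the row and column variables.
    row_marg = [sum(row) / total for row in table]
    col_marg = [sum(row[j] for row in table) / total for j in range(n_cols)]

    # Sum over cells of p_ij^2 / p_i. (empty rows are skipped).
    conditional = sum(
        (cell / total) ** 2 / r
        for row, r in zip(table, row_marg) if r > 0
        for cell in row
    )
    # Sum over columns of p_.j^2.
    baseline = sum(c ** 2 for c in col_marg)

    if baseline == 1.0:
        return 0.0  # the column variable is constant: nothing to predict
    return (conditional - baseline) / (1.0 - baseline)


perfect = goodman_kruskal_tau([[5, 0], [0, 5]])      # rows determine columns
independent = goodman_kruskal_tau([[2, 2], [2, 2]])  # rows carry no information
```

In this sketch a table in which each row determines the column yields τ = 1, and a table with independent rows and columns yields τ = 0.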

Year: 2025
Event title: SIAM International Conference on Data Mining
Event location: Alexandria, VA, USA
Event date: May 1-3, 2025
Volume title: Proceedings of the 2025 SIAM International Conference on Data Mining (SDM)
Publisher: Society for Industrial and Applied Mathematics
Pages (from): 233
Pages (to): 241
ISBN: 9781611978520
DOI: https://dx.doi.org/10.1137/1.9781611978520.22
Product URL (open-access archives, full text on publisher's site, etc.): https://epubs.siam.org/doi/10.1137/1.9781611978520.22
Keywords: clustering, privacy, unsupervised learning, high-dimensional data
All authors: Battaglia, Elena; Pensa, Ruggero G.
Appears in types: 04A-Conference paper in volume

Files in this product:

File	Access	Description	File type	Size	Format
sdm2025_printed.pdf	Restricted access	Publisher's PDF	Publisher's PDF	1.74 MB	Adobe PDF
sdm2025_author.pdf	Open access	Author's PDF	Postprint (author's final version)	1.6 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2071490
