DP-DILCA: Learning Differentially Private Context-based Distances for Categorical Data (Discussion Paper)

Elena Battaglia (first author); Ruggero G. Pensa (last author)
2021-01-01

Abstract

Distance-based machine learning methods have limited applicability to categorical data, since they do not capture the complexity of the relationships among different values of a categorical attribute. Nonetheless, categorical attributes are common in many application scenarios, including clinical and health records and census and survey data. Although distance learning algorithms exist for categorical data, they may disclose private information about individual records if applied to a secret dataset. To address this problem, we introduce a differentially private algorithm for learning distances between any pair of values of a categorical attribute according to the way they are co-distributed with the values of other categorical attributes, which form the so-called context. We show empirically that our approach consumes little privacy budget while providing accurate distances.
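
As an illustration of the idea summarized in the abstract, the following is a minimal sketch and not the authors' DP-DILCA algorithm. It assumes a single context attribute, publicly known attribute domains, Laplace noise added to a contingency table, and neighbouring datasets that differ by one added or removed record (sensitivity 1). The function names, the toy records, and the attribute names `drug` and `outcome` are hypothetical.

```python
import math
import random
from collections import Counter


def noisy_contingency(records, target, context, target_domain, context_domain,
                      epsilon, rng):
    """Co-occurrence counts of (target value, context value) plus Laplace noise.

    Assumption: neighbouring datasets differ by adding or removing one record,
    so each record affects exactly one cell and the L1 sensitivity is 1.
    """
    counts = Counter((r[target], r[context]) for r in records)
    scale = 1.0 / epsilon  # Laplace scale = sensitivity / epsilon
    table = {}
    for t in target_domain:
        for c in context_domain:
            # The difference of two i.i.d. exponentials is a Laplace(0, scale) sample.
            noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
            # Clamping at zero is post-processing and uses no extra budget.
            table[(t, c)] = max(counts[(t, c)] + noise, 0.0)
    return table


def context_distance(table, a, b, target_domain, context_domain):
    """Root mean squared gap between P(a | c) and P(b | c) over context values c."""
    total = 0.0
    for c in context_domain:
        column_sum = sum(table[(t, c)] for t in target_domain)
        if column_sum == 0.0:
            continue  # no usable mass for this context value
        total += ((table[(a, c)] - table[(b, c)]) / column_sum) ** 2
    return math.sqrt(total / len(context_domain))


# Hypothetical toy data: drugs A and B co-occur with outcomes identically, C differs.
records = (
    [{"drug": "A", "outcome": "good"}] * 3 + [{"drug": "A", "outcome": "bad"}]
    + [{"drug": "B", "outcome": "good"}] * 3 + [{"drug": "B", "outcome": "bad"}]
    + [{"drug": "C", "outcome": "good"}] + [{"drug": "C", "outcome": "bad"}] * 3
)
drugs = ["A", "B", "C"]
outcomes = ["good", "bad"]

rng = random.Random(0)
private_table = noisy_contingency(records, "drug", "outcome", drugs, outcomes,
                                  epsilon=1.0, rng=rng)
print(context_distance(private_table, "A", "C", drugs, outcomes))
```

In this sketch, a smaller `epsilon` yields noisier counts and therefore less reliable distances. With several context attributes, the privacy budget would have to be split across the corresponding tables.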
2021
29th Italian Symposium on Advanced Database Systems (SEBD 2021)
Pizzo Calabro (VV), Italy
September 5-9, 2021
Proceedings of the 29th Italian Symposium on Advanced Database Systems (SEBD 2021)
CEUR-WS.org
Volume 2994, pages 482-489
http://ceur-ws.org/Vol-2994/paper55.pdf
differential privacy, metric learning, categorical attributes, distance-based methods
Files in this item:
File: sebd2021_dpdilca_open.pdf
Access: Open access
Description: PDF online (open access)
File type: Publisher's PDF
Size: 997.39 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1815412