Disclosure risk estimation via nonparametric log-linear models

CAROTA, Cinzia; LEOMBRUNI, Roberto
2012

Abstract

A major concern in releasing microdata sets is protecting the privacy of individuals in the sample. Consider a data set in the form of a high-dimensional contingency table. If an individual belongs to a cell with small frequency, an intruder with certain knowledge about that individual may identify them and learn sensitive information about them from the data. To estimate the risk of such a breach of confidentiality, we introduce several nonparametric models that represent progressive extensions of the one adopted by Skinner and Holmes (1998). The latter is a Poisson model whose rates are modeled through a mixed effects log-linear model with normal random effects. In the first extension, we assume Dirichlet process random effects and, following Skinner and Holmes (1998), keep the fixed effects constant. Next, we relax the latter assumption and consider a model in which all effects are unknown. In both extended models the total mass parameter of the Dirichlet process is also unknown. The MCMC methods used for inference are discussed extensively. An application to real data concludes the article.
XLVI Scientific Meeting of the Italian Statistical Society
Rome
20-22 June
Proceedings of the XLVI Scientific Meeting of the Italian Statistical Society
CLEUP
1
8
9788861298828
Bayesian log-linear models; confidentiality; disclosure risk; Dirichlet process; mixed effects models; nonparametric models
C. Carota; M. Filippone; R. Leombruni; S. Polettini

Use this identifier to cite or link to this document: http://hdl.handle.net/2318/122033