Disclosure risk estimation via nonparametric log-linear models
Carota, Cinzia; Leombruni, Roberto
2012-01-01
Abstract
A major concern in releasing microdata sets is protecting the privacy of the individuals in the sample. Consider a data set in the form of a high-dimensional contingency table. If an individual belongs to a cell with a small frequency, an intruder with certain knowledge about that individual may identify him and learn sensitive information about him from the data. To estimate the risk of such a breach of confidentiality, we introduce several nonparametric models that represent progressive extensions of the one adopted by Skinner and Holmes (1998). The latter is a Poisson model whose rates are modeled through a mixed effects log-linear model with normal random effects. In the first extension, we assume Dirichlet process random effects and, mimicking Skinner and Holmes (1998), we keep the fixed effects constant. Next, we relax the latter assumption and consider a model in which all effects are unknown. In both extended models the total mass parameter of the Dirichlet process is also unknown. The MCMC methods used for inference are discussed extensively. An application to real data concludes the article.
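
As a reading aid, the model hierarchy described in the abstract can be sketched as follows. This is a minimal sketch under assumed notation that does not appear in the abstract: $F_k$ and $f_k$ denote the population and sample frequencies of cell $k$, $\pi$ a known sampling fraction under Bernoulli sampling, $w_k$ a design vector, $\beta$ the fixed effects, $\varepsilon_k$ the random effects, $m$ the total mass parameter, and $G_0$ an unspecified base distribution. The article itself may parameterize the model differently.

```latex
% Requires amsmath. All symbols are illustrative assumptions, not the paper's notation.
\begin{align*}
F_k \mid \lambda_k &\sim \mathrm{Poisson}(\lambda_k),
  & f_k \mid F_k &\sim \mathrm{Binomial}(F_k, \pi), \\
\log \lambda_k &= w_k^{\top}\beta + \varepsilon_k, \\
\text{parametric case:}\quad \varepsilon_k &\overset{\text{iid}}{\sim} N(0, \sigma^2), \\
\text{nonparametric case:}\quad \varepsilon_k \mid G &\overset{\text{iid}}{\sim} G,
  & G &\sim \mathrm{DP}(m, G_0).
\end{align*}
% Under these assumptions f_k and F_k - f_k are independent Poisson counts
% with means \pi\lambda_k and (1-\pi)\lambda_k, so for a sample-unique cell:
\begin{equation*}
\Pr(F_k = 1 \mid f_k = 1, \lambda_k) = e^{-(1-\pi)\lambda_k}.
\end{equation*}
```

Under this sketch, a cell that is unique in the sample carries a high risk when $\lambda_k$ is small, since the individual is then likely to be unique in the population as well. In the extended models, $m$ would also receive a prior, consistent with the abstract's statement that the total mass parameter is unknown.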