This work is about integrated analysis of data collected as official statistics with administrative data from operational systems in order to increase the quality of information. Information quality, or InfoQ, is ‘the potential of a data set to achieve a specific goal by using a given empirical analysis method’. InfoQ is based on the identification of four interacting components: the analysis goal, the data, the data analysis and the utility, and it is assessed through eight dimensions: data resolution, data structure, data integration, temporal relevance, generalizability, chronology of data and goal, construct operationalization and communication. The paper illustrates, through case studies, a novel strategy to increase InfoQ based on the integration of official statistics with administrative data using copulas and Bayesian Networks. Official statistics are extraordinary sources of information. However, because of temporal relevance and chronology of data and goals, these fundamental sources of information are often not properly leveraged resulting in a poor level of InfoQ in the use of official statistics. This leads to low valued statistical analyses and to the lack of sufficiently informative results. By improving temporal relevance and chronology of data and goals, the use of Bayesian Networks allows us to calibrate official with administrative data, thus strengthening the quality of the information derived from official surveys, and, overall, enhancing InfoQ. We show, with examples, how to design and implement such a calibration strategy.

Official Statistics Data Integration for Enhanced Information Quality

Dalla Valle L
;
2015-01-01

Abstract

This work is about integrated analysis of data collected as official statistics with administrative data from operational systems in order to increase the quality of information. Information quality, or InfoQ, is ‘the potential of a data set to achieve a specific goal by using a given empirical analysis method’. InfoQ is based on the identification of four interacting components: the analysis goal, the data, the data analysis and the utility, and it is assessed through eight dimensions: data resolution, data structure, data integration, temporal relevance, generalizability, chronology of data and goal, construct operationalization and communication. The paper illustrates, through case studies, a novel strategy to increase InfoQ based on the integration of official statistics with administrative data using copulas and Bayesian Networks. Official statistics are extraordinary sources of information. However, because of temporal relevance and chronology of data and goals, these fundamental sources of information are often not properly leveraged resulting in a poor level of InfoQ in the use of official statistics. This leads to low valued statistical analyses and to the lack of sufficiently informative results. By improving temporal relevance and chronology of data and goals, the use of Bayesian Networks allows us to calibrate official with administrative data, thus strengthening the quality of the information derived from official surveys, and, overall, enhancing InfoQ. We show, with examples, how to design and implement such a calibration strategy.
2015
31
7
1281
1300
https://onlinelibrary.wiley.com/doi/abs/10.1002/qre.1859
information quality (InfoQ); data integration; Bayesian networks
Dalla Valle L; Kenett R
File in questo prodotto:
File Dimensione Formato  
10.1002_qre.1859_pub_online.pdf

Accesso riservato

Dimensione 3.07 MB
Formato Adobe PDF
3.07 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/2025110
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact