Quality issues when using Big Data in Official Statistics

Natalia Golini
2017-01-01

Abstract

The use of Big Data (BD) to improve statistics and reduce costs is both a great opportunity and a challenge for National Statistical Offices (NSOs). The debate on BD often focuses on the IT issues of handling their volume, velocity and variety. Nevertheless, NSOs must also be assured that the estimates reach a good level of accuracy. This paper evaluates when estimators that use variables web-scraped from a list of enterprise websites, and that suffer from selectivity, are competitive with survey sampling estimators. A Monte Carlo simulation on a synthetic population based on real data is implemented to compare predictive estimators based on BD, survey estimators, and blended estimators combining predictive and survey estimators.
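To make the comparison described in the abstract concrete, here is a minimal sketch of that kind of Monte Carlo experiment. Everything in it is a hypothetical assumption rather than the authors' design or data: the synthetic population, the 85%-accurate web-scraped flag, the sample size, and the particular blending rule (BD predictions for enterprises with a website, survey data for those without). Larger firms are made more likely both to have a website and to have `y = 1`, so the web-covered list is selective. The "predictive" estimator extrapolates web-based predictions to all enterprises, and the simulation shows how this produces bias.

```python
import math
import random
import statistics
from collections import Counter


def make_population(n_units=5000, seed=1):
    """Hypothetical synthetic population: each unit is (has_web, flag, y)."""
    rng = random.Random(seed)
    units = []
    for _ in range(n_units):
        size = rng.random()
        # Larger firms: more likely to have a website and to have y = 1.
        has_web = rng.random() < 0.3 + 0.6 * size
        y = int(rng.random() < 0.1 + 0.4 * size + (0.2 if has_web else 0.0))
        # Noisy web-scraped proxy of y, observable only where a website exists.
        flag = (y if rng.random() < 0.85 else 1 - y) if has_web else None
        units.append((has_web, flag, y))
    return units


def estimate_totals(pop, web_counts, sample):
    """Return (survey, predictive, blended) estimates of the total of y."""
    n_pop = len(pop)
    n_web = sum(web_counts.values())
    ys = [pop[i][2] for i in sample]
    survey = n_pop * statistics.fmean(ys)  # SRSWOR expansion estimator

    # Fit P(y = 1 | flag) on sampled units that have a website.
    web_s = [pop[i] for i in sample if pop[i][0]]
    web_mean = statistics.fmean([u[2] for u in web_s]) if web_s else statistics.fmean(ys)
    p_hat = {}
    for f in (0, 1):
        obs = [u[2] for u in web_s if u[1] == f]
        p_hat[f] = statistics.fmean(obs) if obs else web_mean

    # BD part: predicted total over every unit in the web-scraped list.
    pred_web_total = sum(web_counts[f] * p_hat[f] for f in (0, 1))

    # Predictive: assumes units without a website resemble those with one.
    predictive = pred_web_total * n_pop / n_web

    # Blended: BD predictions for web units, survey mean for the others.
    nonweb = [pop[i][2] for i in sample if not pop[i][0]]
    nonweb_mean = statistics.fmean(nonweb) if nonweb else statistics.fmean(ys)
    blended = pred_web_total + (n_pop - n_web) * nonweb_mean
    return survey, predictive, blended


def monte_carlo(n_sample=300, reps=300, seed=2):
    """Relative bias and relative RMSE of each estimator over repeated samples."""
    pop = make_population()
    true_total = sum(u[2] for u in pop)
    web_counts = Counter(u[1] for u in pop if u[0])  # known list sizes by flag
    rng = random.Random(seed)
    names = ("survey", "predictive", "blended")
    results = {name: [] for name in names}
    for _ in range(reps):
        sample = rng.sample(range(len(pop)), n_sample)
        for name, est in zip(names, estimate_totals(pop, web_counts, sample)):
            results[name].append(est)
    summary = {}
    for name, ests in results.items():
        rel_bias = (statistics.fmean(ests) - true_total) / true_total
        rmse = math.sqrt(statistics.fmean([(e - true_total) ** 2 for e in ests]))
        summary[name] = (rel_bias, rmse / true_total)
    return summary


if __name__ == "__main__":
    for name, (rb, rrmse) in monte_carlo().items():
        print(f"{name:>10}: relative bias {rb:+.3f}, relative RMSE {rrmse:.3f}")
```

Under these assumptions the predictive estimator inherits a large positive bias from selectivity, while the survey and blended estimators stay close to unbiased. In this simplified form the blended rule is equivalent to post-stratifying the sample by web-scraped class (web with flag 1, web with flag 0, no website) using known list counts. That equivalence helps explain why it can also reduce variance relative to the plain survey estimator.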
Year: 2017
Conference: Statistics and Data Science: new challenges, new generations
Venue: Florence (Italy), 28-30 June 2017
Published in: Proceedings of the Conference of the Italian Statistical Society
Publisher: Università degli Studi di Firenze
Pages: 847–854
ISBN: 9788864535210
URL: https://www.fupress.com/archivio/pdf/3407_11724.pdf
Keywords: Big Data, sampling estimation, selectivity, Big Data quality framework
Authors: Paolo Righi, Giulio Barcaroli, Natalia Golini
Files in this record: SIS 2017 Righi et al..pdf (main article, publisher's PDF, Adobe PDF, 361.48 kB, open access)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1740038