Social Media Big Data Integration: A New Approach Based on Calibration

Dalla Valle L; Kenett R
2018-01-01

Abstract

In recent years, the availability of information has grown to an unprecedented scale, with data generated in every sector at high speed and in a wide variety of forms and formats. Harnessing big data offers the opportunity to obtain more accurate analyses and to improve decision-making in industry, government and many other organizations. However, handling big data can be challenging, and proper data integration is a key dimension in achieving high information quality. In this paper, we propose a novel approach to data integration that calibrates online-generated big data with interview-based customer survey data. A common issue with customer surveys is that responses are often overly positive, making it difficult to identify areas of weakness in organizations. Online reviews, on the other hand, are often overly negative, hampering an accurate evaluation of areas of excellence. The proposed methodology calibrates the levels of unbalanced responses in different data sources via resampling and performs data integration using Bayesian Networks to propagate the re-balanced information. Using a case study, we show how this data integration approach allows businesses and organizations to obtain a bias-corrected appraisal of their customers' level of satisfaction. The application integrates online data from review blogs with customer satisfaction surveys from the San Francisco airport. We illustrate how this integration enhances the information quality of the data analytic work in four InfoQ dimensions, namely Data Structure, Data Integration, Temporal Relevance, and Chronology of Data and Goal.
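The abstract does not spell out the resampling scheme used for calibration. The following is a minimal sketch of one possible scheme, not the authors' method. It rests on three assumptions: ratings are on a 1-5 scale, the pooled distribution of both sources is the calibration target, and each source is rebalanced by importance resampling. Under that scheme, each response at level r is drawn with probability proportional to target[r] / observed[r]. The helper names `rating_distribution` and `calibrate_by_resampling` are hypothetical, and the toy ratings are invented for illustration only.

```python
import random
from collections import Counter


def rating_distribution(ratings, levels):
    """Return the empirical proportion of each rating level (0.0 if absent)."""
    counts = Counter(ratings)
    n = len(ratings)
    return {lvl: counts.get(lvl, 0) / n for lvl in levels}


def calibrate_by_resampling(ratings, target, size, seed=0):
    """Resample `ratings` with replacement so level frequencies follow `target`.

    Each response with level r gets weight target[r] / observed[r], so the
    total weight of level r is proportional to target[r] (importance
    resampling). Every observed level must appear as a key of `target`;
    target levels that never occur in `ratings` cannot be reproduced.
    """
    levels = sorted(target)
    observed = rating_distribution(ratings, levels)
    weights = [target[r] / observed[r] for r in ratings]
    rng = random.Random(seed)
    return rng.choices(ratings, weights=weights, k=size)


# Toy 1-5 ratings, purely illustrative (not data from the case study).
levels = [1, 2, 3, 4, 5]
survey = [5] * 50 + [4] * 35 + [3] * 10 + [2] * 3 + [1] * 2      # skewed positive
reviews = [1] * 30 + [2] * 25 + [3] * 20 + [4] * 15 + [5] * 10   # skewed negative

# Assumed calibration target: the pooled distribution of both sources.
target = rating_distribution(survey + reviews, levels)

survey_cal = calibrate_by_resampling(survey, target, size=10_000, seed=1)
reviews_cal = calibrate_by_resampling(reviews, target, size=10_000, seed=2)

print("target:         ", target)
print("survey, calib.: ", rating_distribution(survey_cal, levels))
print("reviews, calib.:", rating_distribution(reviews_cal, levels))
```

Matching the pooled distribution is only one possible design choice. A target taken from an external benchmark, or one that adjusts only the tails of the scale, would go into the same `target` argument. In the approach described above, the rebalanced data then feed a Bayesian Network, a step this sketch does not cover.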
Year: 2018
Volume: 111
Pages: 76-90
URL: https://www.sciencedirect.com/science/article/pii/S0957417417308667
Keywords: Bayesian networks; Data integration; Information quality (InfoQ)
Authors: Dalla Valle L; Kenett R
Files in this item: final_published.pdf (Adobe PDF, 3.3 MB, restricted access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2025106