Data quality assessment and subsampling strategies to correct distributional bias in prevalence studies

D'Ambrosio A. (first author); Garlasco J.; Quattrocolo F.; Vicentini C.; Zotti C. M. (last author)
2021-01-01

Abstract

Background: Healthcare-associated infections (HAIs) represent a major public health issue. Hospital-based prevalence studies are a common tool for HAI surveillance, but data quality problems and non-representativeness can undermine their reliability.

Methods: This study proposes three algorithms that, given a convenience sample and variables relevant to the study outcome, select a subsample with specific distributional characteristics, improving either representativeness (Probability and Distance procedures) or the balance of risk factors (Uniformity procedure). A “Quality Score” (QS) was also developed to grade sampled units according to data completeness and reliability. The methodologies were evaluated through bootstrapping on a convenience sample of 135 hospitals collected during the 2016 Italian Point Prevalence Survey (PPS) on HAIs.

Results: The QS highlighted wide variations in data quality among hospitals (median QS 52.9 points, range 7.98–628, lower meaning better quality), with most problems ascribable to the reporting of ward- and hospital-related data. Both the Distance and Probability procedures produced subsamples with lower distributional bias (Log-likelihood score increased from 7.3 to 29 points). The Uniformity procedure increased the homogeneity of the sample characteristics (e.g., −58.4% in geographical variability). The procedures selected hospitals with higher data quality, especially the Probability procedure (lower QS in 100% of bootstrap simulations). The Distance procedure produced lower HAI prevalence estimates (6.98% compared to 7.44% in the convenience sample), more in line with the European median.

Conclusions: The QS and the subsampling procedures proposed in this study could be effective tools for improving the quality of prevalence studies by reducing the biases that can arise from non-probabilistic sample collection.
Year: 2021
Volume: 21
Issue: 1
First page: 90
Last page: 104
Keywords: Bias correction; Data quality; Healthcare associated infections; Methodology; Prevalence studies; Sampling
Files in this record:
File: s12874-021-01277-y.pdf
Access: open access
File type: publisher's PDF
Size: 977.19 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1788967