One important confounder in genome-wide association studies (GWASs) is population genetic structure, which may generate spurious associations if not properly accounted for. This may ultimately result in a biased polygenic risk score (PRS) prediction, especially when applied to another population. To explore this matter, we focused on principal component analysis (PCA) and asked whether a population genetics informed strategy focused on PCs derived from an external reference population helps in mitigating this PRS transferability issue. Throughout the study, we used two complex model traits, height and body mass index, and samples from UK and Estonian Biobanks. We aimed to investigate 1) whether using a reference population (1000G) for computation of the PCs adjusted for in the discovery cohort improves the resulting PRS performance in a target set from another population and 2) whether adjusting the validation model for PCs is required at all. Our results showed that any other set of PCs performed worse than the one computed on samples from the same population as the discovery dataset. Furthermore, we show that PC correction in GWAS cannot prevent residual population structure information in the PRS, also for non-structured traits. Therefore, we confirm the utility of PC correction in the validation model when the investigated trait shows an actual correlation with population genetic structure, to account for the residual confounding effect when evaluating the predictive value of PRS.

A Principal Component Informed Approach to Address Polygenic Risk Score Transferability Across European Cohorts

Marnetto, Davide;
2022-01-01

Abstract

One important confounder in genome-wide association studies (GWASs) is population genetic structure, which may generate spurious associations if not properly accounted for. This may ultimately result in a biased polygenic risk score (PRS) prediction, especially when applied to another population. To explore this matter, we focused on principal component analysis (PCA) and asked whether a population genetics informed strategy focused on PCs derived from an external reference population helps in mitigating this PRS transferability issue. Throughout the study, we used two complex model traits, height and body mass index, and samples from UK and Estonian Biobanks. We aimed to investigate 1) whether using a reference population (1000G) for computation of the PCs adjusted for in the discovery cohort improves the resulting PRS performance in a target set from another population and 2) whether adjusting the validation model for PCs is required at all. Our results showed that any other set of PCs performed worse than the one computed on samples from the same population as the discovery dataset. Furthermore, we show that PC correction in GWAS cannot prevent residual population structure information in the PRS, also for non-structured traits. Therefore, we confirm the utility of PC correction in the validation model when the investigated trait shows an actual correlation with population genetic structure, to account for the residual confounding effect when evaluating the predictive value of PRS.
2022
13
1
10
genome-wide association study; polygenic risk score; population structure; principal component analysis; transferability
Pärna, Katri; Nolte, Ilja M; Snieder, Harold; Fischer, Krista; Estonian Biobank Research Team; Marnetto, Davide; Pagani, Luca
File in questo prodotto:
File Dimensione Formato  
fgene-13-899523.pdf

Accesso aperto

Dimensione 1.39 MB
Formato Adobe PDF
1.39 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/1880080
Citazioni
  • ???jsp.display-item.citation.pmc??? 0
  • Scopus 1
  • ???jsp.display-item.citation.isi??? 1
social impact