Minimal Evidence of Heterogeneity in Effect Estimates: GWAS from the Million Veteran Program
Some interesting cross-ancestry comparisons in one of the most diverse genetic association studies to date.
The Million Veteran Program (MVP) published their flagship Genome-Wide Association Study (GWAS) [Verma et al. 2024 Science], using genotype data from ~600k participants across ~2000 traits. About a third of the cohort were genetically similar to non-European reference populations¹, which enabled some novel cross-population comparisons of disease architecture. A few interesting results:
Average SNP-heritability was significantly higher in the AFR group (mean 0.12) than in the EUR group (mean 0.08). This is consistent with more genetic diversity in African populations leading to (a bit) more genetic variance. The Popcorn method [Brown et al. 2016] was applied to estimate genetic correlation across ancestry groups, but this approach cannot properly model admixture² and, well, all of these populations are admixed. We are still in need of methods like those employed in [Hou et al. 2023] that can be applied to summary-level GWAS data.
In total, 26,049 locus-trait associations were identified, of which 3,477 would not have been genome-wide significant without the inclusion of non-EUR individuals, and 834 were primarily driven by non-EUR individuals (i.e. had p>0.05 in the EUR population). Thus ~3% of the associations (834/26,049) had evidence of substantial population-specificity. As the authors noted: “The vast majority of differences observed were largely attributable to variations in allele frequency or the presence of genetic variants in one group that were not detectable in other groups”.
Focusing on loci that could be fine-mapped in AFR and EUR with high confidence (that is, for which it was possible to statistically prioritize the likely causal variant), only 16/1888 (<1%) exhibited heterogeneous associations. Again the conclusion was: “minimal evidence of heterogeneity in effect estimates”. However, fine-mapping was more efficient per sample in the AFR group, likely due to shorter haplotype blocks / lower variant correlation in African populations (though this is somewhat complicated by admixture).
An example of an association that did exhibit heterogeneity was a variant near APOE, a known risk gene for Alzheimer’s/dementia, which had an odds ratio of 1.25 for dementia in the AFR group compared to a moderately (but significantly) higher odds ratio of 1.51 in the EUR group.
A handful of genes exhibited significant differences in pleiotropy between the EUR and AFR populations, meaning they were associated with many more traits in one population than the other. These were led by the genes APOL1, HBB, and CD36, which have well-established associations with malaria resistance and likely conferred a survival advantage.
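To make the APOE comparison above concrete, here is a minimal sketch of a two-group heterogeneity test on log odds ratios. The odds ratios are the ones quoted above, but the standard errors are hypothetical placeholders (they are not reported in this post), and this generic z-test is not necessarily the test used by Verma et al.

```python
import math

def or_heterogeneity_z(or_a, se_a, or_b, se_b):
    """z-test for a difference between two independent log odds ratios.

    se_a and se_b are standard errors on the log-odds scale.
    Returns (z, two-sided p-value).
    """
    diff = math.log(or_a) - math.log(or_b)
    z = diff / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided normal tail probability via the complementary error function
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Odds ratios from the post (EUR 1.51, AFR 1.25).
# The standard errors below are HYPOTHETICAL, chosen only for illustration.
z, p = or_heterogeneity_z(1.51, 0.02, 1.25, 0.03)
print(f"z = {z:.2f}, p = {p:.2e}")
```

With real summary statistics, the same comparison would use the per-group standard errors of the log odds ratio in place of the placeholders. The test treats the two groups as independent samples.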
In short, while population diversity increased the ability to identify and refine genetic associations, there were “overwhelmingly more similarities than differences in the genetic associations between groups”. This study contributes to the growing evidence that apparent cross-population differences in genetic effects may be largely explained by differences in allele frequencies or linkage disequilibrium with “tagging” variants rather than differences in the underlying causal effect sizes³.
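A small sketch of why allele frequency alone can make an association look population-specific: under a standard additive model with a standardized phenotype, a variant with frequency p and per-allele effect beta explains roughly 2p(1-p)beta² of the trait variance, and the expected association chi-square grows with sample size times that quantity. All numbers below (effect size, frequencies, sample size) are hypothetical and not taken from the MVP study.

```python
def variance_explained(freq, beta):
    """Fraction of trait variance explained by one additive variant
    (standardized phenotype, Hardy-Weinberg genotype frequencies)."""
    return 2.0 * freq * (1.0 - freq) * beta ** 2

def expected_chisq(n, freq, beta):
    """Approximate expected 1-df association chi-square: 1 + noncentrality."""
    q2 = variance_explained(freq, beta)
    return 1.0 + n * q2 / (1.0 - q2)

# HYPOTHETICAL scenario: identical causal effect and sample size in both
# groups, but the variant is common in one group and rare in the other.
beta = 0.05
n = 100_000
common = expected_chisq(n, 0.30, beta)
rare = expected_chisq(n, 0.01, beta)
print(f"expected chi-square, frequency 0.30: {common:.1f}")
print(f"expected chi-square, frequency 0.01: {rare:.1f}")
```

Genome-wide significance (p < 5e-8) corresponds to a 1-df chi-square of about 30, so in this toy setting the same causal effect would be expected to clear the threshold where the variant is common and miss it where the variant is rare.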
¹ Reference populations from the 1000 Genomes Project were used to cluster individuals into ~120k “African (AFR)”, ~60k “Admixed American (AMR)”, ~7k “East Asian (EAS)”, and ~450k “European (EUR)”. In this and other data from the United States, “African” typically consists of admixed African Americans and “Admixed American” typically consists of admixed Hispanics, both of which have recent mixtures from multiple populations including a substantial European contribution. These groups are then sliced into African/European or American/European based on ad hoc ancestry cutoffs, so the above numbers are somewhat ill-defined. It is probably impossible to satisfy everyone with these group representations, but I would like to see future work and better methods for summarizing continuous ancestry relationships.
² “admixed populations induce very long-range LD that is not accounted for in our approach, and we are therefore limited to unadmixed populations” ~ Brown et al.
³ See also: Wang et al. “Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations”, Nat Comms 2020; Hou et al. “Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals”, Nat Genet 2023; Taylor et al. “Sources of gene expression variation in a globally diverse human cohort”, bioRxiv 2023.