No, heritability will not tell you anything about education policy
Or: Variance components are not a substitute for mechanisms
When someone brings up the findings of genetic studies on educational attainment, they typically have one of two goals:
1. They are interested in the mechanism by which genes modulate cognitive traits, either individually or in aggregate.
2. They want to vaguely talk about “genetic endowments” or “innate inequalities” to make a broader point about topics like policy, meritocracy, or fairness.
Unfortunately, while the second goal actually has nothing substantive to do with genetics, it has somehow become a dominant stream in the discourse. A recent example is an article by Freddie deBoer on “Education and Genes”. I’ve been a long-time reader of deBoer’s going back to his L’Hote/LoOG days, and I often enjoy his writing and characteristic bluntness. On this topic, however, I think he wrote himself into a corner a few years ago in The Cult of Smart, where he leaned heavily (albeit briefly) on “innate inequalities” to make a broader case against the educational system. This article continues that trend by using a new genetic analysis as a springboard for a broader argument about education which inevitably veers into genetic essentialism. I’m picking on it because I think it’s important to be critical of real examples, but the piece also exhibits themes that are common to the genre.
The genetic mechanisms of education are weak and only getting weaker
A series of unfortunate events happened over the past decade. Some early genetic studies were published, showing that education/IQ was moderately heritable just like any other biological trait, and arguing that the heritability was only going to increase. A number of writers took these findings as settled science and wrote books espousing them: that modern DNA data has confirmed that genetics is a major direct contributor to inequalities in the classroom. Then bigger and more carefully designed studies were conducted, showing that those earlier findings were largely confounded and misinterpreted. But no one wrote any books about it. deBoer was one of those early writers and continues to go back to the well, using genetics to make policy arguments. He starts his post with some supposedly new research:
A reader points me in the direction of new research (study, writeup) that shows the largest associations yet between polygenic scores and educational outcomes … But certainly you can look at an effort like this latest study and see what many population geneticists have long predicted - increasingly strong statistical associations between educational metrics and known variants as techniques grow more sophisticated.
This is a common argument, that genetics is playing an increasingly strong role in educational outcomes as the data and methods improve. While it is true that larger studies will tend to find more associations, what we’ve actually learned over the past few years is that most of these associations are not causal “genetic endowments” (a term deBoer repeats in the piece) but are heavily confounded by assortative mating, dynastic/familial environments, and population stratification. This confounding is not some minor detail either: in the most recent GWAS of educational attainment, confounding accounted for two-thirds of the predictive power of the resulting polygenic score. In fact, the researchers who actually work in this area have repeatedly advocated against use of the term “genetic endowments” for this exact reason:
Our finding implies that a substantial part of the predictive power of the polygenic index is due to some mix of assortative mating and gene-environment correlation. For this and other reasons, we believe it is misleading to use phrases such as “innate ability” or “genetic endowments” to describe what is measured by polygenic indexes based on our GWAS estimates. These phrases incorrectly imply that the polygenic index is entirely capturing direct effects, and they further ignore the potentially important role that environmental factors play in mediating direct effects
~ Okbay et al. 2022 FAQ, and several other sources.
So what does this specific study (Wilding et al. (2024)) actually show? In fact, they did not construct a new polygenic score, uncover new variants, or use any new sophisticated techniques. They simply conducted a meta-analysis of applications of a polygenic score constructed from the Lee et al. 2018 GWAS of educational attainment, now six years old and one GWAS cycle out of date. In the meta-analysis, they find a correlation of 0.27 between the score and educational attainment, and 0.24 with educational achievement (i.e. grades). This translates to 6-7% of the variance in these educational outcomes explained by a genetic predictor (including potential confounding effects from the environment). The figure below reproduces the meta-analysis and also provides a scatterplot visualization of what an r=0.26 looks like (or you can just close your eyes and imagine a random blob):
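The step from a correlation to "variance explained" is just squaring it. Here is a minimal sketch of that arithmetic, plus a simulated (not real) sample at r = 0.26 to give a feel for how loose such an association is. The function names, sample size, and seed are my own illustrative choices and do not come from Wilding et al.

```python
import math
import random

def variance_explained(r):
    """Share of outcome variance captured by a linear predictor with correlation r."""
    return r ** 2

def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_pairs(r, n, seed=0):
    """Draw standardized (score, outcome) pairs whose expected correlation is r."""
    rng = random.Random(seed)
    scores, outcomes = [], []
    for _ in range(n):
        score = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        scores.append(score)
        # Mixing the score with independent noise keeps the outcome variance at 1.
        outcomes.append(r * score + math.sqrt(1 - r ** 2) * noise)
    return scores, outcomes

# Correlations reported in the meta-analysis discussed above.
for label, r in [("attainment", 0.27), ("achievement", 0.24)]:
    # 0.27**2 = 0.0729 and 0.24**2 = 0.0576, i.e. roughly 7% and 6%.
    print(f"{label}: r = {r:.2f}, variance explained = {variance_explained(r):.1%}")

# Hypothetical sample: 20,000 simulated individuals, not real data.
scores, outcomes = simulate_pairs(0.26, 20000)
print(f"simulated r = {pearson_r(scores, outcomes):.2f}")
```

Plotting `scores` against `outcomes` with any plotting library should give the kind of near-shapeless cloud described above.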
These numbers are very much in line with what has been reported for these traits for many years (which is entirely expected, since, as I mentioned, the study did not actually do any new discovery). To the authors’ credit (though it could have come a bit sooner) the final paragraph in the paper describes the issues with environmental confounding that otherwise get ignored:
Our meta-analysis could not model the interplay between genes and the environment, including differentiating polygenic score variance due to gene-environment correlations (rGE), assortative mating, and population stratification (Nivard et al., 2024; Plomin & von Stumm, 2022; Selzam et al., 2019; Wertz et al., 2019). Comparisons of within- and between-family polygenic score predictions have shown genetic effects on educational attainment that are environmentally mediated (i.e., passive rGE; Selzam et al., 2019; Wang et al., 2021). The most recent GWAS of years spent in education found that only ~ 30% of the prediction of educational attainment was due to direct genetic effects, with environmental confounding playing a major role (Okbay et al., 2022). While no studies to date reported comparable findings for educational achievement, polygenic scores should not be interpreted as reflecting direct genetic effects but as predictors that capture genetic and environmental effects (Plomin & von Stumm, 2022).
As a practical point, these findings should be concerning to anyone who is seriously interested in leveraging genetic scores in the classroom. First of all, the association is very weak and is unlikely to get much stronger with larger GWAS (a point I will continue to re-iterate: we already know how good these scores can get by estimating molecular heritability). Second, the substantial environmental confounding means these scores will be largely subsumed by other, more readily available environmental measurements. Indeed, a recent study [Morris et al. (2020)] showed that early grades completely dominate over genetic scores in predicting future academic achievement. Third, the fact that there was significant heterogeneity even within cohorts of European ancestry presents an under-appreciated problem. Parents will probably have some concerns if their child is de-prioritized from advanced classes because of a genetic score that is 4x more accurate for UK children than for US children, doesn’t work at all for Polish children, and hasn’t even been built for individuals with non-European ancestry.
Heritability is not policy
So the “largest associations yet” are a dud, but deBoer goes on to explain the actual reason for bringing up this study:
Personally, I think that this conversation tends to be too fixated on the usefulness of genetic testing of individuals and not sufficiently focused on the big picture - the fact that, if every student does not actually have equal potential, the entire foundation of modern educational philosophy has been utterly destabilized.
In other words, genes are a useful illustrative tool for the broader goal of destabilizing “the entire foundation of modern educational philosophy”. This is a mistake: correlation with genetics tells us exactly nothing about modern educational philosophy. Moreover, this is not even a new mistake. In the late 1970s, when systematic twin studies started producing estimates of high heritability and low shared environment for economic measurements like earnings, psychologist Hans Eysenck reacted that the finding “really tells the [Royal] Commission [on the Distribution of Income and Wealth] that they might as well pack up”. Eysenck and deBoer differ in their goals (the former wanted to eliminate redistribution and the latter wants to put it on steroids), but their incorrect intuition regarding variance components is the same. Responding to such views, Arthur Goldberger wrote a searing critique of the operationalization of heritability, including a sarcastic response to Eysenck:
A powerful intellect was at work. In the same vein, if it were shown that a large proportion of the variance in eyesight were due to genetic causes, then the Royal Commission on the Distribution of Eyeglasses might as well pack up. And if it were shown that most of the variation in rainfall is due to natural causes, then the Royal Commission on the Distribution of Umbrellas could pack up too.
Goldberger makes two important points here: the first, that heritability does not tell you what the influence of genes will be under a change in the environment (glasses); the second, that even if you cannot change the cause (rainfall), heritability still does not tell you whether and how you should take preventative measures (umbrellas). Unfortunately, these points are routinely forgotten, and deBoer extends this reasoning even further into outright genetic determinism with an analogy to height (another common point of comparison which is not at all analogous to education):
Height is highly polygenic, it’s heavily influenced by environment, there are gene by environment interactions, all true. However, none of this means that height is not significantly heritable, and crucially if your genes don’t want you to be 7 feet tall, you’re not going to be 7 feet tall.
Here again heritability and malleability are conflated even though the two quantities have no relationship. On the one end, the number of fingers you have is not significantly heritable, since any loss of fingers is typically an environmental accident, but that does not mean you can just grow another finger! On the other end, eyesight and obesity are highly heritable (according to some twin studies, BMI is more heritable than height) and yet both are obviously modifiable, the latter rapidly so with recent pharmaceutical advances.
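To make the gap between heritability and malleability concrete, here is a toy simulation. It is a sketch under loud assumptions: the trait is a simple sum of independent genetic and environmental parts, the 80% heritability is a made-up number, and the "intervention" is a uniform environmental boost of one standard deviation. All of the names are hypothetical.

```python
import random
import statistics

def simulate_trait(n, h2, env_shift=0.0, seed=0):
    """Trait = genetic part + environmental part, with total variance near 1.

    h2 is the assumed share of variance that is genetic; env_shift moves
    everyone's environment by the same amount (the hypothetical intervention).
    """
    rng = random.Random(seed)
    genetic = [rng.gauss(0, h2 ** 0.5) for _ in range(n)]
    environment = [rng.gauss(env_shift, (1 - h2) ** 0.5) for _ in range(n)]
    trait = [g + e for g, e in zip(genetic, environment)]
    return genetic, trait

def heritability(genetic, trait):
    """Genetic variance over total variance (knowable here only because we simulated it)."""
    return statistics.pvariance(genetic) / statistics.pvariance(trait)

# Same simulated people (same seed), before and after the intervention.
before_g, before = simulate_trait(20000, h2=0.8)
after_g, after = simulate_trait(20000, h2=0.8, env_shift=1.0)

h2_before = heritability(before_g, before)
h2_after = heritability(after_g, after)
mean_gain = statistics.fmean(after) - statistics.fmean(before)

print(f"heritability before: {h2_before:.2f}, after: {h2_after:.2f}")
print(f"average gain from the intervention: {mean_gain:.2f} SD")
```

In this setup the heritability is essentially identical before and after, yet everyone has moved by a full standard deviation. The variance share simply carries no information about what an intervention can do.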
The other problem with this analogy is that it is a comparison between a physical characteristic (height) and an ability (educational attainment). Physical characteristics are the consequence of biological processes that can become fixed in development, like height (heritable) or fingers (not heritable). Abilities are not. An analogous ability to height would be something like reaching things off the high shelf. But the statement “if your genes don’t want you to reach things off the high shelf, you’re not going to reach things off the high shelf” is obviously absurd. In some cases, physical characteristics are so well understood and closely linked to abilities as to be effectively equivalent: being the World’s Tallest Man is an ability that is indistinguishable from a characteristic. But intelligence is not well understood. In fact, we have essentially no understanding of the causes of variability in intelligence: we cannot even say whether it has a single common cause, samples from thousands of causes, or emerges from dynamic interactions. We do know quite well that performance on IQ tests can be increased by education, so it is clearly a malleable trait.
This is not to say that genetics can never tell us something about policy. If we identify a genetic variant that directly influences pigment, and we see that carriers of the variant do better in school districts that have bias training, that might give us an insight into policy (and maybe into society). The same goes if we identify a genetic variant that directly reduces eyesight, and we see that carriers of that variant do better in school districts that have mandatory vision tests. Such analyses are not a replacement for a truly randomized trial, but they can tell us how to effectively design the randomized trial. The key difference here is mechanism: the way we conclude whether a trait is malleable or not is by actually understanding and testing the mechanisms, not by partitioning its variance.
Environmentality is also not policy
When people eventually do concede that heritability is not a sufficient statistic for determining the effectiveness or fairness of a policy, they often switch to making similar arguments about the environmental variance components. A recent trend is to re-estimate twin-based heritability with extended family models, which typically shrink the “shared environment” component for methodological reasons, and then draw sweeping conclusions about the structure of society based on how much the component has shrunk. A particularly egregious example of this is the recent study of Wolfram et al. (an analysis I have discussed in the past), which closes with the following discussion (emphasis mine):
The results presented here suggest that shared environmental influence might account for even less of the variation in educational attainment than conventional twin studies have indicated and that environmental opportunities might therefore be more equal than these studies have implied. Moreover, a large fraction of the remaining shared environmental variation for EA appears to consist of twin-specific shared environments that capture within-family differences in opportunity that carry a different moral and political connotation to between-family differences (even if they remain potential targets for political intervention).
This interpretation is incorrect for essentially the mirror image of the reasons that Eysenck and deBoer are wrong: the proportion of variance explained by the different environmental components does not tell you anything about the mechanisms behind that variance, nor about what would happen to the trait if you changed the environment (or the distribution of environments). To illustrate the point, let’s take the core variance components that come out of a twin model - genetics (A), shared environment (C), and non-shared/idiosyncratic environment (E) - and look at how they could map to different policy/equity interpretations.
Genetics:
A genetic variant increases your working memory capacity. Carriers of the variant tend to remember and recall facts faster and do better on tests.
A genetic variant reddens your hair. In this society, red haired children are discriminated against in school and receive worse grades for the same quality of work.
Shared environment:
Neglectful parents encourage their kids to skip homework for TV, leading to poor test performance.
Neglectful parents ignore their children, leading to trauma in both of their offspring, who then go on to have behavior issues in school and low grades.
Some neighborhoods have poorly performing schools with sub-par teachers, so siblings tend to learn less and do worse on tests.
Non-Shared environment:
Neglectful parents ignore their children, leading to conflict between the offspring over who is the “favorite”; one does well while the other spirals into a depression, skips class, and does poorly on tests.
One sibling does poorly in school due to random natural causes (a concussion from a bike accident).
One sibling does poorly in school due to a bike accident that could have been avoided with better city planning.
Gene-Environment correlation / interaction:
Students who have a genetic predisposition for better working memory tend to seek out and impress better/more demanding teachers, which puts them into better learning environments and further increases their test taking ability, increasing initially small genetic differences.
A good school identifies students with mild vision problems and subsidizes their glasses so they do not fall behind. A low quality school ignores their vision problems and allows them to fall behind and do poorly on tests, exacerbating initially small genetic differences.
In this society, students with red hair are required to attend low quality schools where they learn less and do poorly on tests. Genetic variants influencing hair color become correlated with environmental factors influencing test taking ability.
You get the idea and I am sure readers could come up with even better examples. The point is that we can imagine very many universes, ranging from the highly meritocratic to the completely dystopian, that can produce exactly the same variance components. Even if the numbers being estimated for the A, C, and E variance components are completely free of bias, they still tell us absolutely nothing about the equality of environmental opportunities or the validity of the educational system.
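The two "Genetics" scenarios above can be simulated to show that a twin design cannot tell them apart. This is a sketch under stated assumptions, not anyone's published model: the variance shares (0.4, 0.2, 0.4) are invented, the function names are hypothetical, and the estimator is the textbook Falconer shortcut (A = 2(rMZ - rDZ), C = 2rDZ - rMZ, E = 1 - rMZ) standing in for a full ACE model fit.

```python
import math
import random

# Hypothetical variance shares, chosen only for illustration.
A_VAR, C_VAR, E_VAR = 0.4, 0.2, 0.4

def memory_universe(gene, family, luck):
    # The gene improves working memory, which directly raises test scores.
    working_memory = math.sqrt(A_VAR) * gene
    return working_memory + math.sqrt(C_VAR) * family + math.sqrt(E_VAR) * luck

def red_hair_universe(gene, family, luck):
    # The gene only changes hair color (coded so lower values mean redder hair).
    # Work quality has no genetic input, but teachers mark red-haired
    # children down for the same work.
    work_quality = math.sqrt(C_VAR) * family + math.sqrt(E_VAR) * luck
    grading_bias = math.sqrt(A_VAR) * gene
    return work_quality + grading_bias

def twin_pairs(n, identical, seed):
    """Return n pairs of (gene, family, luck) inputs, one tuple per twin."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        family = rng.gauss(0, 1)  # shared environment, same for both twins
        if identical:
            g1 = g2 = rng.gauss(0, 1)  # MZ twins share the genetic value
        else:
            shared = rng.gauss(0, 1)   # DZ twins: genetic correlation of 0.5
            g1 = math.sqrt(0.5) * (shared + rng.gauss(0, 1))
            g2 = math.sqrt(0.5) * (shared + rng.gauss(0, 1))
        pairs.append(((g1, family, rng.gauss(0, 1)),
                      (g2, family, rng.gauss(0, 1))))
    return pairs

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def falconer(score_fn, n=20000, seed=0):
    """Estimate A, C, E from simulated MZ and DZ twin correlations."""
    r = {}
    for label, identical in (("mz", True), ("dz", False)):
        pairs = twin_pairs(n, identical, seed + (0 if identical else 1))
        first = [score_fn(*t1) for t1, _ in pairs]
        second = [score_fn(*t2) for _, t2 in pairs]
        r[label] = correlation(first, second)
    return {"A": 2 * (r["mz"] - r["dz"]),
            "C": 2 * r["dz"] - r["mz"],
            "E": 1 - r["mz"]}

estimates = {fn.__name__: falconer(fn) for fn in (memory_universe, red_hair_universe)}
for name, est in estimates.items():
    print(name, {k: round(v, 2) for k, v in est.items()})
```

Both universes return the same A, C, and E estimates (up to floating-point rounding), even though one describes a cognitive advantage and the other describes pure discrimination. The same exercise could be repeated for the shared and non-shared environment scenarios.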
Heritability is not a rhetorical device
One could, of course, make the case for innate inequalities without genetics coming into play at all: kids who are exposed to lead, or hit by a car, or experience severe psychological trauma may also not have equal potential. They certainly do not come in on equal footing, and they likely need different things from the educational system and society at large — needs that some societies accommodate and some do not. Moreover, the mechanisms by which car accidents and childhood trauma influence educational attainment are much better understood than the mechanisms of ~3,500 educational attainment GWAS SNPs. Why not reach for these examples? I suspect the reason is that genetics feels more immutable and so it is seen as carrying more rhetorical force. Bring up lead exposure or unsafe streets and the response you will get is: “okay, so let’s fix those things” (i.e. let’s help people reach things off the high shelf). But bring up “genes”, and you can argue with force that we should either “pack it up” (if you are Eysenck) or “destabilize the entire system” (if you are deBoer).
The irony is that the opposite is true: genetic scores have only revealed how malleable educational outcomes actually are even for the small component that is correlated with genes. In fact, one of the most recent studies of polygenic scores found that, after controlling for your genetic score, the genetic score from your uncle is just as good a predictor of your educational attainment as that of your dad. What was initially thought to be the immutable action of genetics in you, and then revised to be the slightly less immutable action of genetic variation in your parents, actually appears to be largely a consequence of some broader dynastic/familial environment (or just plain old stratification) [1]. Does this finding tell us that society is deeply meritocratic or deeply unequal? That depends on, you guessed it, the mechanism. But let’s stop raising the vague specter of “genes” as a rhetorical device (especially when we are really talking about “some muddle of genes, environment, and stratification”) and actually do the hard work of understanding mechanisms.
[1] “Our results are consistent with the interpretation of indirect genetic effects on academic achievement as in part or largely due to ‘dynastic effects’. Such effects could reflect subtle socioeconomic and genetic-ancestry stratification co-occurring within homogeneous populations. According to this interpretation, the extended-family-level PGI is correlated with a set of inherited social circumstances that affect children’s academic achievement. An alternative interpretation is that dynastic effects reflect extended-family-level behaviours and investments that contribute to children’s academic achievement. Our results are further consistent with a bias in the population GWAS and PGI estimates introduced by assortative mating. Our analysis cannot isolate the precise mechanisms of indirect genetic effects on EA. However, we can conclude that, for childhood academic achievement in the context of contemporary Norway, the mechanisms that give rise to indirect genetic effects, as indexed by current PGIs, operate mostly beyond the boundaries of nuclear families.” ~ Nivard et al. (2024)
Another great post. Thank you for writing about this.
It’s fascinating that Eysenck and deBoer interpreted the data so similarly but applied it so differently to support their favored positions. There’s a profound message in there about first principles and forking paths or something. And it’s also instructive that deBoer, I believe, is sincerely trying to promote good policy but may be doing so based on false premises. Thanks, Sasha, for making this all a bit more complicated and painful to digest!