What is The Infinitesimal?
Thinking about human genetics in a world where every variant is causal but only a tiny bit.
I am a statistical geneticist and I am writing here about what it means for human traits to be weakly heritable, highly polygenic, nearly neutral, and context specific. I’ve written an introduction to heritability, population genetics, and group differences and I intend to use this newsletter for shorter, more focused discussion of the same themes. My specific aims are:
Share my excitement about the latest research.
At its best, the field of human statistical genetics can provide us with causal instruments to understand complex human processes around us. I will talk about new research that I find exciting, with a focus on understanding common complex traits and their evolution.
Focus on well-defined and estimated parameters.
One of the challenges in a world where every phenotype is heritable is that claims like “X is heritable” are no longer informative. Likewise for the claim “X is under selection” when every single phenotype is under some amount of (possibly extremely weak) selection. The information we seek is a precise estimate of the parameter: how heritable? through which paths? how much selection? and when? That, of course, means having accurate definitions of parameters and reliable estimators.
Provide context on work that is easily misinterpreted.
The fact that it is not possible to make simple statements like “X is heritable, Y is not” also enables a lot of misinterpretation. Because humans tend to treat genetic variation with a special status, it also draws a lot of grifters and con artists hoping to use that special status to smuggle in their own moral beliefs and prejudices under the guise of empiricism.
Provide a little bit of commentary on broader academic issues.
I am an academic, and I hope to give a bit of my perspective on academia more generally and the role that academic institutions do/should play in our society.
Not wade too far outside my areas of experience.
But this is not meant to be a jack-of-all-trades culture blog. I am not an expert in most things and I will try hard to stay out of topics where I have nothing novel to add.
Two topics I focus on a lot — behavior and heritability — merit a bit more discussion.
Behavior
The genetics of human behavior is where the limits of our understanding become the most apparent.
The existing discussion is poorly framed.
The debates over the genetics of human behavior tend to fall into dichotomies of “nature vs. nurture”, “genes vs. environment”, “blank slate vs. hereditarian”, “neutral vs. adaptive”, etc. These dichotomies do not make sense when traits are actually driven by tens of thousands of genetic mechanisms that also interact with hundreds of environments. Most of the discussion thus ends up fixating on abstractions that do not accurately reflect the real world and do more to confuse than inform.
A lot of data and few well-defined parameters.
Behavioral phenotypes are often very easy to collect, and so, paradoxically, some of the largest genetic studies we have available are of behavioral phenotypes. The largest association study of educational attainment (something that’s collected on nearly every intake form) includes ~3 million participants, whereas the largest study of breast cancer, the most common cancer in the US, is about one tenth of that size. This has resulted in a data/understanding gap: it’s easy to identify impressively significant associations with education, but it’s hard to know what these associations actually mean.
Normal conversations are lacking.
When I write about the challenges in understanding regulatory variation or disease mechanisms, the response is a pleasant chatter of opinions, possible theories, new proposals, etc. Naturally, there are strongly held beliefs — people care about doing things the right way, which typically means their way — but the overall tone is relaxed. When I write about challenges in understanding IQ or twin heritability, the responses are tense, everyone seems very on edge, accusations of bad faith are regularly thrown around, and so on. Some of this is an inside-outside game, but I think a lot of the response is a reflection of fundamentally different cultures in these fields.
Behavioral genetics research demands to be taken seriously.
Given all of these complexities, one might be surprised to find that the behavioral genetics literature tends to slip into broad claims more often than other areas of human/clinical genetics. It is not uncommon to read a paper with an Introduction acknowledging that heritability is not a sufficient parameter for understanding policy and ending with a Discussion that contains many paragraphs of policy-related speculation. The subtext is that genetic analyses have serious implications for policy. But being taken seriously means close and critical reading of what is actually being inferred.
Heritability
The concept of heritability underpins much of what we can and, more importantly, cannot answer with genetic data. Many disputes over genetics come down to implicit differences over how heritability is defined (and there are many, many definitions), or differences over what it is actually measuring. In the past two decades, large-scale measurements of genetic variation have started providing the data needed to put meat on the bones of these disputes. This data has also revealed some fundamental contradictions.
The scientific implications of the missing heritability debate are profound.
The field has generally been interested in the gap between twin and molecular heritability estimates (the so-called “missing heritability”) but I suspect even most geneticists do not appreciate just how substantial this gulf is for behavioral phenotypes. Here are a few examples:
Aggressive behavior: 50% heritability from twins, 3% from GWAS
Picky eating: 70% dominance heritability in twins, ~0% from GWAS (for essentially all common traits)
Educational attainment: 40-50% from twins, 4% from GWAS
IQ: 50-80% in adult twins, ~15% from GWAS
These are not small differences: going from 4% to 40% is simply not compatible with any conventional model of genetic architecture. It also fundamentally alters the way we think about these phenotypes. If 50-80% heritability for IQ from twins was described as “highly heritable” then 85% environmentality for IQ from GWAS should certainly be considered “highly environmental”. And yet, the opening sentence of the most recent IQ GWAS is “Intelligence is highly heritable” [Savage et al. 2018 Nat Genet], with a reference to a twin study. But is it?
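As a rough illustration of the size of these gaps, the arithmetic can be sketched in a few lines of Python. The numbers are the ones quoted in the list above. Treating "~0%" as exactly zero, storing single values as a degenerate range, and the function name `heritability_gap` are assumptions made for this sketch. Note that the picky eating figure is a dominance heritability, so its comparison is looser than the others.

```python
# Twin and GWAS heritability estimates as quoted in the list above.
# Twin values are stored as (low, high); single values repeat the same number.
# Assumption: "~0%" is treated as exactly 0.0 for the arithmetic.
estimates = {
    "Aggressive behavior": ((0.50, 0.50), 0.03),
    "Picky eating": ((0.70, 0.70), 0.00),
    "Educational attainment": ((0.40, 0.50), 0.04),
    "IQ": ((0.50, 0.80), 0.15),
}


def heritability_gap(twin_range, gwas):
    """Return the (low, high) difference between twin and GWAS estimates."""
    low, high = twin_range
    return (low - gwas, high - gwas)


for trait, (twin_range, gwas) in estimates.items():
    gap_low, gap_high = heritability_gap(twin_range, gwas)
    # 1 - gwas is the share of variance NOT attributed to measured
    # genetic variants under the GWAS-based estimate.
    print(f"{trait}: gap of {gap_low:.0%} to {gap_high:.0%}; "
          f"GWAS-based 'environmentality' {1 - gwas:.0%}")
```

The last column is the quantity referred to above as "environmentality": one minus the GWAS-based heritability.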
While it is true that both types of estimates come from models and all models are wrong, the implications of which model is more wrong are profound. There are a few ways this could end:
The twins are right, and common traits are driven by a massive contribution from rare non-coding variants or variants that are not even measured on genotyping platforms. This would have massive implications for our understanding of genetic mechanisms as well as decades of evolutionary theory about frequency/effect distributions.
The twins are wrong because they’ve artificially constrained gene-environment interactions to zero (which then get mis-estimated as pure genetics). This would mean an important and complex role for the shared environment in common traits, whereby genetic variants are routinely exhibiting context specific effects in different families. This would also imply that genetic prediction could potentially be greatly improved by understanding these environmental interactions.
The twins are wrong because phenotypes actually exhibit extensive gene-gene interactions (e.g. the “limiting pathways” model). This would imply a radically different model of human phenotypes, where many different sub-phenotypes exist in the population and are then aggregated as phenotype clusters that run in families. It would also imply that genetic prediction could be greatly improved by learning these gene-gene interactions.
The twins are wrong because the equal environment assumption is routinely violated and MZ twins are fundamentally different from DZs. This is perhaps the least interesting outcome from the perspective of science, since it is simply a methodological flaw. But it is fascinating from the perspective of the history of science in that it would undermine a swathe of major findings in twin-based behavioral genetics spanning over a century, a reality the field will need to adapt to in real time.
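To make the last scenario concrete, here is a minimal sketch of how an equal environment violation could inflate a twin estimate. It uses Falconer's formula, h2 = 2 * (r_MZ - r_DZ), a classical twin estimator that this post does not name and that is assumed here only for illustration. Modern twin studies typically fit fuller variance-component models. The variance components, the `extra_mz_env` term, and all numbers are hypothetical.

```python
def twin_correlations(h2, c2, extra_mz_env=0.0):
    """Expected twin correlations under a simple additive model.

    h2: true additive heritability
    c2: shared-environment variance common to both twin types
    extra_mz_env: hypothetical extra similarity experienced only by MZ
                  twins (a violation of the equal environment assumption)
    """
    r_mz = h2 + c2 + extra_mz_env   # MZ twins share all additive effects
    r_dz = 0.5 * h2 + c2            # DZ twins share half, on average
    return r_mz, r_dz


def falconer_h2(r_mz, r_dz):
    """Falconer's estimator: twice the MZ-DZ correlation difference."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical scenario: the true heritability is 0.15 in both cases.
for extra in (0.0, 0.15):
    r_mz, r_dz = twin_correlations(h2=0.15, c2=0.20, extra_mz_env=extra)
    print(f"extra MZ similarity {extra:.2f}: "
          f"estimated h2 = {falconer_h2(r_mz, r_dz):.2f}")
```

Under these assumptions the estimator returns the true value when the extra MZ similarity is zero, and overstates it by twice that similarity otherwise. This is why a modest violation can translate into a large apparent heritability.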
The debate over how and how much genetic influence on traits is transmitted across generations goes back to the very start of modern quantitative genetics. Whatever the outcome, how could one not be fascinated watching it play out as genetic data is finally brought to bear?
P.S. Why the name?
The “infinitesimal model”, coined by R.A. Fisher, proposes that human traits are influenced by an infinite number of independent genetic variants each with an infinitely small additive effect, in addition to environmental variation. This fantastical model, developed over a century ago, now appears to be a pretty good approximation of the true genetic architecture of common human traits (of course, humans do not have an infinite number of genes). For more on the history, definition, and implications of the infinitesimal model see Barton et al. 2017.
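For readers who prefer code to prose, here is a toy simulation in the spirit of the infinitesimal model: a trait built from many independent variants, each with a tiny additive effect, plus environmental noise. It is a sketch under simplifying assumptions that are not from Fisher or Barton et al.: a finite number of variants, every variant at allele frequency 0.5, normally distributed effect sizes, and no linkage or population structure. The function name and parameters are hypothetical.

```python
import random
import statistics


def simulate_trait(n_individuals, n_variants, h2, seed=0):
    """Simulate genetic values and phenotypes under a toy additive model."""
    rng = random.Random(seed)
    # Each variant's effect has variance h2 / n_variants, so the effects
    # shrink toward zero as the number of variants grows.
    effect_sd = (h2 / n_variants) ** 0.5
    effects = [rng.gauss(0.0, effect_sd) for _ in range(n_variants)]
    genotype_sd = 0.5 ** 0.5  # sd of allele count (0/1/2) at frequency 0.5

    genetic, phenotype = [], []
    for _ in range(n_individuals):
        g = 0.0
        for beta in effects:
            # Allele count: two independent draws at frequency 0.5.
            copies = (rng.random() < 0.5) + (rng.random() < 0.5)
            g += beta * (copies - 1.0) / genotype_sd  # standardized genotype
        # Environmental noise carries the remaining variance.
        e = rng.gauss(0.0, (1.0 - h2) ** 0.5)
        genetic.append(g)
        phenotype.append(g + e)
    return genetic, phenotype


genetic, phenotype = simulate_trait(n_individuals=2000, n_variants=500, h2=0.5)
realized_h2 = statistics.pvariance(genetic) / statistics.pvariance(phenotype)
print(f"realized heritability: {realized_h2:.2f}")
```

The realized heritability should land near the requested value. Increasing `n_variants` makes each individual effect smaller while leaving the total genetic variance roughly unchanged, which is the intuition behind the model.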
How confident are you that twin studies are right or wrong?