Fascinating article!
Do you have links to any studies that support the validity of polygenic liability threshold models? I found a couple of studies up on Google Scholar where the authors claimed a PLT model couldn't explain the data (a paper on sex differences in autism rates, and a paper on heart disease, CVD, and Type II diabetes in Korean populations). In the pro-PLT model camp, I found a meta-study that implied that PLT model worked for the observed data for 8 psychiatric disorders.
How controversial is this model? I'm inclined to believe it, though, but I'd like to see more pro-PLT model studies to fully buy into it.
Thanks! I would say this model is the default for common dichotomous traits and is very widely used. That said, the extent to which it is fully supported by the data is hard to say because we never get to see the underlying liability distribution. Many studies have shown that rare + common variants seem to operate additively (see Veera Rajagopal summarizing many such studies: https://x.com/search?q=%22liability+threshold%22+from%3Adoctorveera&src=typed_query) so I think additivity and at least approximate normality are fairly well established. It has also been used as the foundation for various association methods and shown to improve statistical power across a variety of traits (see Jurgens et al. https://www.nature.com/articles/s41588-023-01342-w , and Hujoel et al. https://www.nature.com/articles/s41588-020-0613-6) which also supports approximate normality. Probably the biggest open question is whether the instantaneous threshold is appropriate, and I think there's a lot of evidence showing it is not, at least for behavioral phenotypes that are often diagnosed by adding up a bunch of symptoms: https://theinfinitesimal.substack.com/i/146381322/do-neuropsychiatric-traits-follow-a-spectrum-model . So I would be in the PLT camp but with the understanding that there are likely multiple thresholds and perhaps not perfect normality.
Why a hard threshold and not a soft one, say, a sigmoid relationship between liability and binary outcome (with a hard threshold being a limit case)? I vaguely recall logistic regression being the maximum entropy form of a binary classifier; I’m wondering if there’s any sense in which the linear part of that corresponds to the liability.
Yes, this is a totally plausible model, sometimes parameterized with a probit link (see slide 29 here: https://cnsgenomics.com/data/teaching/SISG/module_10/Mod10_Session5Naomi/2017_SISG_10_5.pdf). The simpler liability threshold model is preferred because it makes the math very easy, seems to fit the data reasonably well, and requires the fewest parameters.
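To make the link between the hard and soft versions concrete, here is a minimal sketch, not taken from the post or the linked slides. It assumes total liability has variance 1, split into a genetic part with variance `h2` and normally distributed environmental noise with variance `1 - h2`; the function names and parameter values are hypothetical. Under those assumptions, a hard threshold on total liability already implies a probit-shaped (soft) curve in the genetic liability alone, with the environmental noise doing the smoothing.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the threshold and the probit curve


def liability_threshold(prevalence):
    """Threshold on the total liability scale that yields the given prevalence."""
    return STD_NORMAL.inv_cdf(1 - prevalence)


def probit_case_probability(g, h2, prevalence):
    """P(case | genetic liability g) implied by a hard threshold on g + e.

    Assumes e ~ Normal(0, 1 - h2), so P(g + e > t) = Phi((g - t) / sd_e).
    """
    env_sd = (1 - h2) ** 0.5
    return STD_NORMAL.cdf((g - liability_threshold(prevalence)) / env_sd)


def simulate_case_rate(g, h2, prevalence, n=200_000, seed=0):
    """Monte Carlo check: draw environmental noise and apply the hard threshold."""
    rng = random.Random(seed)
    env_sd = (1 - h2) ** 0.5
    t = liability_threshold(prevalence)
    cases = sum(g + rng.gauss(0, env_sd) > t for _ in range(n))
    return cases / n


if __name__ == "__main__":
    # Hypothetical values: 5% prevalence, half the liability variance genetic.
    for g in (-1.0, 0.0, 1.0, 2.0):
        analytic = probit_case_probability(g, h2=0.5, prevalence=0.05)
        simulated = simulate_case_rate(g, h2=0.5, prevalence=0.05)
        print(f"g={g:+.1f}  probit={analytic:.4f}  simulated={simulated:.4f}")
```

In this sketch the "linear part" of the probit is just the genetic liability rescaled by the environmental standard deviation. A logistic link would give a similar but not identical curve, and a threshold that is itself fuzzy would add a further parameter on top of this.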
Not really useful comments, but I get the intuitive impression that the rarer a disease is, the more genes are involved in polygenic disorders. I get the feeling there is a sort of diffusion of "liability" distributed among several, perhaps hundreds of genes with different prevalences in the population, and different probabilities by their lonesome to increase the probability of having a disease, making estimations of likelihood from several of the sources in the comments unreliable. Two sources I really have no idea what they are! I just picture the interactions changing probabilities in uncanny ways, uugggh!
Like the estimations are somehow fitted on the data, instead of the data making the estimation. And with a lot of simplifications. Good thing I never considered being a Clinical Geneticist...
And it might seem esoteric stated as such, but although I am not sure, I am confident that if we were to add up all diseases in a closed population, the estimates from those sources of "estimation" data would lead us to conclude that no member of said closed population should be alive, let alone healthy.
And if pedigree is the weakest, it points toward my strong confidence: vast real-world data won't match the predictions.
Just 1 in 200 having two Genetic Diseases really does not reflect the World as I see it, and it comes from the WHO...
Which makes me question the reliability of so many mutations used for diagnosis described in many genetic disorders: dozens, hundreds of mutations for a given condition. Some, or many, without a well enough established causal mechanism, of course.
And molecular mechanisms are subject to all the problems for them to be published and known, so, double...
But, it seems I am missing a view of how these things move from monogenic diseases, to bigenic, trigenic, tetragenic, etc...
It could be illuminating. And it can start a fad: Oh, I have a quadgene disease...
Then something missing from the comments is somewhat of an understatement: the environment.
Having an environmentally independent threshold really does not work for the most common polygenic diseases: Hypertension, Diabetes, Obesity, Hypercholesterolemia, Coronary Artery Disease, Chronic Obstructive Pulmonary Disease and Stroke.
Let alone Cancer... and Occupational Diseases...
All of those are predominantly Modern Living diseases, even if Atherosclerosis seems to be there in other not that "modern" societies. But surely, limb amputations for chronic arterial insufficiency must be rare in those places: just the exercise, which is the best treatment for it...
I think this is an important point: for the most common polygenic diseases, the environment is determinant.
Or am I wrong on those? :)
Then the really bad issue is Psychiatric Diseases: that's hocus pocus. Those are not Diseases, there is no Neuropathology to them, and estimates of heritability and prevalence/incidence in the Population are just crazy: Some places have over 50% of depressed women.
Some have 30% of kids with Autism, etc.
And shifting from the steadiest of them all in the past, Schizophrenia, to Bipolar disorder, now with a 1% Prevalence, is nuts. Bipolar disease used to be a 1 in 1,000 to 1 in 10,000 disease, and it took decades of depression turning into a first manic episode to be diagnosed. It was called Bipolar for a reason, and cycling took years, and most people really did not have many florid manic episodes.
As such, either it is not the same disease, or a lot of the now called diseased are not sick at all.
Because all Disease models need to be History Resistant: they need to explain the past too.
Especially in Genetics, right? :P
Oh, I forgot:
Alcoholic Liver Diseases including Cirrhosis and Alcoholic Fatty Liver...
What convinces you that the condition is heritable in such a model? Is that assumed from the outset?
In this case yes since I'm just walking through hypothetical trait models. In general I think you have to triangulate across different estimators of heritability -- pedigree (weakest), molecular, within-family molecular (strongest) -- as well as look at the underlying mechanisms that are identified.
Yes, but at some point, you have to do that for accepted null traits for comparison.
What do you mean by accepted null traits?
What I would accept as not genetically driven, or what you would accept? Native language, use of chop sticks, where you were born, first digit of social security number. I’m sure you can think of a few.
Thanks.
I would be inclined to ask whether something like fetal alcohol syndrome or the TORCH diseases shows no meaningful heritability (at least in behaviour genetics, or perhaps in any estimate). If it does, then something's wrong, surely.
Fetal alcohol syndrome comes from the mother drinking too much alcohol. What genetics are you referring to?
"The liability threshold model also has implications for the techno-futurist (or perhaps techno-dystopian) theories about the elimination of genetic diseases through selective breeding."
I read a lot of techno-futurists and they are far more likely to talk about things like embryo selection than selective breeding. But when it comes to embryo selection, polygenic and monogenic diseases are similar, as long as the heritabilities of the PGSs are similar.
It is puzzling to me that you would object to techno-futurists on the basis of selective breeding without mentioning that this doesn't apply to polygenic selection. And this is far from the first time you've done this thing. It gives the impression that your mention of the policy implications of the theory does not derive from genuine engagement with the discourse in the area where you try to make the policies work, but rather that your writing derives from trying to collect as many objections to this policy area as possible (perhaps indirectly by abstracting the discourse of other geneticists who are trying to collect as many objections as possible).
Eh, I'm pretty sure you're overthinking it. The specific techno-futurism I was thinking about was the frequent claims that crime can be "weeded out" with eugenics (I cited a recent example - https://x.com/SashaGusevPosts/status/1872352725188657640 - on Twitter but thought it was a distraction to include here). The reason I didn't talk about embryo selection was that (1) the mechanisms are largely unrelated, as you note, and the conflation with natural selection mostly confuses people and (2) I already wrote extensively about embryo selection (https://theinfinitesimal.substack.com/p/science-fictions-are-outpacing-science) including the underlying genetic models, expected gains, and caveats. I also did not talk about polygenic gene editing, another issue on which techno-futurists have made very bold and nonsensical claims. There's only so much time.
For what it's worth, I'm personally optimistic about embryo selection as a modestly accurate disease screen and (more importantly) a tool to help disease-affected parents who are otherwise reluctant to have children. The concerns I expressed in the previous article on the topic are that (1) contemporary embryo screening companies are misleading their customers about the yield and reliability of their products and (2) the techno-futurist obsession with selecting on IQ is completely unmoored from what the data actually tell us is possible. I think if you actually care about embryo selection for health and not as a fun content generator for blogposts, then you should be pretty furious with the startups in this space making misleading claims and the tech pundits pulling the conversation away from broadly supported use cases in, say, cancer and rare disease and towards polarizing and ineffective use cases like IQ.
I would be pretty interested to read your opinion about polygenic gene editing, maybe even a response to Gwern's article, which is the go-to blogpost on this topic for a lot of people I reckon.
Can you link it? I'll try to write something as long as it's not just me saying "this is impossible" over and over :)
Tailcalled from below linked it, it's https://gwern.net/embryo-selection.
There is a part where he writes about why GWAS are overall good (you will agree with him on this part, I believe) and why the "missing heritability" critics of the 2000s were wrong (all the people who were saying GWAS was just false positives). The article might be a little outdated in the sense that it was last updated in early 2020, and I think that's a little too old in your circles :) but I think it's pretty influential.
Ah gotcha, yes I've read this. I largely agree with his section on Gene Editing: IQ (and most traits) are highly polygenic, inducing that many edits into a cell is not possible, and rare variants of large positive effect on IQ are unlikely to exist.
With respect to the broader estimates regarding embryo selection, I think my critiques are pretty much covered in the previous post (https://theinfinitesimal.substack.com/p/science-fictions-are-outpacing-science):
1. The estimates for heritability of IQ meta-analyzed by Gwern turn out to have been inflated by sloppy use of methods in the early studies and confounding from indirect effects.
2. The current within-family estimate is much lower (in the 0.12-0.19 range depending on how you estimate it) so the selection yield will be low even with an optimal polygenic score. Estimates of confounding also show that the population-level polygenic score is largely driven by population stratification, not real signal.
3. The within-family genetic correlation between IQ and other positive traits also appears to be lower so you will not get much benefit on other traits and may in fact enrich for correlated traits like autism.
4. The negative direct/indirect effect correlation means that either the estimates are confounded by ascertainment in ways we do not understand or by selecting for embryos with positive direct effects on IQ you will also select for negative genetic nurture on IQ and be wiping out some of your signal in either the first or second generation.
I do find it a bit funny that at the time Gwern's post was written the GWAS/GCTA heritability for IQ was thought to be quite high, and so the gene editing section contains the completely reasonable expectations that rare variants will not contribute much and are unlikely to have positive effects. Now that it turns out the GWAS/GCTA heritability is actually much lower, people have simply started claiming that rare variants will fill the gap without ever explaining why these prior expectations were incorrect.
I basically reject the whole "then you should be pretty furious with the startups in this space making misleading claims" point because to some extent I believe it's people's own responsibility to filter out exaggerations from companies (especially investors' own responsibility, in fact their main job), since anything else would seem to require near-totalitarian suppression of disagreement.
I interpret your claim that polygenic selection on IQ is "ineffective" to mean that its mean treatment effect is 0 or negative, rather than giving 1 or more IQ points on average. Your claim here is as far as I can tell false. Why do you make so many false claims?
(And if I'm doing my variance arithmetic right, I think including IQ in a collection of many variables to be selected on will allow one to gain more value than could be gained by just picking one of the variables?)
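For anyone who wants to check that variance arithmetic, here is a rough simulation sketch. Everything in it is an assumption made for illustration: each embryo gets `n_traits` independent, standard-normal scores measured relative to the family mean, "value" is their unweighted sum, and the function name and defaults are hypothetical. It compares picking the embryo with the best single score against picking the embryo with the best combined index.

```python
import random
from statistics import mean


def simulate_selection(n_embryos=5, n_traits=3, n_families=20_000, seed=0):
    """Mean total value of the chosen embryo under two selection rules.

    Returns (mean value when selecting on trait 0 only,
             mean value when selecting on the sum of all traits).
    """
    rng = random.Random(seed)
    value_single, value_index = [], []
    for _ in range(n_families):
        # One row of independent within-family scores per embryo (assumed N(0, 1)).
        embryos = [[rng.gauss(0, 1) for _ in range(n_traits)]
                   for _ in range(n_embryos)]
        total_value = [sum(scores) for scores in embryos]
        # Rule 1: choose the embryo with the highest score on the first trait.
        pick_single = max(range(n_embryos), key=lambda i: embryos[i][0])
        # Rule 2: choose the embryo with the highest combined index.
        pick_index = max(range(n_embryos), key=lambda i: total_value[i])
        value_single.append(total_value[pick_single])
        value_index.append(total_value[pick_index])
    return mean(value_single), mean(value_index)


if __name__ == "__main__":
    single, index = simulate_selection()
    print(f"select on one trait: {single:.3f}")
    print(f"select on the index: {index:.3f}")
```

Within each simulated family the index rule can never do worse on total value than the single-trait rule, since it picks the maximum of that total by construction. That only speaks to the arithmetic under these idealized assumptions; it says nothing about how accurate any real score is, how the scores correlate within families, or how the traits should be weighted.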
As for IQ selection being "polarizing", that seems like an implicit political threat veiled to sound like it comes from an apolitical authority. Like, it's true that angry progressives might start attacking geneticists as that gets popular, but why not think "help, progressives are insane and attacking me" instead of thinking "IQ selection is polarizing"?
For reference, the primary people I've been reading on embryo selection and similar topics are Gwern and Scott Alexander, e.g. https://gwern.net/embryo-selection . I don't have the impression that you are doing them justice.
You could just read the piece and articulate your disagreements. We're talking about three lines of simulation code here; it's not some deep lore.