Wait, why a linear predictor? Why would we expect linear relationships here and why would we expect the performance ceiling to come from linear models?
Which kind of runs into the other thing I get running through the Lennon et al 2024 paper (https://www.nature.com/articles/s41591-024-02796-z) which is...I mean, it's Nature, it's solid...but why are they doing it this way?
This smells weird. I'm reading this paper, there's a lot of stuff I don't understand, there's a lot of discussion and validation of a data pipeline, why, this is a triviality? I try to get to the meat, which looks like Figure 2, but...why are we reporting mean odds ratios with 95% confidence intervals instead of a simple confusion matrix? I try to find something that looks recognizable and there's AUC values in Supplementary Table 2 (https://static-content.springer.com/esm/art%3A10.1038%2Fs41591-024-02796-z/MediaObjects/41591_2024_2796_MOESM3_ESM.xlsx) for each model and they compare the AUC for genomic prediction alone, non-genetic predictor alone, and combined, which is awesome...but why is this an AUC score instead of AUROC score, like everybody uses? Did they mean AUROC? That seems...odd, because the odds ratio that's included in the same table is defined as "Odds ratios are reported as the mean odds ratios (square dot) associated with having a score above the specified threshold", which sounds like it's defined by a specific threshold.
And I'm sorry, I feel like I'm getting into the weeds here, so let me make my issue plain. This study should be easy. They've got 2,500 rows, we're not training a model so there's no need for training/test split, so you just run the model to see how well it predicts diseases in a population it hasn't trained on, exactly analogous to embryo selection, and measure how well it performs with a confusion matrix and an AUROC curve...like every junior data science project for the past five years.
I'm really interested in all the confounder issues you listed and Nature is a great journal...but this looks like a lot of effort to use statistical instead of machine learning techniques that only makes the research less clear, not more. Now I haven't worked on any GWAS projects, I'm sure there's a lot I don't know, maybe this is standard in that field...but this doesn't smell right? The entire point is to use advanced machine learning algorithms to predict which embryos would develop these diseases, right? Validation of these models doesn't seem to require any advanced techniques. If this is the gold standard for rigor...I've got to be missing something. What's, what's going on?
Thanks for the comment, the short answer is that, yes, the Lennon et al. paper REALLY is the state of the art and the conventional way of doing things in this field and there's nothing fishy about it. I know it's not a very satisfying answer so let me expand briefly (but to some extent you'll also have to take my word for it):
Simple linear predictors are used for two reasons: (1) Logistically a typical GWAS is actually many cohorts/institutions all running analyses internally, sharing the summary statistics (z-scores/p-values), and then meta-analyzing them all (using simple inverse-variance weighted marginal effects). That puts a big limitation on anything non-linear you can do within cohorts. (2) Practically it turns out that most common traits are driven by a large number of small effect variants that are very well approximated by a linear model (followed by some shrinkage models for building scores, as I mentioned). There's simply very little evidence of interaction or non-additivity for most traits and so the juice is not worth the squeeze of implementing fancy ML. I assure you that everyone is very eager to hop on the bandwagon and apply deep learning but it simply doesn't add very much over RidgeReg/LASSO in this application. This is not to say that advanced computation isn't being applied but it is mostly focused on getting a genome-wide ridge regression running efficiently over 500k individuals and 10 million variants on a standard CPU setup (for example check out this Supp Note here : https://www.nature.com/articles/ng.3190).
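For readers unfamiliar with the inverse-variance weighted meta-analysis mentioned in point (1), here is a minimal sketch of the fixed-effect version for a single variant. The function name `ivw_meta` and the cohort numbers are made up for illustration; real GWAS pipelines add quality control and other corrections that are not shown here.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis for one variant.

    betas: per-cohort marginal effect estimates
    ses:   per-cohort standard errors of those estimates
    Returns (pooled_beta, pooled_se, z_score).
    """
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se

# Hypothetical summary statistics from three cohorts for the same SNP.
beta, se, z = ivw_meta([0.10, 0.20, 0.15], [0.10, 0.10, 0.05])
print(f"pooled beta={beta:.3f}, se={se:.3f}, z={z:.2f}")
```

Only the per-cohort effect estimates and standard errors are needed, which is why cohorts can contribute without sharing individual-level genotypes, and also why anything non-linear within cohorts is hard to combine this way.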
Regarding Lennon et al. I think these are mostly cultural differences. "AUC" is indeed "AUROC". The reason it's in the Supplement is because these scores are intended for clinicians and clinicians think in terms of Odds Ratios or relative risk. There are established thresholds for risk when an intervention is recommended (think hysterectomy for BRCA carriers at risk for ovarian cancer) and these are generally in terms of Odds Ratios; Lennon et al. want people in the clinic using their scores in the same way, so they are reporting the metrics in a way that is concordant with how their target audience thinks about it. Moreover, we know these traits are not fully heritable so the AUC will never reach 1, therefore the goal is to identify the largest possible "high risk" (e.g. defined by an Odds Ratio) group rather than to maximize classification accuracy (though these are obviously related). If you're interested, I linked to papers on breast and prostate cancer that do report AUC/C-statistic type measures, but those were for a non-clinical audience.
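To make the two metrics concrete, here is a rough sketch that computes both an AUROC and a threshold-based odds ratio from the same scores. The data, the threshold, and the function names are all invented for illustration and have nothing to do with the actual Lennon et al. analysis.

```python
def auroc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def odds_ratio_above(scores, labels, threshold):
    """Odds of disease above the threshold divided by odds of disease below it.

    Toy version: assumes no cell of the 2x2 table is empty.
    """
    cases_hi = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    cases_lo = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    ctrls_hi = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    ctrls_lo = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    return (cases_hi * ctrls_lo) / (cases_lo * ctrls_hi)

# Hypothetical polygenic scores and disease status (1 = case, 0 = control).
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5]
toy_labels = [0, 0, 1, 1, 0, 0, 1, 1]
print(auroc(toy_scores, toy_labels))
print(odds_ratio_above(toy_scores, toy_labels, threshold=0.45))
```

The AUROC summarizes ranking quality over every possible cutoff, while the odds ratio describes one specific "high risk" group, which is the quantity a clinical threshold is usually stated in.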
Ah...wait, wait, let me make sure I understand this.
So there's a fundamental issue where institutions don't/can't share genetic information and are just reporting out results. That's depressing but makes sense and...yeah, I'd be careful with genomes too.
But the thing that's jumping out is the SNP additive thing, which seems deeply counterintuitive. Pretend we have 4 gene sequences that...determine height because I want a continuous rather than binary outcome variable. Pretend there are 3 SNP vectors that are significant in terms of determining height. So what we see is the 1st SNP vector contributes 0.25 inches, the 2nd SNP vector contributes 0.3 inches, the 3rd contributes 0.1 inches, but there's no/few situations where they interact in important ways, eg if SNP vectors 1 and 3 are both active/on then the patient's height is 0.35 inches higher, not 0.6 or 0.1, it's just straight additive?
That's....really counterintuitive. Do I understand that right?
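The toy example above can be written out as a small sketch. The effect sizes are the made-up ones from the comment, and the interaction bonus is an arbitrary number chosen only to show what a non-additive model would look like; none of these are real estimates.

```python
# Hypothetical per-variant height effects in inches, from the toy example above.
EFFECTS = {"snp1": 0.25, "snp2": 0.30, "snp3": 0.10}

def additive_shift(active):
    """Purely additive model: the total shift is the sum of the active effects."""
    return sum(EFFECTS[snp] for snp in active)

def shift_with_interaction(active, pair=("snp1", "snp3"), bonus=0.25):
    """Same model plus one made-up pairwise interaction (epistasis) term."""
    shift = additive_shift(active)
    if all(snp in active for snp in pair):
        shift += bonus  # extra effect only when both variants are present
    return shift

print(additive_shift(["snp1", "snp3"]))          # additive: 0.25 + 0.10
print(shift_with_interaction(["snp1", "snp3"]))  # additive plus the bonus term
```

Under the additive model, carrying variants 1 and 3 shifts height by the sum of their separate effects; the second function shows the kind of cross term that a standard polygenic score leaves out.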
- GWAS SNP PRS are constructed with a simplistic additive effects model; they cannot capture complex epistasis
- GWAS SNP PRS are constructed almost exclusively from common variants; even the rare variants study by Chen et al only looked at protein truncating variants, a small subset of rare variants.
- GWAS has extremely strict significance thresholds (Bonferroni correction for the number of variants included). Thus even with huge sample sizes, they're still usually underpowered to uncover all contributing variants and should be seen as bare minimums.
- SNP based GWAS cannot account for non-SNP genomic variation (structural variations like tandem repeats, copy number variants, translocations, etc)
- GWAS SNP PRS typically don't factor in age or sex
Let me reiterate that the SNP heritability estimate does not rely on significance threshold and is not the bare minimum; it is the estimate of the highest r2 accuracy you could achieve from *all* SNPs in the GWAS. If we're talking about a common polygenic score, this is the only parameter that matters (and arguably the within-family estimate if you care about confounding).
Regarding other novel sources of prediction to be discovered in the future: sure, anything is possible! One could also argue that we will identify many highly precise environmental measures from which to predict disease too, since we're just speculating why stop at genetics?
Does the data actually support a massive reservoir of predictive genetic variation out there? I would say no:
-- large-scale, replicating genetic epistasis has essentially never been observed to date (and, if it exists, would be extremely difficult to build predictors for)
-- the Chen et al. findings of few IQ genes and most previously known for developmental delay should at a minimum increase your skepticism of a large rare variant contribution.
Another statement that seems to be contradicted by behavioral genetics literature: “education has a causal effect on IQ”. In the short run, this statement is true but may be artifactual (or so I’ve read). In the long run, twin studies, studies of early pre-school programs, and adoption studies (such as a large Korean adoption study) indicate that this is false. Once you account for variation in genes, it seems that being made to go through more schooling does not make one’s IQ higher. Again, please link to some literature to re-educate me on this topic if you know it better than I do and I am misunderstanding something.
Don’t twin and adoption studies of the behavioral genetics literature flatly contradict the statement below?
“But, for multifactorial traits, genetics is just one small risk factor of very many, with the vast majority of variation in risk simply unexplained (aka bad luck).”
Not knowing which genes are doing the work does not make genetics “just one small risk factor of many”. The punchline of research in behavioral genetics is that genes are every bit as strong as non-shared environment “(aka bad luck)” when accounting for variations in outcomes for most traits, with shared environment (aka all the systematic ways the environment for children differs more between families than within the same family) accounting for almost no variation in adult outcomes. To illustrate, identical twins (same genes) don’t end up more similar when they are raised in the same family versus when they are separated at birth and raised in different families (illustrating that shared environmental influence is very weak). But they are still not literally identical in adult outcomes (indicating that non-shared environment “aka bad luck” is also a potent force).
Please provide literature links to a better summary of the Behavioral Gen. literature if you are more expert than me and you think my summary seems off.
Twin and adoption estimates are irrelevant to genomic prediction because they do not actually identify any genetic variation. Twin and adoption studies are also widely appreciated to be environmentally biased, see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6130754/. Even the most die-hard behavioral geneticist will tell you not to take twin estimates literally because they pack in a ton of assumptions.
On Genetic Variety and the Human Body” by Armand Marie Leroi. You should create a book level tour of these databases.
I expected very low ability to predict IQ, since in my experience it is a negative predictive measure, not positive. And indeed the predictive correlation being less than normal variation in test results is my expectation.
Your final paragraph was key:
One final thought: there does exist an important use-case for genome sequencing and cognitive function, and that is the diagnosis of rare developmental disorders.
Having a Y chromosome sets the platform in place to create sperm and father children. Having 1 child or 100, numerical “achievement” of people with the gene will not correlate to anything particular genetically. However, defects in SRY will manifest in fewer or no children.
A genetic platform for absorbing information, culling irrelevant connectivity, and maintaining plasticity is a ground state IMHO, a ground state for everyone’s intelligence. It says nothing about education.
Genetic variation causing inability to perform any of the three will result in lower IQ. For example, a simple neurotransmitter SNP will reduce what the platform could achieve, but can’t predict what the platform would be exposed to. Paternal age at conception correlates to precisely these challenges, neurotransmitter issues which create autism, schizophrenia or psychosis, and diminished IQ.
One of Sasha's strong suits is that despite being a World leader and authority in human genetics he shows unlimited patience explaining things to people who know absolutely nothing about human genetics, which of course is fine as long as they are willing to learn from an expert.
Please consider enabling TTS. I prefer passively listening as I do other things in general, it's more enjoyable. Thank you
https://support.substack.com/hc/en-us/articles/7265753724692-How-do-I-listen-to-a-Substack-post-
(Request form)
https://airtable.com/shr11c70LRWq9saOb
Thank you, I didn't know about this feature and I've submitted a request.
This is excellent.
Thank you!
But if the use case is for selection of IVF embryos -- the realistic money making application -- you don't care how it compares across groups, and you also don't care too much about the fact that some of these genes might act via family/indirect effects (if it results in them behaving differently towards your grandchildren or nieces/nephews in a way that yields more educational attainment, great). Ideally you'd use data that let you consider the genes of the parents/family.
And sure maybe everything considered it's a relatively small effect but once you consider just how much people are willing to pay for SAT prep and the like it's plausible even very small effects are totally worth thousands of dollars.
And if you take the huge amounts of money spent on various kinds of childhood enrichment or other crap that we have quite substantial reason to believe has very limited long term impacts (study after study shows impact disappearing in long term) then it seems like it's relatively justified. Parents want to spend money giving their child a leg up.
--
For instance, I know Steve Hsu has commented that there is every reason to suggest one could create an educational attainment prediction in same ballpark of efficacy as this published model for height https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6216598/
He hasn't offered it and personally I hope we do it with happiness not educational attainment
I disagree that IVF is the primary moneymaker, I would guess that incorporating polygenic scores as part of standard screening in large-scale healthcare systems is much more likely (essentially already happening in the UK). For IVF the confounders are also somewhat different (it looks like I'll need to write about this more in the future) but the basic point is that only within-family accuracy matters, so for Edu that's a max of 4% and for IQ a max of 14%. That translates into a gain of 2-5 IQ points at maximum predictive power. I also think you should care how it performs across groups if you plan to sell the test to anyone other than the descendants of north-east Europeans.
I meant for estimation of IQ/edu attainment there are plenty of other uses for other kinds of estimators. Though even if your limit is correct being able to select IVF with that kind of benefit seems pretty damn appealing.
Regarding that supposed max, the problem there is that it bakes in an assumption that the prediction is linear (with no cross terms), which is almost certainly false and will underreport. Also if the question is ability to identify the fertilized egg with the best expectation your increase in expectation is a different computation that depends on more than just the average amount of variation you explain.
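As a rough illustration of why the expected gain from selection is its own computation, here is a Monte Carlo sketch that picks the top-scoring embryo out of n. It rests on assumptions that are not from this thread: predicted scores among sibling embryos are treated as normally distributed with half the population-level score variance, and `r2`, the embryo counts, and the function name are placeholders.

```python
import math
import random

def expected_gain(r2, n_embryos, trait_sd=15.0, trials=20000, seed=0):
    """Average predicted advantage of the top-scoring embryo over a random one.

    Assumption (not from the thread): scores among sibling embryos are
    normal with variance r2 / 2 times the trait variance.
    """
    rng = random.Random(seed)
    score_sd = math.sqrt(r2 / 2.0) * trait_sd
    total = 0.0
    for _ in range(trials):
        # Draw one predicted score per embryo and keep the best one.
        total += max(rng.gauss(0.0, score_sd) for _ in range(n_embryos))
    return total / trials

# Placeholder r2 values; the gain depends on the number of embryos as well as r2.
for r2 in (0.04, 0.14):
    for n in (2, 5, 10):
        print(f"r2={r2:.2f} n={n:2d} gain={expected_gain(r2, n):.2f} IQ-scale points")
```

Under these assumptions the gain grows with the square root of the variance explained and only slowly with the number of embryos, so it cannot be read directly off the r2 figure.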
I mean if you take a look at the link I gave that's already a model from quite some time ago getting 40% of variation in height.
And regarding cross group predictivity you can simply build a different model for the different populations. And that's part of why your estimates for the maximum possible explanation are too low. If you want the model to have universal validity you are giving up accuracy (ultimately these are really features of non-linear interactions but it may be easier to take a simple model and restrict to a narrow population).
And the same point about within family prediction. If you have within family genetic data your overall estimate can be in principle better than just looking at what an overall model is going to do in that situation and limiting it to the within family variation since you can capture cross terms in the data.
--
Ultimately, in principle every bit of the variation that twin studies attribute to genetics **has** to be possible to recover from sufficient information about the genome *conditional* on the environment. In other words if you know that genetic variation is 50% and you know the environment you can predict that 50% in principle.
True you can't ever completely determine the environment but that's my point about having genetic data of parents. That partly determines the environment so you get more accuracy than if it had to be agnostic about that information.
Ohh and here is a nature publication showing that even within sibling polygenic prediction can do damn well.
https://www.nature.com/articles/s41598-020-69927-7
And here is the author answering many of the kind of concerns you raise and explaining why you can capture quite a bit of variation.
https://youtu.be/43DDPzM0pHc?si=-HzuAVIyJW0lDDoH
Or a survey paper. https://arxiv.org/abs/2101.05870
And apparently now the height predictors can explain up to 50% of variation which it seems like is a direct contradiction to your claimed limits which seemed to max out at around 25% unless I misunderstood.
Here's the author of the paper you cited:
https://x.com/hsu_steve/status/1809585980220829835
"So I agree with Gusev that the current status of cognitive ability prediction is very weak"
Accuracy needs to be evaluated for each trait. Just because prediction works well for height (within-family heritability of 0.4-0.5) does not mean it will work well for a trait like IQ (within-family heritability of 0.14) or education (within-family heritability of 0.04).
Yah, seems about right. I’m not claiming it's currently very good nor have any view about the current companies only that I don't think you’ve shown an in principle limitation since there are assumptions made in that paper that need not be true in all models — the citations were merely for the point of showing that limit is unduly pessimistic.
To be clear, I personally hope that people focus on predictors for other traits — in particular happiness/depression and wouldn't be at all disappointed if edu attainment predictors turned out to be unduly difficult. We know that people differ substantially in their inclination for joy vs depression and even for the smallest gain that seems worthwhile. After all, what's the point of money other than to enable more joy in life.
I just don't think the methodology you cited can possibly hope to give an upper bound. The theoretical upper bound with unlimited data and unlimited computer power has to be about that given by twin studies so any argument there is a lower upper bound has to have the form: it's computationally impossible to do better or the data requirements are too large. And the cited paper doesn't have that kind of generality.
The numbers I outlined provide the upper limits on what could be achieved with infinite sample sizes from the prediction models that everyone currently uses. But I don't know how to address the claim that there are potentially other models out there we've never seen that will work better.
Ohh, I misunderstood the claim.
I still don't think that estimate you give is actually strictly an upper bound for the kind of models in question (see below) but it's probably in the right neighborhood for this kind of model.
But ok, so what if the current models are limited to this degree? It doesn't then limit what can be developed in principle, and I still think for things like IVF the imputed value for even small changes in edu attainment is crazy large based on what people spend on enrichment or SAT training etc.
As I said, I kinda hope we can't do any better for IVF on edu attainment bc I want people to maximize happiness (and even small amounts of total variance in this context might be something people would pay a great deal for) but it doesn't seem like we can rule out companies developing the ability to do much better than the current models.
--
In particular, the estimate of SNP heritability is only valid if the population used in the estimate is the same as the population used in the model. So if you train a model on, say, only Icelandic individuals you can do better than an estimate made across all Europeans. That's what I was getting at about not necessarily wanting a single giant model. Effectively by training on a data set that is more like your application you are overcoming a bit of the limitations of the purely linear model to fit the full function.
And you could in theory do even better for in family prediction even with the same type of model (basically simple linear regression) for things like IVF if you get to use parent genetics as well.
But we probably aren't talking about huge differences here. That kind of thing might get you a bit over the limit but it's not going to be by much.
If you include the earliest available school grades, the rest drops? Well, no shit, Sherlock. You could also drop in an actual IQ test and see how predictive power similarly drops. This is overdiscounting by including the answer.
I agree, the genetic score is not adding any value over existing easily available measurements, especially in adults.
But that's not the point. The genetic prediction of IQ in adults is not to learn one's IQ (I hope, at least: for that, there are indeed better measurements), it's to learn to what extent the IQ is, well, genetic, as opposed to schooling or other nurture. And this doesn't show that it is bad at that.
How would you learn that without taking into account the actual IQ test scores?
Do you know the concept of "test sample" in machine learning? You _first_ calculate what the genetics predicts without the scores and _then_ compare to the scores. If you include scores within the calculation and then subtract them, you'll get zero almost by definition.
“The total GWAS SNP-heritability of IQ is ~19% [Savage et al. 2018], so even assuming a perfect polygenic score could eventually be constructed, that score would explain about 6.5 IQ points.”
How is GWAS SNP-heritability similar to or different from traditional measures of heritability found through twin and adoption studies? Shouldn’t those numbers be the theoretical limit of what can be found through polygenic scores?
GWAS SNP-heritability is specifically defined as the variance that can be explained by the GWAS SNPs (and anything they are correlated with). This is why it's a useful parameter: give it SNPs and it tells you the maximum r2 you could achieve with the best unbiased linear predictor built from those SNPs in the same population.
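For what it's worth, the 6.5-point figure in the quote above is consistent with reading "explain" as the standard deviation of the predicted values: a predictor with r2 = 0.19 on a trait with SD 15 produces predictions with SD sqrt(0.19) × 15. A minimal sketch of that arithmetic, assuming the conventional IQ scaling (SD = 15), which the quote does not state; the function names are just for illustration:

```python
import math

IQ_SD = 15.0  # assumption: conventional IQ scaling with SD = 15


def predicted_sd(r2, trait_sd=IQ_SD):
    """SD of the predicted values for a predictor explaining r2 of trait variance."""
    return math.sqrt(r2) * trait_sd


def residual_sd(r2, trait_sd=IQ_SD):
    """SD of what is left unexplained around each prediction."""
    return math.sqrt(1.0 - r2) * trait_sd


print(round(predicted_sd(0.19), 1))  # 6.5
print(round(residual_sd(0.19), 1))   # 13.5
```

On this reading, even a perfect score at r2 = 0.19 leaves a residual spread of about 13.5 points around each individual prediction.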
Twin heritabilities are an approximation of the total additive genetic contribution after setting a bunch of components to zero, some of which we are confident are not zero (e.g. twin-specific environment, assortative mating) and some of which we don't know (gene-environment interactions). You can see this in the fact that extended twin/family designs can produce very different estimates (see: https://wyclif.substack.com/p/five-toy-worlds-to-think-about-heritability).
But even if you take the classic twin estimate as the ground truth, it is not clear how that translates into building a polygenic predictor. For example, if a large fraction of the twin heritability is explained by ultra-rare variants that are nearly private in families, it will not be possible to give them weights in a polygenic score.
Wait, why a linear predictor? Why would we expect linear relationships here and why would we expect the performance ceiling to come from linear models?
Which kind of runs into the other thing I get running through the Lennon et al 2024 paper (https://www.nature.com/articles/s41591-024-02796-z) which is...I mean, it's Nature, it's solid...but why are they doing it this way?
This smells weird. I'm reading this paper, there's a lot of stuff I don't understand, there's a lot of discussion and validation of a data pipeline, why, this is a triviality? I try to get to the meat, which looks like Figure 2, but...why are we reporting mean odds ratios with 95% confidence intervals instead of a simple confusion matrix? I try to find something that looks recognizable and there are AUC values in Supplementary Table 2 (https://static-content.springer.com/esm/art%3A10.1038%2Fs41591-024-02796-z/MediaObjects/41591_2024_2796_MOESM3_ESM.xlsx) for each model and they compare the AUC for genomic prediction alone, non-genetic predictor alone, and combined, which is awesome...but why is this an AUC score instead of AUROC score, like everybody uses? Did they mean AUROC? That seems...odd, because the odds ratio that's included in the same table is defined as "Odds ratios are reported as the mean odds ratios (square dot) associated with having a score above the specified threshold", which sounds like it's defined by a specific threshold.
And I'm sorry, I feel like I'm getting into the weeds here, so let me make my issue plain. This study should be easy. They've got 2,500 rows, we're not training a model so there's no need for training/test split, so you just run the model to see how well it predicts diseases in a population it hasn't trained on, exactly analogous to embryo selection, and measure how well it performs with a confusion matrix and an AUROC curve...like every junior data science project for the past five years.
I'm really interested in all the confounder issues you listed and Nature is a great journal...but this looks like a lot of effort to use statistical instead of machine learning techniques that only makes the research less clear, not more. Now I haven't worked on any GWAS projects, I'm sure there's a lot I don't know, maybe this is standard in that field...but this doesn't smell right? The entire point is to use advanced machine learning algorithms to predict which embryos would develop these diseases, right? Validation of these models doesn't seem to require any advanced techniques. If this is the gold standard for rigor...I've got to be missing something. What's, what's going on?
Thanks for the comment, the short answer is that, yes, the Lennon et al. paper REALLY is the state of the art and the conventional way of doing things in this field and there's nothing fishy about it. I know it's not a very satisfying answer so let me expand briefly (but to some extent you'll also have to take my word for it):
Simple linear predictors are used for two reasons: (1) Logistically a typical GWAS is actually many cohorts/institutions all running analyses internally, sharing the summary statistics (z-scores/p-values), and then meta-analyzing them all (using simple inverse-variance weighted marginal effects). That puts a big limitation on anything non-linear you can do within cohorts. (2) Practically it turns out that most common traits are driven by a large number of small effect variants that are very well approximated by a linear model (followed by some shrinkage models for building scores, as I mentioned). There's simply very little evidence of interaction or non-additivity for most traits and so the juice is not worth the squeeze of implementing fancy ML. I assure you that everyone is very eager to hop on the bandwagon and apply deep learning but it simply doesn't add very much over RidgeReg/LASSO in this application. This is not to say that advanced computation isn't being applied but it is mostly focused on getting a genome-wide ridge regression running efficiently over 500k individuals and 10 million variants on a standard CPU setup (for example check out this Supp Note here : https://www.nature.com/articles/ng.3190).
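To make point (1) concrete, here is a minimal sketch of a fixed-effect inverse-variance weighted meta-analysis for a single variant, where each cohort shares only its effect estimate and standard error. The cohort numbers are made up for illustration, and real pipelines add steps (allele harmonization, quality control, and so on) that are not shown:

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis of one variant.

    betas: per-cohort marginal effect estimates
    ses:   per-cohort standard errors of those estimates
    Returns (pooled_beta, pooled_se, z_score).
    """
    weights = [1.0 / se ** 2 for se in ses]  # more precise cohorts count more
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical summary statistics from three cohorts (illustrative values only)
cohort_betas = [0.10, 0.14, 0.08]
cohort_ses = [0.05, 0.10, 0.04]

beta, se, z = ivw_meta(cohort_betas, cohort_ses)
print(f"pooled beta={beta:.4f}, se={se:.4f}, z={z:.2f}")
```

Only the per-cohort betas and standard errors cross institutional boundaries here, which is why anything that needs individual-level genotypes (interactions, non-linear models) is hard to run at this scale.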
Regarding Lennon et al. I think these are mostly cultural differences. "AUC" is indeed "AUROC". The reason it's in the Supplement is because these scores are intended for clinicians and clinicians think in terms of Odds Ratios or relative risk. There are established thresholds for risk when an intervention is recommended (think hysterectomy for BRCA carriers at risk for ovarian cancer) and these are generally in terms of Odds Ratios; Lennon et al. want people in the clinic using their scores in the same way, so they are reporting the metrics in a way that is concordant with how their target audience thinks about it. Moreover, we know these traits are not fully heritable so the AUC will never reach 1, therefore the goal is to identify the largest possible "high risk" (e.g. defined by an Odds Ratio) group rather than to maximize classification accuracy (though these are obviously related). If you're interested, I linked to papers on breast and prostate cancer that do report AUC/C-statistic type measures, but those were for a non-clinical audience.
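To illustrate how the two metrics relate, the sketch below computes both an AUROC (threshold-free) and an odds ratio for a "high risk" group (threshold-dependent) from the same scores. Everything here is simulated under a simple liability model with assumed values (score r2 of 0.1, 10% prevalence, top 10% as the cutoff); none of it is taken from Lennon et al.:

```python
import math
import random


def auroc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def odds_ratio_above(scores, labels, top_fraction):
    """Odds of disease in the top `top_fraction` of scores versus everyone else."""
    cutoff = sorted(scores)[int(len(scores) * (1 - top_fraction))]
    high_case = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y)
    high_ctrl = sum(1 for s, y in zip(scores, labels) if s >= cutoff and not y)
    low_case = sum(1 for s, y in zip(scores, labels) if s < cutoff and y)
    low_ctrl = sum(1 for s, y in zip(scores, labels) if s < cutoff and not y)
    return (high_case * low_ctrl) / (high_ctrl * low_case)


# Simulated data; all parameters below are illustrative assumptions
random.seed(0)
R2 = 0.10                     # variance in liability explained by the score
LIABILITY_CUTOFF = 1.2816     # top ~10% of a standard normal counts as "diseased"
scores, labels = [], []
for _ in range(4000):
    g = random.gauss(0, 1)
    liability = math.sqrt(R2) * g + math.sqrt(1 - R2) * random.gauss(0, 1)
    scores.append(g)
    labels.append(liability > LIABILITY_CUTOFF)

auc = auroc(scores, labels)
or_top10 = odds_ratio_above(scores, labels, top_fraction=0.10)
print(f"AUROC={auc:.3f}  OR(top 10% vs rest)={or_top10:.2f}")
```

The AUROC summarizes ranking quality across all possible cutoffs, while the odds ratio answers the clinical question of how much riskier the group above one chosen cutoff is.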
Ah...wait, wait, let me make sure I understand this.
So there's a fundamental issue where institutions don't/can't share genetic information and are just reporting out results. That's depressing but makes sense and...yeah, I'd be careful with genomes too.
But the thing that's jumping out is the SNP additive thing, which seems deeply counterintuitive. Pretend we have 4 gene sequences that...determine height because I want a continuous rather than binary outcome variable. Pretend there are 3 SNP vectors that are significant in terms of determining height. So what we see is the 1st SNP vector contributes 0.25 inches, the 2nd SNP vector contributes 0.3 inches, the 3rd contributes 0.1 inches, but there are no/few situations where they interact in important ways, e.g. if SNP vectors 1 and 3 are both active/on then the patient's height is 0.35 higher, not 0.6 or 0.1, it's just straight additive?
That's....really counterintuitive. Do I understand that right?
That's right, additivity is the norm
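The toy example from the question can be written out directly. This is a minimal sketch using the made-up effect sizes from the comment above (0.25, 0.3 and 0.1 inches) and the simplified on/off coding used there; real scores typically use allele counts of 0, 1 or 2 and many thousands of variants:

```python
# Per-variant effects in inches, taken from the hypothetical example in the question
EFFECTS_INCHES = [0.25, 0.30, 0.10]


def additive_height_shift(genotype):
    """Sum of per-variant effects; each variant contributes independently.

    genotype: list of 0/1 flags (simplified on/off coding, an assumption of this sketch)
    """
    return sum(effect * g for effect, g in zip(EFFECTS_INCHES, genotype))


# Variants 1 and 3 "on": 0.25 + 0.10, with no interaction term
print(round(additive_height_shift([1, 0, 1]), 2))  # 0.35
# All three "on": 0.25 + 0.30 + 0.10
print(round(additive_height_shift([1, 1, 1]), 2))  # 0.65
```

A polygenic score is essentially this weighted sum, which is why there is no term that depends on pairs of variants.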
One has to keep in mind that:
- GWAS SNP PRS are constructed with a simplistic additive effects model; they cannot capture complex epistasis
- GWAS SNP PRS are constructed almost exclusively from common variants; even the rare variants study by Chen et al only looked at protein truncating variants, a small subset of rare variants.
- GWAS has extremely strict significance thresholds (Bonferroni correction for the number of variants included). Thus even with huge sample sizes, they're still usually underpowered to uncover all contributing variants and should be seen as bare minimums.
- SNP based GWAS cannot account for non-SNP genomic variation (structural variations like tandem repeats, copy number variants, translocations, etc)
- GWAS SNP PRS typically don't factor in age or sex
Let me reiterate that the SNP heritability estimate does not rely on significance threshold and is not the bare minimum; it is the estimate of the highest r2 accuracy you could achieve from *all* SNPs in the GWAS. If we're talking about a common polygenic score, this is the only parameter that matters (and arguably the within-family estimate if you care about confounding).
Regarding other novel sources of prediction to be discovered in the future: sure, anything is possible! One could also argue that we will identify many highly precise environmental measures from which to predict disease too; since we're just speculating, why stop at genetics?
Does the data actually support a massive reservoir of predictive genetic variation out there? I would say no:
-- large-scale, replicating genetic epistasis has essentially never been observed to date (and, if it exists, would be extremely difficult to build predictors for)
-- common CNVs are well-tagged by common SNPs
-- rare CNVs do not show evidence of a substantial contribution to heritability (https://www.nature.com/articles/s41588-024-01684-z, for example, identified ~3 per trait tested)
-- the Chen et al. findings of few IQ genes and most previously known for developmental delay should at a minimum increase your skepticism of a large rare variant contribution.
There’s no evidence that Shkreli’s project actually involves Barron Trump. You need an “allegedly” in there.
Good catch, last thing I want to do is get sued by the Trump family :)
Another statement that seems to be contradicted by behavioral genetics literature: “education has a causal effect on IQ”. In the short run, this statement is true but may be artifactual (or so I’ve read). In the long run, twin studies, studies of early pre-school programs, and adoption studies (such as a large Korean adoption study) indicate that this is false. Once you account for variation in genes, it seems that being made to go through more schooling does not make one’s IQ higher. Again, please link to some literature to re-educate me on this topic if you know it better than I do and I am misunderstanding something.
Sure, here you go:
https://theinfinitesimal.substack.com/p/does-education-increase-intelligence
Thanks! I’ll check that out.
Don’t twin and adoption studies of the behavioral genetics literature flatly contradict the statement below?
“But, for multifactorial traits, genetics is just one small risk factor of very many, with the vast majority of variation in risk simply unexplained (aka bad luck).”
Not knowing which genes are doing the work does not make genetics “just one small risk factor of many”. The punchline of research in behavioral genetics is that genes are every bit as strong as non-shared environment “(aka bad luck)” when accounting for variations in outcomes for most traits, with shared environment (aka all the systematic ways the environment for children differs more between families than within the same family) accounting for almost no variation in adult outcomes. To illustrate, identical twins (same genes) don’t end up more similar when they are raised in the same family versus when they are separated at birth and raised in different families (illustrating that shared environmental influence is very weak). But they are still not literally identical in adult outcomes (indicating that non-shared environment “aka bad luck” is also a potent force).
Please provide literature links to a better summary of the Behavioral Gen. literature if you are more expert than me and you think my summary seems off.
Twin and adoption estimates are irrelevant to genomic prediction because they do not actually identify any genetic variation. Twin and adoption studies are also widely appreciated to be environmentally biased, see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6130754/. Even the most die-hard behavioral geneticist will tell you not to take twin estimates literally because they pack in a ton of assumptions.
Thank you for the link and comment!
1.- rising mean through selective mating is plausible under genetic principles
2.- the low accuracy of predicting individual outcomes means we need to improve our tools to amplify the selective mating effect.
First, great piece. Reminds me of “Mutants: On Genetic Variety and the Human Body” by Armand Marie Leroi. You should create a book-level tour of these databases.
I expected very low ability to predict IQ, since in my experience it is a negative predictive measure, not positive. And indeed the predictive correlation being less than normal variation in test results is my expectation.
Your final paragraph was key:
One final thought: there does exist an important use-case for genome sequencing and cognitive function, and that is the diagnosis of rare developmental disorders.
Having a Y chromosome sets the platform in place to create sperm and father children. Having 1 child or 100, numerical “achievement” of people with the gene will not correlate to anything particular genetically. However, defects in SRY will manifest in fewer or no children.
A genetic platform for absorbing information, culling irrelevant connectivity, and maintaining plasticity is a ground state IMHO, a ground state for everyone’s intelligence. It says nothing about education.
Genetic variation causing inability to perform any of the three will result in lower IQ. For example, a simple neurotransmitter SNP will reduce what the platform could achieve, but can’t predict what the platform would be exposed to. Paternal age at conception correlates to precisely these challenges, neurotransmitter issues which create autism, schizophrenia or psychosis, and diminished IQ.
Every trait is inherited, so this just reaffirms a trivial fact. Were you intending to make a different point?
Sorry, going to apply a heavy touch here with comments, missing the point twice in a row and then reverting to racism gets you a ban.