I said in my recent article that as GWAS methodology improved and more and more gene variants were included, heritability estimates across the board ought to go up. This was the guess of someone who has no formal education in genetics. But what you seem to be saying is that, at least for mental traits like IQ, the higher-quality the GWAS, the lower the heritability found. So do you disagree with me? Do you think the ultimate conclusion of GWAS will be that IQ has a heritability of less than 20%? My article: https://open.substack.com/pub/eclecticinquiries/p/twin-studies-exaggerate-iq-heritability?r=4952v2&utm_campaign=post&utm_medium=web
I think there are two forces at play: as we add more genetic variation into GWAS analyses (specifically rare variants, which are the major class of variation not currently captured by GWAS), we should see the heritability estimate go up; as we understand the biases better and account for them, we should see the heritability estimate go down. In the case of height, as GWAS platforms have gotten better, the heritability estimate went from ~40% to ~45%, with rare variants adding another 10-20%, and the Tan et al. paper shows that confounding on height is relatively minor. For IQ the opposite has been true: as we've gotten better at isolating biases, the estimates went from 50% (in one of the earliest studies, which had major stratification issues) to ~20% to ~12% direct with my analysis of the Tan et al. data, and the Tan et al. paper shows that confounding is a major issue. In contrast, the influence of rare variants on IQ (in the healthy population) has been negligible in the few studies that have analyzed it. So for IQ I think we are still on the "figuring out biases that inflate the estimate" end rather than the "finding new variants that raise the estimate" end. As to the total heritability, I think the best estimate we have now is the ~12% estimate from the Tan et al. data with an additional ~1.5x boost from rare variants, which gets you to (very crudely) ~18% or so.
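The crude arithmetic behind that last figure can be written out as a small sketch. The function and variable names here are hypothetical, chosen for illustration, and treating the rare-variant contribution as a simple multiplier is the rough approximation described above, not a formal estimator.

```python
def combine_heritability(direct_common_h2: float, rare_variant_multiplier: float) -> float:
    """Scale a common-variant direct heritability estimate by a rare-variant multiplier.

    Both inputs are rough figures; the product is a back-of-the-envelope
    number, not a statistical estimate with a standard error.
    """
    return direct_common_h2 * rare_variant_multiplier


# Figures quoted in the comment above (assumed here as point values).
iq_direct_h2 = 0.12   # ~12% direct estimate from the Tan et al. data
rare_boost = 1.5      # ~1.5x boost attributed to rare variants

iq_total_h2 = combine_heritability(iq_direct_h2, rare_boost)
print(f"Crude total IQ heritability: {iq_total_h2:.0%}")
```

Changing either input shows how sensitive the final figure is to the two assumptions.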
Great article!
If two subjects of unknown relatedness have very different GWAS scores for a very complex trait involving thousands of SNPs, that would correlate with their being only distantly related. But with siblings the degree of relatedness is fixed, so for the same degree of difference in the GWAS score, there would be less chance that other genetic factors (SNPs not considered in the GWAS, SNPs whose effects are amplified by other SNPs) would also be different. So these results do not mean that complex traits are not mostly heritable. They only mean that we have not figured it all out ... yet.
Here is a post on my own substack about this: https://comment78.substack.com/p/bound-to-fail?r=3c6ol1
It annoys me how often, in scientific discussion, problems with measurement are avoided. The focus is on everything around them, but almost never on the measurement techniques themselves and the sometimes serious limitations they embody. Heritability by way of SNP-GWAS is one of those. The geneticists at work tend to laugh at all those thousands of dollars/euros that are wasted on such extremely large sample sizes; in the end, when you make the sample larger but don't do anything about the methods' susceptibility to false positives, you only enlarge potential biases. Statistical power is not the only thing that should be looked at.
What are your thoughts on the SNP chips often used for these GWAS? From my understanding of genetics, SNPs tend to be the most common mutations but also the least impactful. And when using SNPs as proxies for genes you walk into a whole other ballpark of potential biases, including known vs. unknown genetics in the whole genome. The SNPs on those chips are by definition only known SNPs, which biases the estimates even further. I generally do not trust SNP-based GWAS; just because they are cheap to use does not mean they are good to use.
I'll need to write up a longer post on it but contemporary SNP array data followed by imputation tends to pick up effectively all common variants and is very difficult to outperform. Common SNP variants also tag common structural variants very well, so it is unlikely that there is untapped common SV signal out there (note: some of these associations could still be *driven* by SVs we haven't typed, but tagged very well by SNPs we did type). In terms of variation that is not captured by GWAS, I think it's either rare variants or interactions.
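To make "tagging" concrete: one common way to quantify how well a typed SNP stands in for an untyped variant is the squared correlation (r²) between their genotype dosages. The sketch below uses made-up genotypes for eight hypothetical individuals. The data, the variable names, and the choice of Pearson r² on 0/1/2 dosages are illustrative assumptions, not anything taken from a real array.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return (cov * cov) / (var_x * var_y)


# Hypothetical allele counts (0, 1 or 2 copies) for eight individuals.
typed_snp = [0, 1, 2, 1, 0, 2, 1, 0]    # variant present on the array
untyped_sv = [0, 1, 2, 1, 0, 2, 1, 1]   # structural variant, differs in one person

tagging_r2 = r_squared(typed_snp, untyped_sv)
print(f"r^2 between typed SNP and untyped SV: {tagging_r2:.2f}")
```

An r² close to 1 means an association driven by the structural variant would show up almost undiminished at the typed SNP, which is the sense in which the signal is already captured.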
Okay, that is good news. But is a SNP array not a biased and incomplete perspective of the exome? I.e., does it not only contain SNPs that have garnered interest in the past? Of course, when you gather enough of them the bias would eventually decrease. But any type of copy number variation will not be reliably measured, and neither will the rest of the genome that is not exome, which from what I remember contains a lot of structural information that indirectly influences gene expression as well.
But like I said, I am not all that up to date when it comes to genome measuring tech. I just hope that GWAS will move on from SNP chips to whole-genome sequencing, where larger variations will also be captured. The ones that have been done all used smaller samples, which in itself is of course a problem. I'd like your perspective. And if you ever get to writing an article about it, I would love to read it.
I don't think so. Modern SNP arrays are designed to be a comprehensive sampling of common variation *genome-wide*, often based on whole-genome data from many representative populations. Sometimes special exonic variants (or other ROIs) are added, but not at the cost of broad sampling. I don't know where the idea that they only/largely capture the exome comes from; it's just not the case, and most GWAS heritability is non-coding.
It seems I got some outdated information and/or misremembered a few things about SNP chips; I'll check my sources on that. Thanks for your response. What do you think is the reason why such GWAS tend to have inconsistent results and such very small effect sizes? I think I remember seeing whole-genome analyses of people with autism that looked at copy number variations and found larger effect sizes than generally seen in GWAS. Of course the sample sizes were smaller (I think it was around 100-200), and I think I have seen genome analyses that found CNVs for other phenotypes that also had larger effect sizes. Are those artefacts of the methods and/or smaller sample sizes, or are the effect sizes larger because of the type of mutation or because of the method? I must confess that I am remembering papers and don't have them at hand, so I could be misremembering. I am wondering what the reasons could be behind the differences seen between GWAS and other types of analyses, such as whole-genome analyses. Do you have ideas on that?