How does it make sense to speak of heritability of qualitative traits?

As far as I understand, heritability is defined as "proportion of variation of a phenotypical trait due to genetic variation between individuals in a population".

I see the concept is being applied to quantitative traits, which can be measured with numbers, can be compared (i.e., $\mathrm{A}$ is taller than $\mathrm{B}$ by $x$ units of height), and vary continuously to form a distribution (e.g., IQ forms a Gaussian distribution).

Heritability in the narrow sense ($h^2$) has a mathematical definition. For a given quantitative trait $x$ of a population that forms a distribution with a certain average and a certain variance $V_P$, $$h^2 = \dfrac{V_G}{V_P}$$ where $V_G$ is the variance in the population due to genetic variation and $V_P = V_G + V_E$, where $V_E$ is the variance in the population due to environmental variation (I ignored other terms in the definition of $V_P$).
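To make the ratio concrete, here is a minimal simulation sketch in Python. It assumes (my assumption, not part of the definition above) that each phenotype is just a genetic value plus an independent environmental deviation, both Gaussian. The function name `simulate_h2` and the variances 0.8 and 0.2 are hypothetical choices for illustration.

```python
import random
import statistics


def simulate_h2(var_g=0.8, var_e=0.2, n=50_000, seed=0):
    """Estimate h^2 = V_G / V_P from simulated phenotypes P = G + E."""
    rng = random.Random(seed)
    # Genetic and environmental deviations, drawn independently (assumption).
    g = [rng.gauss(0.0, var_g ** 0.5) for _ in range(n)]
    e = [rng.gauss(0.0, var_e ** 0.5) for _ in range(n)]
    p = [gi + ei for gi, ei in zip(g, e)]
    v_g = statistics.pvariance(g)  # variance due to genetic differences
    v_p = statistics.pvariance(p)  # total phenotypic variance
    return v_g / v_p


h2 = simulate_h2()
print(f"estimated h^2 = {h2:.3f}")
```

With these inputs the estimate should land close to 0.8, because that is the share of the total variance that was put into the genetic term.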

IQ has a mean of 100 with $h^2=0.8$. So for a person whose IQ is 105 (5 points above average), genetics would account for $0.8 \times 5 = 4$ points of the difference and environment for the remaining 1 point.

I also see how the concept can be applied to qualitative traits (by this I mean traits that don't satisfy the properties of quantitative traits) even in the absence of a well-defined notion of variance $V_P$, because the trait doesn't form a distribution. For example, consider a population of flowers that can only be red or blue. This red/blue trait does not have a well-defined $V_P$. We can still meaningfully say that this trait has $h^2=1$, since any variation in the color (red vs blue) is due to genetic variation. We can also speak meaningfully and say a certain qualitative trait has $h^2=0$, meaning any variation in the trait is due to different environments.

On the other hand, I think qualitative traits can have either $h^2=0$ or $1$ but no value in between, since $V_P$ is not well-defined (and hence $V_G$ and $V_E$ as well are not well-defined) for them.

Consider sexual orientation, for example. Based on some twin studies it was estimated that homosexuality has $h^2=0.5$. But what does that really mean? How does homosexuality vary to form a variance? A person is either homosexual or not (heterosexual, bisexual or asexual).

My question is how to apply this $h^2=0.5$ value for homosexuality in the same way I applied it to IQ above.

More generally, how can one meaningfully interpret the heritability of other qualitative traits (mental disorders, for example, like schizophrenia with $h^2=0.8$) that have no well-defined $V_P$ and whose $h^2$ value is not $1$ or $0$ but rather lies between them?

The concept of heritability is a concept coming from quantitative genetics. It only applies to quantitative traits. This does not mean that the trait must be continuous. Discrete traits (such as the number of eyes for example) are quantitative traits.

For boolean traits, it is common to set one outcome to 0 and the other to 1. Which outcome is set to which value does not matter, as it won't affect the variance. To take your example of homosexuality, one would classify all individuals as homosexual or heterosexual (with no in-between or other variants), set homosexual to 0 and heterosexual to 1 (or vice versa), and compute the heritability from there.
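As a quick check of the claim that the 0/1 assignment does not affect the variance, here is a small Python sketch. The sample (3 individuals of one type, 7 of the other) and the helper name `binary_variance` are made up for illustration.

```python
import statistics


def binary_variance(labels, one_label):
    """Code `one_label` as 1 and everything else as 0, then return the variance."""
    coded = [1 if x == one_label else 0 for x in labels]
    return statistics.pvariance(coded)


# Hypothetical sample: 30% of individuals show outcome "a", 70% outcome "b".
sample = ["a"] * 3 + ["b"] * 7

v_a_is_one = binary_variance(sample, "a")
v_b_is_one = binary_variance(sample, "b")
print(v_a_is_one, v_b_is_one)
```

Both codings give the same value, p(1 - p), where p is the frequency of whichever outcome is coded 1. So a boolean trait does have a well-defined phenotypic variance once it is coded numerically.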

As far as I know, for non-boolean nominal traits, the concept of heritability is undefined.

I think after some research I found the answer to the question. Measuring the heritability of binary traits (e.g., diseases that do not conform to Mendelian rules, like diabetes, or mental disorders like schizophrenia and bipolar disorder) is based on the liability threshold model.

This theory assumes that every binary trait has an underlying liability value that predisposes an individual to the trait. This liability varies continuously and forms a Gaussian distribution in a population. There is a threshold value of liability above which an individual is affected and the trait manifests itself (e.g., the person becomes schizophrenic). A person below the threshold value will be unaffected (e.g., does not have schizophrenia).

The theory posits that the source of variance $V_P$ of liability among the population is owing to genetic and environmental variances $V_G$ and $V_E$ respectively.

Heritability as estimated by twin studies of monozygotic and dizygotic twins is then the heritability of the liability of a binary trait (e.g., homosexuality). In this case the formal definition of $h^2$ applies.
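To see how a continuous liability can produce an all-or-nothing trait whose twin concordance depends on $h^2$, here is a rough simulation sketch in Python. Everything specific in it is an assumption for illustration: liability has unit variance, there is no shared environment, the prevalence is 10%, and `pair_concordance` is a hypothetical helper, not a standard estimator.

```python
import random
from statistics import NormalDist


def pair_concordance(h2, prevalence, genetic_corr, n_pairs=20_000, seed=1):
    """Simulate twin pairs under a liability threshold model.

    Liability = genetic part + unshared environmental part, scaled so the
    total variance is 1 and the genetic share is h2. Returns the fraction of
    second twins affected among pairs whose first twin is affected.
    """
    rng = random.Random(seed)
    # Threshold chosen so that a fraction `prevalence` of people exceed it.
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    sd_g, sd_e = h2 ** 0.5, (1.0 - h2) ** 0.5
    # Weights that give the two genetic values correlation `genetic_corr`.
    w_shared, w_own = genetic_corr ** 0.5, (1.0 - genetic_corr) ** 0.5
    first_affected = both_affected = 0
    for _ in range(n_pairs):
        shared = rng.gauss(0.0, 1.0)
        liab = []
        for _twin in range(2):
            g = sd_g * (w_shared * shared + w_own * rng.gauss(0.0, 1.0))
            liab.append(g + sd_e * rng.gauss(0.0, 1.0))
        if liab[0] > threshold:
            first_affected += 1
            if liab[1] > threshold:
                both_affected += 1
    return both_affected / first_affected


mz = pair_concordance(h2=0.8, prevalence=0.1, genetic_corr=1.0)  # identical twins
dz = pair_concordance(h2=0.8, prevalence=0.1, genetic_corr=0.5)  # fraternal twins
print(f"MZ concordance ~ {mz:.2f}, DZ concordance ~ {dz:.2f}")
```

The observed trait is still binary, but the gap between MZ and DZ concordance carries information about how much of the liability variance is genetic, which is why an intermediate $h^2$ such as 0.5 or 0.8 is meaningful on the liability scale.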

Beyond quantitative and qualitative traits: three telling cases in the life sciences

This paper challenges the common assumption that some phenotypic traits are quantitative while others are qualitative. The distinction between these two kinds of traits is widely influential in biological and biomedical research as well as in scientific education and communication. This is probably due to both historical and epistemological reasons. However, the quantitative/qualitative distinction involves a variety of simplifications on the genetic causes of phenotypic variability and on the development of complex traits. Here, I examine three cases from the life sciences that show inconsistencies in the distinction: Mendelian traits (dwarfism and pigmentation in plant and animal models), Mendelian diseases (phenylketonuria), and polygenic mental disorders (schizophrenia). I show that these traits can be framed both quantitatively and qualitatively depending, for instance, on the methods through which they are investigated and on specific epistemic purposes (e.g., clinical diagnosis versus causal explanation). This suggests that the received view of quantitative and qualitative traits has a limited heuristic power—limited to some local contexts or to the specific methodologies adopted. Throughout the paper, I provide directions for framing phenotypes beyond the quantitative/qualitative distinction. I conclude by pointing at the necessity of developing a principled characterisation of what phenotypic traits, in general, are.


Quantitative traits occur as a continuous range of variation rather than in distinct categories. To picture this, imagine the length of a lizard's tail. The length can vary, and does not fit into natural categories. Generally, a larger group of genes controls quantitative traits. When multiple genes influence a trait, you can also describe it as a "polygenic trait."

This concept may make more sense with examples. Some examples of qualitative traits include round/wrinkled seeds in peas, albinism and humans' ABO blood groups. The ABO human blood groups illustrate this concept well. Except for some rare special cases, humans can only fit into one of four categories for the ABO part of their blood type: A, B, AB or O. Since the ABO part of your blood type fits neatly into four categories, it is a qualitative trait. You can often represent qualitative traits with a number.

Neurodevelopmental Disorders


Heritability estimates for ASD range from 37% to 92%, based on twin concordance rates. For dizygotic twins the concordance for ASD is up to 10%. Advanced parental age, low birth weight, and in utero exposure to valproate, thalidomide, and misoprostol have been associated with increased risk of developing ASD. While the cause of ASD for the majority of individuals is unknown, genetic causes can be identified in 15–25% of children with ASD. Identified genetic causes may be classified as microscopically visible chromosome abnormalities (2–5%; these have been identified on almost every chromosome, but 15q duplications are among the most commonly observed), submicroscopic deletions and duplications (10–20%; most commonly involving chromosomes 15q11-q13 and 16p11.2), and single gene disorders (5%; most commonly Fragile X syndrome, Rett syndrome, tuberous sclerosis, and PTEN mutations).

Reviewing the Literature: Is Our Genetics Our Destiny?

Over the past two decades scientists have made substantial progress in understanding the important role of genetics in behaviour. Behavioural genetics studies have found that, for most traits, genetics is more important than parental influence. And molecular genetics studies have begun to pinpoint the particular genes that are causing these differences. The results of these studies might lead you to believe that your destiny is determined by your genes, but this would be a mistaken assumption.

For one, the results of all research must be interpreted carefully. Over time we will learn even more about the role of genetics, and our conclusions about its influence will likely change. Current research in the area of behavioural genetics is often criticized for making assumptions about how researchers categorize identical and fraternal twins, about whether twins are in fact treated in the same way by their parents, about whether twins are representative of children more generally, and about many other issues. Although these critiques may not change the overall conclusions, it must be kept in mind that these findings are relatively new and will certainly be updated with time (Plomin, 2000).

Furthermore, it is important to reiterate that although genetics is important, and although we are learning more every day about its role in many personality variables, genetics does not determine everything. In fact, the major influence on personality is nonshared environmental influences, which include all the things that occur to us that make us unique individuals. These differences include variability in brain structure, nutrition, education, upbringing, and even interactions among the genes themselves.

The genetic differences that exist at birth may be either amplified or diminished over time through environmental factors. The brains and bodies of identical twins are not exactly the same, and they become even more different as they grow up. As a result, even genetically identical twins have distinct personalities, resulting in large part from environmental effects.

Because these nonshared environmental differences are nonsystematic and largely accidental or random, it will be difficult to ever determine exactly what will happen to a child as he or she grows up. Although we do inherit our genes, we do not inherit personality in any fixed sense. The effect of our genes on our behaviour is entirely dependent on the context of our life as it unfolds day to day. Based on your genes, no one can say what kind of human being you will turn out to be or what you will do in life.

Key Takeaways

  • Genes are the basic biological units that transmit characteristics from one generation to the next.
  • Personality is not determined by any single gene, but rather by the actions of many genes working together.
  • Behavioural genetics refers to a variety of research techniques that scientists use to learn about the genetic and environmental influences on human behaviour.
  • Behavioural genetics is based on the results of family studies, twin studies, and adoptive studies.
  • Overall, genetics has more influence than parents do on shaping our personality.
  • Molecular genetics is the study of which genes are associated with which personality traits.
  • The largely unknown environmental influences, known as the nonshared environmental effects, have the largest impact on personality. Because these differences are nonsystematic and largely accidental or random, we do not inherit our personality in any fixed sense.

Exercises and Critical Thinking

  1. Think about the twins you know. Do they seem to be very similar to each other, or does it seem that their differences outweigh their similarities?
  2. Describe the implications of the effects of genetics on personality, overall. What does it mean to say that genetics “determines” or “does not determine” our personality?


The role of epistasis in the genetic architecture of quantitative traits is controversial, despite the biological plausibility that nonlinear molecular interactions underpin the genotype–phenotype map. This controversy arises because most genetic variation for quantitative traits is additive. However, additive variance is consistent with pervasive epistasis. In this Review, I discuss experimental designs to detect the contribution of epistasis to quantitative trait phenotypes in model organisms. These studies indicate that epistasis is common, and that additivity can be an emergent property of underlying genetic interaction networks. Epistasis causes hidden quantitative genetic variation in natural populations and could be responsible for the small additive effects, missing heritability and the lack of replication that are typically observed for human complex traits.


Department of Biology, Centre for Ecological and Evolutionary Synthesis, University of Oslo, PB1066, Blindern, 0316, Oslo, Norway

Department of Biology, Centre for Conservation Biology, Norwegian University of Science and Technology, 7491, Trondheim, Norway

Department of Biological Science, Florida State University, Tallahassee, FL, 32306, USA


Materials and methods

Organism and experimental protocol

Gryllus firmus is a relatively large (≈0.7 g) wing-dimorphic cricket that typically inhabits sandy sites along the eastern seaboard of the United States ( 1 13 ). The individuals used in the present experiment were from a stock culture that originated from ≈20 males and 20 females collected in northern Florida in 1981. The stock culture is maintained with a standing adult population of several hundred individuals. To prevent diapause the temperature is maintained in excess of 25 °C. Nymphs and adults in both the stock and the selection experiments were fed Purina rabbit chow.

The selection protocol followed a standard mass selection design with two lines selected for increased fecundity, two lines selected for decreased fecundity and four control lines. The ‘high fecundity’ and ‘low fecundity’ selection experiments were initiated at different times and followed slightly different protocols, the most important difference being in the definition of fecundity in the two experiments. From previous experiments ( 32 41 ) it has been shown that total fecundity, measured as eggs laid in the first week after eclosion plus the fully formed eggs in the ovaries, has a higher heritability and is more normally distributed than the number of eggs laid. We therefore selected on total fecundity in the following manner. Adult females were separated into individual cages, each with a mate, food, water and an oviposition dish, and the number of eggs laid in the first 7 days post eclosion measured. The females were then killed and the number of eggs in the ovaries counted. The eggs laid by the 25 females with the highest total fecundity were used to form the next generation, each female contributing the same number of offspring.

The above protocol could not be followed when selecting for reduced fecundity because in this case the required females typically did not lay any eggs in the first week after eclosion, and thus had to be retained for several weeks to obtain the required eggs. For the low fecundity lines we therefore selected on eggs laid in the first week, the 25 females laying the lowest number of eggs being used as parents of the next generation. Because of the different protocols the experiment is not strictly a bidirectional selection experiment: to avoid confusion we shall refer to the high fecundity line as ‘high total fecundity line’ and the low fecundity line as ‘low eggs laid line’.

To initiate the ‘high total fecundity’ experiment 25 adult females were chosen haphazardly from the stock culture and from each female 32 nymphs obtained. The 800 nymphs were distributed among 16 cages, each cage receiving two nymphs from each female (giving 50 nymphs per cage): this generation is designated as generation 1. Upon final eclosion, 100 first-generation females were selected at random and the 25 females with the highest fecundity were used to initiate two high total fecundity selection lines. Each female contributed 24 nymphs to each of the two selection lines, these lines being thereafter treated independently. In each subsequent generation the fecundities of 100 randomly selected females in each line were measured and the 25 females with the highest fecundities selected as parents for the next generation. A minimum of 12 cages per line were used for each generation, each female contributing equally to each cage (50 nymphs per cage). Two control lines were established using the same protocol as above except that females to be used as parents were selected at random. Due to an error the total fecundities of the two control lines were not measured in the second generation.

The low eggs laid selection experiment followed the above protocol except that the two replicates (selection line and control line) were initiated 3 weeks apart. In this case each control line can be matched with a particular selection line. Because counting eggs alone is much less labour intensive than measuring total fecundity it was possible to count the eggs laid by all adult females. We still selected the lowest 25 females which meant that in some generations the selection intensity was modestly higher than that used in the high total fecundity selection experiment.

Statistical analyses

There are three primary questions to be addressed: (1) Is there a direct response to selection? (2) What is the realized heritability of fecundity? (3) Is there a significant correlated response of wing morph frequency for both the high fecundity and low eggs laid selection experiments?

To address the first question we used two approaches. First, we used ANCOVA to analyse the change in fecundity over the generations of selection. To remove changes not directly associated with selection we subtracted the control means from the selected lines. In the case of the high total fecundity lines, because there is not a one to one match between control and selected lines, we subtracted the average of the two control lines,

$$F^*_{i,j} = F_{i,j} - \frac{1}{2}\sum_{k=1}^{2} x_{k,j}$$

where $F^*_{i,j}$ is the adjusted mean total fecundity of the $i$th selected line ($i = 1, 2$) in the $j$th generation, $F_{i,j}$ is the mean total fecundity of the $i$th selected line in the $j$th generation, and $x_{k,j}$ is the mean fecundity of the $k$th control line ($k = 1, 2$) in generation $j$. Because of a lack of the correction for the control line, generation 2 was excluded from the analysis of the high total fecundity lines. For the low eggs laid line the adjusted fecundity was calculated by subtracting the mean value of the appropriate control line.
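The control-line adjustment described above can be sketched in a few lines of Python. This is only an illustration: the function name `adjusted_fecundity` and the numbers in the example are hypothetical and are not taken from the experiment.

```python
def adjusted_fecundity(selected_mean, control_means):
    """Subtract the average of the control-line means from a selected-line mean."""
    control_average = sum(control_means) / len(control_means)
    return selected_mean - control_average


# Hypothetical generation: one selected line and the two control lines.
example = adjusted_fecundity(350.0, [300.0, 310.0])
print(example)
```

For the low eggs laid lines, where each selected line is paired with one control line, the same helper can be called with a single-element list.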

The second method of testing for a direct response to selection consisted of comparing the fecundity of control and selected-line females in the final generation. For the high total fecundity experiment we gave each line a unique designator and then tested for significant variation among the means with a one-way ANOVA , following this with a Tukey HSD multiple comparison test to locate the sources of variation. For the low eggs laid selection experiment we used a two-way ANOVA because control and selected lines could be paired.

We estimated the realized heritability as twice the slope of the regression forced through zero of the adjusted response on the adjusted cumulative selection differential (the slope is multiplied by two because selection is applied only to females). In the high total fecundity line we included the second generation by making the conservative assumption that there was no change in total fecundity in the selected lines. The standard error of the heritability was estimated using the formula given by ( 14 ), which takes into account drift,

where L and N are the effective number of parents used per generation in the control and selected lines (= twice the number of actual individuals, because each female contributes the same number of offspring), respectively; K and M are the effective number of individuals measured per generation in the control and selected lines, respectively; t is the number of generations of selection; VP is the phenotypic variance; and Scum is the adjusted cumulative selection differential.
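As an illustration of the point estimate described above (twice the slope of a regression through the origin), a minimal Python sketch follows. It covers only the heritability estimate, not its standard error; the function name and the example data are hypothetical.

```python
def realized_heritability(cum_sel_diff, response, sex_factor=2.0):
    """Realized heritability from a regression of response on cumulative
    selection differential, forced through the origin.

    The slope is multiplied by `sex_factor` (2 when selection is applied
    to one sex only).
    """
    sxy = sum(s * r for s, r in zip(cum_sel_diff, response))
    sxx = sum(s * s for s in cum_sel_diff)
    slope = sxy / sxx  # least-squares slope with no intercept term
    return sex_factor * slope


# Hypothetical adjusted cumulative selection differentials and responses.
s_cum = [10.0, 20.0, 30.0]
resp = [1.0, 2.0, 3.0]
h2_realized = realized_heritability(s_cum, resp)
print(h2_realized)
```

In this made-up example the response is exactly one tenth of the cumulative selection differential, so the slope is 0.1 and the doubled estimate is 0.2.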

As a second method of estimating the standard error we computed the standard error using the estimates of realized heritability from the two selection lines ( 12 , p. 211):

To analyse the correlated response of wing morph frequency we used the proportion of macropterous females per cage as individual data points. The correlated response should be measured as the change in the liability, which can be estimated as the abscissa of the standard normal curve corresponding to the observed proportion macroptery ( 32 ). However, in some cases the proportion macroptery was zero, precluding the use of this transformation. For the purposes of testing for a deviation between control and expected lines we therefore used the arcsine square-root transformation of the individual proportions. This procedure is valid for testing the deviation but gives only a crude measure of the actual shift in the liability.
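The two transformations mentioned above can be sketched as follows. This is an illustrative Python snippet using only the standard library; the function names are hypothetical. It also shows why a proportion of zero rules out the liability transformation: the standard normal abscissa is not finite at 0 or 1.

```python
import math
from statistics import NormalDist


def liability_abscissa(p):
    """Standard normal abscissa corresponding to a proportion p (0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("abscissa is undefined for proportions of 0 or 1")
    return NormalDist().inv_cdf(p)


def arcsine_sqrt(p):
    """Arcsine square-root transformation, defined for 0 <= p <= 1."""
    return math.asin(math.sqrt(p))


print(liability_abscissa(0.5))  # a proportion of one half maps to the mean
print(arcsine_sqrt(0.0))        # a zero proportion is handled without error
```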

We tested for a change in proportion macroptery as a function of treatment (selected or control) and generation. Because there is a nonlinear relationship between the value of the liability and the observed proportion we could not adjust the values in the selected lines by subtracting the control line values. Instead, after finding no significant variation between replicate lines within each experiment, we used the mean transformed values per generation, $P_{trans}$, in the covariance model,

$$P_{trans} = c_0 + c_1 t + c_2\,\mathrm{TREAT} + c_3\,(t \times \mathrm{TREAT})$$

where $t$ is generation, TREAT is a categorical variable designating treatment (selected or control), and $c_0, \ldots, c_3$ are fitted constants.


The genetic components of complex human traits and diseases arise from hundreds to likely many thousands of single nucleotide polymorphisms (SNPs) [1], most of which have weak effects. As sample sizes increase, more of the associated SNPs are identifiable (they reach genome-wide significance), though power for discovery varies widely across phenotypes. Of particular interest are estimating the proportion of common SNPs from a reference panel (polygenicity) involved in any particular phenotype; their effective strength of association (discoverability, or causal effect size variance); the proportion of variation in susceptibility, or phenotypic variation, captured additively by all common causal SNPs (approximately, the narrow sense heritability); and the fraction of that captured by genome-wide significant SNPs, all of which are active areas of research [2–9]. The effects of population structure [10], combined with high polygenicity and linkage disequilibrium (LD), leading to spurious degrees of SNP association, or inflation, considerably complicate matters, and are also areas of much focus [11–13]. Despite these challenges, there have been recent significant advances in the development of mathematical models of polygenic architecture based on GWAS [14, 15]. One of the advantages of these models is that they can be used for power estimation in human phenotypes, enabling prediction of the capabilities of future GWAS.

Here, in a unified approach explicitly taking into account LD, we present a model relying on genome-wide association studies (GWAS) summary statistics (z-scores for SNP associations with a phenotype [16]) to estimate polygenicity (π1, the proportion of causal variants in the underlying reference panel of approximately 11 million SNPs from a sample size of 503) and discoverability ($\sigma_\beta^2$, the causal effect size variance), as well as elevation of z-scores due to any residual inflation of the z-scores arising from variance distortion ($\sigma_0^2$, which for example can be induced by cryptic relatedness), which remains a concern in large-scale studies [10]. We estimate π1, $\sigma_\beta^2$, and $\sigma_0^2$ by postulating a z-score probability distribution function (pdf) that explicitly depends on them, and fitting it to the actual distribution of GWAS z-scores.

Estimates of polygenicity and discoverability allow one to estimate compound quantities, like narrow-sense heritability captured by the SNPs [17]; to predict the power of larger-scale GWAS to discover genome-wide significant loci; and to understand why some phenotypes have higher power for SNP discovery and proportion of heritability explained than other phenotypes.

In previous work [18] we presented a related model that treated the overall effects of LD on z-scores in an approximate way. Here we take the details of LD explicitly into consideration, resulting in a conceptually more basic model to predict the distribution of z-scores. We apply the model to multiple phenotype datasets, in each case estimating the three model parameters and auxiliary quantities, including the overall inflation factor λ (traditionally referred to as genomic control [19]) and narrow sense heritability, h². We also perform extensive simulations on genotypes with realistic LD structure in order to validate the interpretation of the model parameters. A discussion of the relation of the present paper to other work is provided in the first section of the S1 Appendix (pp. S2-S3).

How to calculate heritability

Heritability is the proportion of variance in a particular trait, in a particular population, that is due to genetic factors, as opposed to environmental influences or stochastic variation.

That’s just a general definition to give you a feel for it. Actually we need to be more rigorous than that. There are two definitions of heritability. A common simplification in all sorts of genetic studies and models is to assume that all alleles and all genotypes act independently of each other – this is called an ‘additive model.’ So for instance, if one allele of a particular SNP gives you a 1 cm increase in height, then being homozygous for that SNP should give you a 2 cm increase in height. Clearly, this model doesn’t allow for dominant or recessive effects, even though we know these abound. It also doesn’t allow for gene-gene interactions, where maybe that SNP only gives you a 1 cm increase in height if paired with another SNP. For these reasons, the additive model is a huge simplification, but a useful one. Now for the two definitions of heritability:

  • ‘narrow sense heritability’ (h²) is defined as the proportion of trait variance that is due to additive genetic factors
  • ‘broad sense heritability’ (H²) is defined as the proportion of trait variance that is due to all genetic factors including dominance and gene-gene interactions.

Both kinds of heritability are incredibly tricky to estimate and to interpret. In terms of estimation, a big problem is that people who share parts of their genome tend to share parts of their environment too. One simple way you might think to estimate heritability is to plot children’s traits against the average of their parents, as shown in this example from Visscher 2008:

In the example above, the slope is taken to be the heritability. The problem with this is that parent and child share a lot else besides half their genome.
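For readers who want to see the parent-offspring approach as a calculation, here is a minimal Python sketch of an ordinary least-squares slope of offspring values on midparent values. The heights are invented for illustration, and `regression_slope` is a hypothetical helper; the caveat about shared environment applies to whatever number comes out.

```python
def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical heights in cm: average of the two parents, and their child.
midparent = [160.0, 170.0, 180.0]
offspring = [165.0, 170.0, 175.0]

slope = regression_slope(midparent, offspring)
print(slope)
```

In this toy data every 10 cm difference between midparent values corresponds to a 5 cm difference between offspring, so the slope, read naively as a heritability, is 0.5.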

One approach to calculating heritability which largely avoids the confounding of genotype with shared environment is to compare the phenotypic concordance of monozygotic (MZ, identical) twins versus dizygotic (DZ, fraternal) twins. Both types of twins are expected to share virtually all environmental factors, including while in the womb, which is why this is a better study design than just comparing MZ twins to siblings. Comparing MZ to DZ twins lets you isolate the contribution of that marginal half shared genome to phenotypic concordance.

Visscher 2008, citing Deary 2006 (ft), discusses the example of IQ, where MZ twins have concordance of .86 and DZ twins have concordance of .60. “Concordance” in these studies seems to refer to a Pearson’s correlation or similar, so something like an r or ρ. (Why r and not slope, like in parent-offspring regression? See this post for further discussion.)

At first glance it is not clear how to convert these numbers – .86 and .60 – into an estimate of heritability. After all, both of these figures include both genetic and environmental factors. The key observation is that sharing a marginal half genome with your twin explains an additional .86-.60 = 26%, so in theory, sharing a full genome explains 2*26% = 52%:

heritability = 2(ρMZ - ρDZ)

I wrote “heritability” on the left side of the equation, instead of h² or H², because it is debatable what this estimate is really reflecting. The wiki on Falconer’s formula claims that it estimates H², broad-sense heritability. Indeed: since MZ twins share virtually all their genotypes (there will be just a few chance mutations here and there that make them differ), they share dominant / recessive effects and gene-gene interactions, which DZ twins are not expected to share. Yet the notion that you can just double the 26% marginal variance explained by a half genome in order to extrapolate to a whole genome seems to assume all additive effects. The 26% marginal variance explained from sharing a whole genome as opposed to a half genome might in fact be partitioned between additive effects (which we could fairly double to extrapolate to a whole genome), gene-gene interactions (which we should perhaps multiply by 4/3, since DZ twins sharing half their genome only share 1/4 of their possible gene-gene pairings, so the MZ twins are capturing an extra 3/4) and dominance effects (which we might also argue to multiply by 4/3 since DZ twins share both alleles at only 25% of sites, so again, MZ twins add a marginal 3/4).

Accordingly, though the wiki on Falconer’s formula claims it calculates H², the wiki on twin studies claims it estimates h². To my view, it’s not a perfect estimate of either of these. And frustratingly, grand sweeping reviews of the concept of heritability (such as Visscher 2008) are long on talk and short on formulae.
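Whatever it is taken to estimate, the doubling calculation itself is simple. A tiny Python sketch using the IQ correlations quoted above (the function name is my own label for illustration):

```python
def falconer_heritability(r_mz, r_dz):
    """Twice the difference between the MZ and DZ twin correlations."""
    return 2.0 * (r_mz - r_dz)


iq_estimate = falconer_heritability(0.86, 0.60)
print(f"{iq_estimate:.2f}")
```

Note that nothing in the formula keeps the result at or below 1: any pair of correlations that differ by more than 0.5 produces an estimate above 100%.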

I also ran across an old school paper by Jacquard 1983 (ft) which presents a formula something like this:

heritability = (ρMZ - ρDZ) / (1 - ρDZ)

So for the IQ example, heritability = (.86-.60)/(1-.60) = .26/.40 = 65%. This will often give pretty different answers than Falconer’s formula. I don’t quite understand the logic of it, though one nice property of it is that it never rises above 100%. But I can’t find any evidence that this alternative formulation is still in use today.
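Here is the same arithmetic as a short Python sketch. The function name is hypothetical, and the formula is the one implied by the worked IQ numbers above rather than something I have verified against the original paper.

```python
def jacquard_style_heritability(r_mz, r_dz):
    """(r_MZ - r_DZ) / (1 - r_DZ), as in the worked IQ example."""
    return (r_mz - r_dz) / (1.0 - r_dz)


iq_alt = jacquard_style_heritability(0.86, 0.60)
print(f"{iq_alt:.2f}")

# With r_MZ at most 1, the numerator can never exceed the denominator,
# which is why this version stays at or below 100%.
capped = jacquard_style_heritability(0.90, 0.30)
print(f"{capped:.2f}")
```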

Besides the H² vs. h² debate, there are other conceptual issues with calculating heritability from twin studies as well. For instance, MZ twins may actually share more environmental factors than DZ twins since being similar makes people treat them similarly. It’s entirely possible to find that ρMZ - ρDZ > 50%, in which case your estimate of heritability will be > 100%. Oops. Also, it is assumed that DZ twins share exactly half their genome, but in fact, due to random segregation of alleles, there is variance in what fraction of alleles siblings actually share – more on this shortly.

There are plenty of other study designs as well. Whereas MZ vs. DZ twin studies look at the effect of sharing 100% IBD instead of ~50%, sibling vs. adopted sibling studies look at the effect of ~50% IBD instead of 0% IBD, while theoretically controlling for shared environment (though obviously it can’t account for factors in the womb). Here, h² = 2(ρsib - ρadoptee). (The Falconer’s Formula wiki says not to double this quantity, i.e. h² = ρsib - ρadoptee – this seems incorrect to me, but if you know otherwise, please leave me a note.)

Twin pairs and sibling/adoptee pairs are all well and good when you’re dealing with a trait like height, which you can measure for absolutely anyone. But consider the phenotype of residual age of onset in Huntington’s Disease. This phenotype can only be assessed for people who have HD, which is already a very rare disease; if you were to also limit yourself to twin pairs, you’d have an n pretty close to zero. In that case, you’ll take what you can get, such as the correlation between sibling pairs: 2*rsib is an upper limit for heritability. It might be a pretty loose upper bound, and if rsib > 50%, then it’s no upper bound at all. If you have other relationships in your dataset as well (parent-offspring pairs, avuncular pairs, cousin pairs), then you can try to make a few more inferences, though it is pretty hard to disentangle genes and environment because, unlike in the MZ/DZ and sibling/adoptee comparisons, you don’t have any pairs of pairs where environment is shared equally between the two pairs but genotype is shared unequally.

In U.S.–Venezuela Collaborative Research Project 2004, the most oft-cited study of heritability in HD age of onset, the siblings had concordance of .42 (suggesting an upper bound for heritability of 84%) while parent-offspring were only .10, avuncular .07 and cousin .15 [see Table 4]. Under an additive model (narrow-sense heritability), the parent-offspring correlation would suggest heritability of no more than 2*.10 = 20%, the avuncular would suggest 4*.07 = 28%, and the cousins no upper limit at all because 8*.15 > 100%. In short, the data are all over the place. The authors assumed that only siblings share an environment, and then used some model (details never stated; see my commentary here) to integrate all these pieces of information into a single estimate of 38% heritability. This should probably be interpreted as a pretty rough estimate.
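The upper-bound arithmetic above is easy to tabulate. This is a minimal Python sketch, assuming a purely additive model in which each relative pair’s correlation is multiplied by the reciprocal of the fraction of alleles they are expected to share IBD. The dictionary and function names are illustrative only; the correlations are the ones quoted above.

```python
# Multiplier = 1 / (expected fraction of alleles shared IBD), additive model only.
MULTIPLIER = {"sibling": 2, "parent-offspring": 2, "avuncular": 4, "cousin": 8}

# Correlations quoted above from the U.S.-Venezuela study.
CORRELATION = {"sibling": 0.42, "parent-offspring": 0.10, "avuncular": 0.07, "cousin": 0.15}


def upper_bound(relationship, r):
    """Naive upper bound on heritability implied by one class of relative pairs."""
    return MULTIPLIER[relationship] * r


if __name__ == "__main__":
    for rel, r in CORRELATION.items():
        bound = upper_bound(rel, r)
        note = " (uninformative: exceeds 100%)" if bound > 1.0 else ""
        print(f"{rel:>16}: r = {r:.2f}, bound = {bound:.0%}{note}")
```

Seeing the four bounds next to each other makes it obvious that no single additive heritability value fits all of the relative pairs at once.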

Here’s a thought exercise: suppose the Venezuelan HD pedigree has some consanguinity, which means that relatives often share more IBD than their nominal relations to each other would suggest. Does that bias the heritability estimate? I am still undecided on my answer. At first I thought the answer was no, because if you think of heritability as (variance explained by genes) / (total variance), then both the numerator and denominator are affected by consanguinity. Yes, first degree relatives share ‘extra’ IBD and so correlate better than they ‘should’, but so does everyone in that dataset. However, Visscher 2006 presents formulas for controlling for parent inbreeding, implying consanguinity does matter. Leave me a comment if you have the answer.

In talking about consanguinity, my concern is with excess IBD. But you might also ask whether excess identity-by-state (IBS) matters for heritability calculations. After all, even if you only look at SNPs that are polymorphic within my ethnic group, I’m still going to share plenty of alleles with any other random person just by chance. For a C/T SNP with minor allele frequency 50%, there are three possible genotypes (CC, CT and TT, with frequencies .25, .5 and .25), so a random person and I will have a 37.5% chance (.25^2 + .5^2 + .25^2) of sharing a genotype and an 87.5% chance of sharing at least one allele (1-2*.25^2). And most SNPs are relatively uncommon, with an average minor allele frequency around 10 to 15% in many studies, which makes those odds even higher. Accordingly, I’ll also share way more than half my alleles with my sibling just by chance. So does that mess up the heritability calculations? Again, since it affects both the numerator and denominator (you have extra IBS with your siblings and with random people in the population), I believe the answer should be no.
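Here is a short Python sketch of that chance-sharing calculation for any minor allele frequency. It assumes Hardy-Weinberg genotype proportions and two unrelated individuals drawn from the same population; the function name is my own.

```python
def sharing_probabilities(maf):
    """Chance that two unrelated people match at one biallelic SNP.

    Returns (P(same genotype), P(at least one allele in common)),
    ASSUMING Hardy-Weinberg proportions for the three genotypes.
    """
    p, q = 1.0 - maf, maf
    hom_major, het, hom_minor = p * p, 2.0 * p * q, q * q
    same_genotype = hom_major**2 + het**2 + hom_minor**2
    # Two people share no allele only if they are opposite homozygotes.
    no_shared_allele = 2.0 * hom_major * hom_minor
    return same_genotype, 1.0 - no_shared_allele


if __name__ == "__main__":
    for maf in (0.50, 0.15, 0.10):
        same, any_allele = sharing_probabilities(maf)
        print(f"MAF {maf:.2f}: same genotype {same:.1%}, share an allele {any_allele:.1%}")
```

At a minor allele frequency of 50% the sum .25^2 + .5^2 + .25^2 comes to 37.5%, and both probabilities rise as the minor allele gets rarer.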

However, leaving consanguinity behind now, the fact is that different sibling pairs do share different amounts of IBD, and different unrelated individuals do share different amounts of IBS. This variability has enabled a couple of very cool modern approaches to calculating heritability.

The first of these is sibling IBD regression. Visscher 2006 presents an excellent (and perhaps the first with any considerable sample size?) analysis of heritability of height using this approach. Due to random segregation of parental alleles, siblings don’t always share exactly 50% of alleles IBD. The mean is 50%, with a standard deviation of about 4%, so a fair number of sibling pairs share as little as 40% of alleles or as many as 60%, as shown in the histogram in Visscher’s Figure 1.

The fact that some siblings are more similar to each other than other pairs are, while we assume they all have an equal degree of shared environment, gives us a new way to estimate heritability, controlling for environment, without having twins available. That’s pretty cool! Visscher’s formulas are under Materials and Methods. There is a lot of fancy stuff you’ll need to know to implement it, estimate standard error, correct for inbreeding if present, etc., but the core concept is just to regress siblings’ phenotypic concordance against their genotypic concordance (% shared IBD). This is done as follows (these formulas assume exactly 2 sibs per family):

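  • Let $Y_{i1}$ be the (quantitative) phenotype of sibling 1 in family $i$, and $Y_{i2}$ that of sibling 2
  • Let $\pi_i$ be the proportion of alleles shared IBD between siblings 1 and 2 in family $i$
  • Let $\hat{\sigma}_p^2$ be the estimate of total phenotypic variance in the population
  • $\alpha$ and $\beta$ are parameters to be estimated
  • $\hat{h}^2$ will be your estimate of the additive genetic (narrow-sense) heritability
  • Use this formula to estimate $\beta$: $(Y_{i1}-Y_{i2})^2 = \alpha + \beta\pi_i$
  • Then plug your estimated $\hat{\beta}$ into this formula: $\hat{h}^2 = -\hat{\beta}/(2\hat{\sigma}_p^2)$. The minus sign is there because siblings who share more IBD differ less, so $\hat{\beta}$ is expected to be negative.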
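The regression steps above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not Visscher’s implementation: the simulated data have total phenotypic variance 1, purely additive genetics, no shared environment and no inbreeding, and all function names are my own. Note the minus sign in the last step: the squared sibling difference shrinks as IBD sharing grows, so the slope is expected to be negative.

```python
import math
import random


def _mean(values):
    return sum(values) / len(values)


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def sib_ibd_h2(y1, y2, pi, var_p=None):
    """Estimate h^2 by regressing squared sib differences on IBD sharing."""
    sq_diff = [(a - b) ** 2 for a, b in zip(y1, y2)]
    beta = ols_slope(pi, sq_diff)
    if var_p is None:
        # Pooled variance of all sibling phenotypes as the estimate of sigma_p^2.
        pooled = list(y1) + list(y2)
        m = _mean(pooled)
        var_p = sum((v - m) ** 2 for v in pooled) / len(pooled)
    return -beta / (2.0 * var_p)


def simulate_sib_pairs(n, h2, seed=0):
    """Toy sib pairs: additive variance h2, unshared environment 1 - h2."""
    rng = random.Random(seed)
    sd_a, sd_e = math.sqrt(h2), math.sqrt(1.0 - h2)
    y1, y2, pi = [], [], []
    for _ in range(n):
        # IBD sharing: mean 0.5, SD 0.04, clipped to the valid range.
        p = min(max(rng.gauss(0.5, 0.04), 0.0), 1.0)
        shared = rng.gauss(0.0, 1.0)
        # Genetic values built so that their covariance is p * h2.
        g1 = sd_a * (math.sqrt(p) * shared + math.sqrt(1.0 - p) * rng.gauss(0.0, 1.0))
        g2 = sd_a * (math.sqrt(p) * shared + math.sqrt(1.0 - p) * rng.gauss(0.0, 1.0))
        y1.append(g1 + sd_e * rng.gauss(0.0, 1.0))
        y2.append(g2 + sd_e * rng.gauss(0.0, 1.0))
        pi.append(p)
    return y1, y2, pi


if __name__ == "__main__":
    y1, y2, pi = simulate_sib_pairs(n=200_000, h2=0.8, seed=1)
    print(f"estimated h^2 = {sib_ibd_h2(y1, y2, pi):.2f}")
```

I used a very large simulated sample on purpose. If you rerun this with a few thousand pairs and different seeds, the estimate bounces around a lot, because the spread of IBD sharing among siblings is so narrow.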

The Achilles heel of this approach, as Visscher points out, is that the standard errors are really high. That’s because the range of sibling IBD is relatively narrow (not many sib pairs outside the .4 to .6 range). Visscher’s simulations suggest that when the true $h^2$ is 0.8 and the sample size is 2500 sib pairs, the standard error of $h^2$ is 0.2. That gives you a pretty wide range, but I’d point out that different studies give widely differing estimates of heritability anyway, at least for some traits (for instance Visscher 2008 cites IQ heritability estimates ranging from 0.5 to 0.8). Estimating heritability, even in the best of circumstances, is not an exact science. Visscher’s estimate of height heritability obtained via sibling IBD regression is 0.8, which is consistent with the estimates obtained by other methods.

If you can exploit the variation in IBD among siblings to estimate heritability, why not exploit the variation in IBS among unrelated members of the general population? Even nominally unrelated individuals will vary in how many alleles they happen to share, and you can measure this using SNP chip genotyping data. For any trait believed to have complex genetic etiology – many loci each contributing small effects – more genotype sharing should mean greater phenotypic concordance. This, to my understanding, is the principle behind Visscher’s latest tool, GCTA [Yang 2011]. It’s offered as a Unix command line tool that you can run out of the box with PLINK pedigree files haplotyped using MaCH. I don’t fully understand all the math yet – I’ll post an update if I get my head around it – but I believe the basic principle is regressing unrelated individuals’ genotypic concordance against phenotypic concordance. Because unrelated individuals don’t share a household environment, you again have a sort of ‘control’ that lets you begin to separate out the effects of environment vs. genetics. Admittedly, it gets messy if you consider that some genotypes correlate with some environmental factors, etc. – e.g. SNPs that predispose you to smoking, which you of course inherited from your parents, mean you’re more likely to have grown up with second-hand smoke in the house. A couple of caveats are that (1) this only works well for common variations, since rare variations are less well tagged by SNPs on your SNP chip, and (2) because the level of genotypic concordance among unrelateds is so much smaller and less variable than between siblings, the standard errors are even higher than for sibling IBD regression. So you need huge sample sizes. Still, this is pretty cool stuff.

But: don’t be fooled by all this fancy math into thinking that the genetics field is super advanced and sophisticated on precisely calculating heritability. There are a ton of issues with how to interpret heritability estimates. Visscher 2008 does a good job of addressing these. One important point is that heritability depends on the estimate of phenotypic variance in a particular population at a particular moment in time. Americans today are both taller and more obese than their ancestors 100 years ago, even though (at a population level, within ethnic groups, and to a first approximation, etc. etc.) their genes haven’t changed. We think height is about 80% heritable [Visscher 2008], but that’s just under today’s conditions – if you compared height across the whole of human history, you would be adding a ton of additional non-genetic variance, and the proportion explained by genetics – the heritability – would accordingly shrink. So just because a trait is highly heritable doesn’t mean it’s genetically deterministic.
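A tiny Python sketch makes the point about population dependence concrete. It uses the simple definition heritability = V_G / (V_G + V_E) and ignores every other variance term. The variance numbers are hypothetical, chosen only so that the first case comes out at 80%.

```python
def heritability(v_g, v_e):
    """Proportion of phenotypic variance due to genetic variance (V_P = V_G + V_E)."""
    return v_g / (v_g + v_e)


if __name__ == "__main__":
    v_g = 0.8  # hypothetical genetic variance, held fixed in both scenarios
    scenarios = [
        ("one population, one era", 0.2),
        ("pooled across very different environments", 0.8),
    ]
    for label, v_e in scenarios:
        print(f"{label}: heritability = {heritability(v_g, v_e):.2f}")
```

The genetic variance is identical in both scenarios. Only the environmental variance changed, and the heritability dropped from 0.8 to 0.5.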

A gross estimate of heritability also tells you nothing about the architecture of heritability. A trait that is 80% heritable could be caused by one locus that explains 80% of variance, or 80 loci that each explain 1% of variance. So just because a trait is highly heritable doesn’t mean there will be any individual genetic variants of large effect size.

An additional challenge in interpreting heritability estimates is that economic incentives bias which figures get reported. For any given trait, there will be a range of different estimates of heritability in the literature – say 0.5 to 0.8 – and even within any one study, there will probably be a range of possible estimates depending on the exact methodology chosen. In general, the highest estimate will be the one that researchers prefer to cite, because high heritability means justification for grant applications to fund GWAS and sequencing projects to identify the genes that drive heritability. So part of the ‘missing heritability’ probably lies in the fact that, for a huge range of human traits, the estimates of heritability that we hear most often are a bit, well, optimistic.

About Eric Vallabh Minikel

Eric Vallabh Minikel is on a lifelong quest to prevent prion disease. He is a scientist based at the Broad Institute of MIT and Harvard.
