The probability rule we used in another module to predict genotype frequencies in the offspring generation of a specific population:
can be used to generate a general formula for doing the same thing. That is, we can create a formula that describes how frequently particular genotypes will appear in the offspring generation when a population is not subject to an agent of evolution. This formula is known as the Hardy-Weinberg equation.
To do this, we will first generate the individual elements of the Hardy-Weinberg equation.
As the boxed rule above says, offspring genotype frequencies are calculated using parental allele frequencies. So imagine a population that has only two alleles for a given locus, A and a, and that in this population
p = the frequency of the A allele
q = the frequency of the a allele
To make sure you understand these phrases, substitute a number for p or for q. For example, p might be 0.4, meaning that the A allele occurs in 40% of the population's loci for this gene.
Notice that, because only two alleles exist in the population for this locus, the frequencies of the A and a alleles, p and q respectively, must sum to 1, the equivalent of 100%. That is,
p + q = 1
Also notice that, if we know the frequency of only one of the two alleles in a population, we can use simple algebra to work out the frequency of the second. For example, if we know p, the frequency of the A allele, then
q = 1 - p
And, of course, if we know q, the frequency of the a allele, then
p = 1 - q
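The relationship between the two allele frequencies can be checked with a few lines of Python. This is only a minimal sketch: the function name `other_allele_frequency` is hypothetical, and it assumes that exactly two alleles exist at the locus, so that p + q = 1.

```python
def other_allele_frequency(known: float) -> float:
    """Return the frequency of the second allele at a two-allele locus.

    Assumption: only two alleles exist at the locus, so p + q = 1.
    """
    if not 0.0 <= known <= 1.0:
        raise ValueError("an allele frequency must lie between 0 and 1")
    return 1.0 - known


p = 0.4                        # frequency of the A allele (example value from the text)
q = other_allele_frequency(p)  # frequency of the a allele
print(f"p = {p}, q = {q:.2f}")
```

The same function works in either direction: pass in q and it returns p.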
To confirm your understanding of this relationship between the frequencies with which two alleles occur in a population when only two alleles exist for a given locus, answer the following questions.
Please explain in your own words why p + q must always equal 1 when only two alleles exist in a population for a given locus.
An example answer: I imagine a locus as slots for alleles. In diploid organisms, each individual will have two 'slots' or two loci for the particular gene of interest, one per chromosome. If only two alleles exist to fill every slot in this population, the slots that are not filled with one of those two alleles must be filled with the other. Consequently, if 50% of the slots are filled with A alleles, then the remaining 50% must be filled with a alleles to account for 100% of the population's loci. Because 50% is equivalent to a frequency of 0.5, the frequency of the A allele, p, equals 0.5, as does the frequency of the a allele, q, so that p + q = 1.
Let’s consider a real example of this. In 2005, Stefansson et al. reported the fascinating discovery of an allele in humans whose presence is associated with increased fertility in Icelandic and European populations. Females with at least one copy of the allele have approximately 3.5%, and males 2.9%, more children on average than non-carriers. The exact mechanism by which the allele, known as H2, affects fertility is unknown.
If we know that 21% of European loci for this gene house the H2 allele, then how frequently must the single alternative allele, H1, for that locus occur in this population? Why?
If 21% of the loci (equivalent to a frequency of 0.21) in a population contain the H2 allele and H1 is the only other possible allele for this locus, then the remaining 79% of the loci (equivalent to a frequency of 0.79) must contain the H1 allele. No other alleles exist for this locus; consequently, if H2 does not occur at a locus, then H1 must be there instead.
A colleague determines that the B1 and B2 alleles of the B locus each occur with a frequency of 0.45. Surprised, she redoes her work and confirms her results.
a. What could be the cause of your colleague's surprise? Please explain.
b. Because your colleague confirms her results she now needs to explain them. She turns to you for assistance. What do you suggest? Please be sure to explain how your explanation accounts for her observations.
Your colleague was probably surprised because she thought that B1 and B2 were the only two alleles that occurred at this locus in this population. Consequently, the discovery that their frequencies, p and q, summed to 0.9 as opposed to 1 was startling. Your suggestion to look for at least one additional allele to account for the 10% of the alleles unaccounted for in her study is well taken. She realizes that the existence of one or more additional alleles would explain the missing 10% and enable her to bring the summed allele frequencies for the B locus to 1.
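The colleague's check can be sketched in Python as follows. The function name `unaccounted_frequency` and the dictionary layout are assumptions made for illustration; the only input taken from the problem is the pair of observed frequencies.

```python
import math


def unaccounted_frequency(frequencies) -> float:
    """Return the share of loci not covered by the listed allele frequencies."""
    total = sum(frequencies)
    if total > 1.0 and not math.isclose(total, 1.0):
        raise ValueError("allele frequencies cannot sum to more than 1")
    return max(0.0, 1.0 - total)


observed = {"B1": 0.45, "B2": 0.45}  # frequencies reported by the colleague
missing = unaccounted_frequency(observed.values())

if math.isclose(missing, 0.0, abs_tol=1e-9):
    print("B1 and B2 account for every locus")
else:
    print(f"{missing:.2f} of the loci must carry at least one other allele")
```

A nonzero result does not say how many additional alleles exist, only that the listed ones cannot be the whole story.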
Now that we have designated p to represent the frequency of the A allele and q the frequency of the a allele, we are ready to move forward with our efforts to construct the elements of the Hardy-Weinberg equation. Imagine that every individual in a population in which copies of both the A and a alleles occur is equally likely to survive and to reproduce.
What possible genotypes could occur in the offspring of this population?
To answer this question, determine all the possible genotypes that could be formed from a population of individuals whose loci collectively warehouse numerous copies of A and a alleles. Remember that, because these individuals are all equally likely to reproduce, all combinations of these two alleles have the potential to form. Visit this module if you have questions.
There are four possible genotypes: AA, Aa, aA and aa.
Now that we know what genotypes could form, we can use the rule highlighted at the very beginning of this module to predict how frequently each of these genotypes will appear in the offspring generation.
What are these frequencies? Apply the highlighted (boxed) rule above to complete the phrases below using the symbols p and q.
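If all individuals are equally likely to survive and to reproduce, then the
frequency of the AA genotype in the offspring generation will be p × p = p²
frequency of the Aa genotype in the offspring generation will be p × q = pq
frequency of the aA genotype in the offspring generation will be q × p = pq
frequency of the aa genotype in the offspring generation will be q × q = q²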
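Because the Aa and aA genotypes are genetically equivalent, we can summarize the relationships you articulated above as
frequency of the AA genotype = p²
frequency of the Aa genotype = 2pq
frequency of the aa genotype = q²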
And there you have it, the three fundamental elements of the Hardy-Weinberg equation that describe how frequently the three possible genotypes will appear in the offspring generation of a population that is not subject to an agent of evolution! Remember that only three genotypes are possible because we are only working with a gene for which only two alleles exist in a population.
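To see where the factor of 2 in the heterozygote term comes from, the sketch below pairs every allele with every other allele, multiplies their frequencies, and then pools the two orderings of the heterozygote. The function name `offspring_genotype_frequencies` is hypothetical, and the code assumes a two-allele locus in a population where all individuals are equally likely to survive and reproduce.

```python
from itertools import product


def offspring_genotype_frequencies(p: float) -> dict:
    """Expected offspring genotype frequencies for alleles A (p) and a (1 - p)."""
    allele_freq = {"A": p, "a": 1.0 - p}
    genotypes = {}
    # Enumerate all four ordered allele pairs: AA, Aa, aA, aa.
    for first, second in product(allele_freq, repeat=2):
        # Sorting the pair makes "aA" and "Aa" count as the same genotype.
        genotype = "".join(sorted((first, second)))
        pair_frequency = allele_freq[first] * allele_freq[second]
        genotypes[genotype] = genotypes.get(genotype, 0.0) + pair_frequency
    return genotypes


for genotype, frequency in offspring_genotype_frequencies(0.4).items():
    print(f"{genotype}: {frequency:.2f}")
```

Because the Aa entry receives contributions from both the A-then-a and a-then-A pairings, its total is pq + pq, which is the 2pq term.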
To test your understanding of these relationships, answer the following questions.
Please explain in your own words what these three formulae tell us about the relationship between allele frequencies in the population and genotype frequencies in the offspring generation when all individuals are equally likely to survive and to reproduce.
In plain English, these three relationships tell us that
1. If we want to know the frequency of a homozygous genotype (AA or aa) in the offspring of a population in which all individuals are equally likely to survive and reproduce, then we simply square the frequency with which the appropriate allele (A or a) occurs in the population.
2. If we want to know the frequency of the heterozygous genotype (Aa) in the offspring of a population in which all individuals are equally likely to survive and reproduce, then we multiply together the frequencies with which the two alleles (A and a) occur in the population and multiply this result by 2.
Return to the scenario described in problem 2. How frequently do you expect the H1H1, H1H2, and H2H2 genotypes to appear in Europeans if the population is not evolving with respect to this allele?
To solve this problem, review the section above, list the information you need, and describe how you plan to obtain it.
Check your outline by answering the questions below.
1. How frequently do the H1 and H2 alleles occur in this population? This can be found in the solution to problem 2.
2. Calculate the expected frequency of the H1H1, H1H2 and H2H2 genotypes in the offspring of this population. To do this, square the frequency with which the H1 allele occurs in the population (p²), multiply the frequency with which the H1 allele occurs by the frequency with which the H2 allele occurs and multiply this result by two (2pq), and finally square the frequency with which the H2 allele occurs in the population (q²).
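The two steps above can be sketched in Python. The function name `hardy_weinberg_terms` and the variable names are assumptions made for illustration; the only value taken from the problem is the H2 frequency of 0.21.

```python
def hardy_weinberg_terms(p: float, q: float) -> tuple:
    """Return (p^2, 2pq, q^2) for a locus with exactly two alleles."""
    if abs(p + q - 1.0) > 1e-9:
        raise ValueError("p and q must sum to 1 for a two-allele locus")
    return p * p, 2 * p * q, q * q


q_h2 = 0.21        # frequency of the H2 allele, given in problem 2
p_h1 = 1.0 - q_h2  # frequency of the H1 allele, from p + q = 1

h1h1, h1h2, h2h2 = hardy_weinberg_terms(p_h1, q_h2)
print(f"H1H1: {h1h1:.4f}")
print(f"H1H2: {h1h2:.4f}")
print(f"H2H2: {h2h2:.4f}")
```

With p = 0.79 and q = 0.21, the expected frequencies work out to approximately 0.6241 for H1H1, 0.3318 for H1H2 and 0.0441 for H2H2.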
The formulae generated above - p², 2pq and q² - constitute the fundamental components of the Hardy-Weinberg equation. Thus, they describe the genotype frequencies you will see in a population, with respect to a single locus with only two alleles, if the population is not subject to any agent of evolution. That is, all individuals in the population are equally likely to survive and to produce offspring that survive.
Because together these formulae account for 100% of the genotypes this population could produce, they can be summarized and are often written in the following way:
p² + 2pq + q² = 1
In words, this equation says that the values p², 2pq and q², which describe the frequency with which the AA, Aa and aa genotypes occur respectively, sum to 1.
Importantly, and as you applied it in the previous section, the Hardy-Weinberg equation is not necessarily used in the form in which it is written above. That is, you do not set the equation equal to 1 and solve for an unknown. Rather, the individual elements p², 2pq and q², along with the relationship p + q = 1, are used as needed to solve problems.
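One way to convince yourself that the three genotype frequencies must sum to 1 is to square both sides of the allele-frequency relationship. The short derivation below is a sketch that assumes only that p + q = 1 for a two-allele locus:

```latex
\begin{align*}
p + q &= 1 \\
(p + q)^2 &= 1^2 \\
p^2 + 2pq + q^2 &= 1
\end{align*}
```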
Interestingly, the Hardy-Weinberg equation was formulated and published independently by both the British mathematician G. H. Hardy and the German physician cum geneticist W. Weinberg in 1908. Because Weinberg published in his native German, however, his contribution was not recognized until 1943, at which point the principle was renamed to recognize both contributions.