The (simple) intuition behind evolution and Hardy-Weinberg

02 Sep 2018


As the new academic season starts up in the Palo Alto school district, I have received a new cohort of bright-eyed AP Biology students. For many of them, evolution is the first topic being covered. While this is not the order I would have done myself, I can certainly understand the reasoning to learn about evolution first. After all, nothing in biology makes sense except in the light of evolution (incidentally, Dobzhansky’s 1973 paper on this is so scientifically significant that it has earned its own Wikipedia page).

Much of biology taught at the high-school level (including AP) follows exactly the same curriculum no matter where you are. Of course, anyone teaching AP Biology must adhere to the AP curriculum, and practically everyone uses the famous “Campbell textbook”; as a result, AP Biology is taught with a low variance. Unfortunately, however, this does mean the weak spots of the curriculum are propagated repeatedly to almost every student. In particular, when it comes to the basic theory of evolution, the “standard” curriculum does not do it justice. Granted, this is an anecdotal observation, based on a relatively small sample size consisting of a handful of schools here in the California Bay Area. It is largely agreed, however, that the science education here is quite good, in terms of both content and pedagogy. It is likely that this observed shortcoming, then, is at least somewhat applicable to many other schools.

It is something I notice time and time again in my students, and something I went through myself. It seems that whenever students learn about the forces of evolution, they are presented with a list of terms to memorize. This approach lacks the glue that holds the theory together, and that’s a downright shame. Evolution is a theory. It is the network that holds all biology together, and no other aspect of the biological sciences can claim to be quite as ubiquitous and critical. This is doubly unfortunate because many high-school students are certainly able to understand and appreciate it (perhaps with some effort). At the very least, my students do, and they are much better off for it (or so they claim).

I present here how I prefer to teach the forces of evolution.

The four forces of evolution

What is evolution?

“Evolution” is often tossed around colloquially to mean something akin to “the rise of new species”, but it actually has a very particular definition in biology. “Evolution” refers to the change in allele frequency in a population over time.

There are a couple of pieces to this definition. Firstly, a population is a group of individuals of the same species that freely interbreed and produce offspring. That is, two individuals are in the same population if they have a fairly significant chance of producing viable offspring. For an asexual species, a population is essentially just a group of individuals living and interacting together.

If we think of a locus (i.e. location) in a species’ DNA (whether that be a single base, a small region, or a gene), an allele is a version of that locus. The frequency of an allele is how often it appears in the population.

For example, consider the following population of humans. Let us focus on a particular locus (location) in the DNA, where there are two versions of the DNA. One is called D and the other is called d. Since humans are diploid, each individual has two alleles at this locus. The population has five individuals, with genotypes DD, DD, Dd, Dd, and dd.

If we count the alleles (each individual has two), we see there are 6 occurrences of D and 4 occurrences of d. Then the allele frequencies are 60% D and 40% d.
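The counting above can be sketched in a few lines of Python. This is only an illustration: the function name `allele_frequencies` and the representation of each genotype as a two-letter string are assumptions made for the example, and the population used is the one with 2 DD, 2 Dd, and 1 dd individuals.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} for a list of genotype strings such as "Dd"."""
    # Every character of every genotype string is one allele copy.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# Five diploid individuals (10 allele copies in total).
population = ["DD", "DD", "Dd", "Dd", "dd"]
frequencies = allele_frequencies(population)
print(frequencies)
```

Running the same function on a later generation and comparing the two results is all it takes to check whether the population has evolved at this locus.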

As time passes, individuals mate with each other and produce offspring, and this next generation mates and produces offspring, and so on and so forth. After some generations have passed, the population may look like this:

Now, the allele frequencies are 50% D and 50% d. The allele frequencies at this locus have changed, so this population has evolved.

It is important to note that changes in allele frequency happen over at least one generation. Remember, populations evolve, individuals do not.

Considering some population, what are all the reasons why it can evolve? What could possibly change the allele frequencies?

Force 1: Mutation

An obvious way to change the allele frequency in a population is to introduce a brand new allele. DNA polymerase is not perfect, and has an error rate of roughly once every 10 million bases. These mistakes are not always caught. In a single-celled organism like a bacterium, any lasting mutation lands in a new cell. In a multicellular organism like a human, mutations must happen to germline cells that become gametes (i.e. sperm or egg), which in turn must undergo fertilization.

Before:

After:

Obviously, the allele frequency has changed from 100% D to a mix of 90% D and 10% d. Although the chance of a single locus mutating is very small, there are many possible loci for mutations to occur.
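To see why "many possible loci" matters, here is a rough sketch of the arithmetic. It assumes each locus mutates independently with the same small probability. The per-locus rate is the error rate quoted above, while the number of loci is a hypothetical value chosen purely for illustration (it ignores repair mechanisms and is not a measured figure).

```python
def prob_at_least_one_mutation(per_locus_rate, num_loci):
    """Probability that at least one of num_loci independent loci mutates."""
    # P(no mutation anywhere) = (1 - rate) ** num_loci
    return 1.0 - (1.0 - per_locus_rate) ** num_loci

per_locus_rate = 1e-7   # from the text: roughly one error per 10 million bases
num_loci = 1_000_000    # ASSUMPTION: hypothetical number of loci, for illustration only

p_any = prob_at_least_one_mutation(per_locus_rate, num_loci)
print(f"{p_any:.3f}")
```

Under these assumptions a mutation at any one locus is a one-in-ten-million event, yet the chance of at least one mutation somewhere is close to 10%.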

Mutation may have a small effect on allele frequency, but in many cases it can be a significant force of evolution. RNA viruses such as influenza and HIV have incredibly high mutation rates because their genetic material is RNA, not DNA (RNA replication is much more error-prone than DNA replication). Mutation largely drives the rapid evolution of these viruses. This is why we get a flu shot every year that tries to protect us against the current flu virus and against what scientists predict it will evolve into. This is also why HIV is so difficult to cure.

In terms of diversity, mutation practically always introduces new alleles and variation into a population, so it makes the population more diverse.

Force 2: Selection

If a particular allele causes an individual to have a better chance at reproducing, then that beneficial allele is more likely to be propagated into the generations that follow. Conversely, a deleterious allele that reduces an individual’s ability to reproduce will be less likely to be passed on to the next generation. This “fitness” of an individual can be its ability to survive in the environment, find a mate, take care of its offspring, or anything that pertains to how successfully its DNA propagates.

For example, if having D is more beneficial than d (or equivalently, d is more deleterious than D), then in time, we may expect to see more D and less d.

Selection tends to reduce the diversity of a population. When comparing two alleles, one will almost always be better than the other (after all, what are the chances that they are both exactly the same in terms of fitness?). If selection is the only force at work, then given a long enough time, the better allele will become so common that the worse allele will disappear completely. In this case, the better allele is said to be “fixed” in the population (there are no other alleles at the locus). Once an allele is fixed, there is no variation at that locus, and that allele is likely to remain fixed (no one has any other alleles to pass on to future generations).
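The march toward fixation can be illustrated with a standard textbook model of selection in a haploid population. This is a minimal sketch, not something taken from the AP curriculum: the fitness values 1.05 and 1.00 are hypothetical, and the model assumes an effectively infinite population with no mutation, drift, or gene flow.

```python
def select_one_generation(p, fitness_D, fitness_d):
    """Frequency of D after one generation of selection (haploid model)."""
    q = 1.0 - p
    # Each allele contributes offspring in proportion to frequency times fitness.
    mean_fitness = p * fitness_D + q * fitness_d
    return p * fitness_D / mean_fitness

p = 0.5
trajectory = [p]
for _ in range(200):
    # ASSUMPTION: D has a hypothetical 5% reproductive advantage over d.
    p = select_one_generation(p, fitness_D=1.05, fitness_d=1.00)
    trajectory.append(p)

print(f"start={trajectory[0]:.2f}, after 200 generations={trajectory[-1]:.4f}")
```

In this idealized model the frequency of D climbs every generation and gets arbitrarily close to 100%. In any real, finite population the last few copies of d would eventually be lost, at which point D is fixed.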

Force 3: Genetic Drift

The term “genetic drift” essentially means “evolution by random chance”. Over time, a population’s allele frequencies change simply due to random chance. Even in an isolated population with no mutation and no selection, the allele frequencies waver around randomly.

As an experiment, draw 3 black circles and 3 white circles on a piece of paper. Each one represents a bacterium, with two possible colors.

Let us simulate the production of the next generation without mutation or selection. Roll a die 6 times to determine who gets to reproduce. For example, if you roll a 5, then the fifth dot from the left gets to produce a clone for the next generation. A bacterium can reproduce multiple times.

When I did this experiment, I got: 5, 2, 3, 2, 6, 3. So my next generation looks like this:

Evolution has occurred through random chance alone. The population evolved from 50% white and 50% black to 67% white and 33% black. When a bacterium reproduced, it reproduced exactly with no mutations. Additionally, each bacterium had an equal chance to reproduce, so no selection occurred. You may repeat this experiment yourself, and you will find very often that the allele frequencies change.
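The dice experiment translates directly into code. In the sketch below, the rolls are the ones reported above, but the left-to-right arrangement of the circles is an assumption: it is simply one arrangement of 3 black and 3 white that reproduces the reported outcome. The function names are made up for this example.

```python
def next_generation(population, rolls):
    """Each roll is a 1-based position; the individual there leaves one clone."""
    return [population[roll - 1] for roll in rolls]

def frequency(population, color):
    """Fraction of the population with the given color."""
    return population.count(color) / len(population)

# ASSUMPTION: the left-to-right arrangement below is hypothetical; it is one
# arrangement that reproduces the result reported in the text.
parents = ["black", "white", "white", "white", "black", "black"]
rolls = [5, 2, 3, 2, 6, 3]  # the rolls reported in the text

offspring = next_generation(parents, rolls)
print(offspring)
print(f"white before: {frequency(parents, 'white'):.2f}")
print(f"white after:  {frequency(offspring, 'white'):.2f}")
```

Every parent is copied exactly (no mutation) and every position is equally likely to be rolled (no selection), yet the frequencies still change.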

Genetic drift happens in all populations, but it has a larger effect in smaller populations. In the above example, going from 3 white and 3 black bacteria to 4 white and 2 black bacteria changed our allele frequencies from 50-50 to 67-33. While all it took was 1 additional white bacterium in a population of 6 total, it would take 17 additional white bacteria to see the same change (50-50 to 67-33) in a population of 100 total. The larger the population, the more unlikely it is for the allele frequencies to deviate far from the original values. Taken to the extreme, a theoretically infinite population would have no genetic drift at all.
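The effect of population size can be checked with a small simulation. The sketch below assumes the same reproduction rule as the dice experiment (each offspring is a clone of a uniformly random parent) and measures the average size of the frequency change after a single generation, starting from 50-50. The function names, the number of trials, and the random seed are arbitrary choices for illustration.

```python
import random

def drift_one_generation(num_white, size, rng):
    """Number of white offspring when each offspring clones a random parent."""
    p_white = num_white / size
    return sum(1 for _ in range(size) if rng.random() < p_white)

def mean_frequency_change(size, trials, rng):
    """Average absolute change in white frequency after one generation from 50-50."""
    start = size // 2
    total = 0.0
    for _ in range(trials):
        end = drift_one_generation(start, size, rng)
        total += abs(end - start) / size
    return total / trials

rng = random.Random(0)  # fixed seed so the run is repeatable
small = mean_frequency_change(size=6, trials=2000, rng=rng)
large = mean_frequency_change(size=100, trials=2000, rng=rng)
print(f"N=6: {small:.3f}   N=100: {large:.3f}")
```

The typical one-generation change should come out several times larger for the population of 6 than for the population of 100.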

Genetic drift is often seen as an opposing force to selection. Selection tends to make populations more fit for the environment over time. As beneficial alleles increase and deleterious alleles decrease in frequency, the population as a whole becomes more fit. Drift, however, can cause beneficial alleles to disappear and deleterious alleles to become fixed, purely by random chance. In very small populations, drift is a stronger force than selection, and fitness may not increase in these populations.

When it comes to real-world populations, this becomes critical in mainly two situations:

  1. The bottleneck effect describes when a large population suddenly shrinks drastically. This can be due to a natural disaster, or human influences. Currently, human-caused habitat destruction is perhaps the biggest driver of the bottleneck effect.
  2. The founder effect describes when a very small portion of a larger population leaves and founds a new population. This new population often starts off with only a few individuals.

In both these situations, a population becomes extremely small, so genetic drift becomes a very strong force. This is significant because in these tiny populations, it is much more likely for beneficial alleles to disappear and deleterious alleles to increase in frequency. For example, the unusually high frequency of the devastating neurodegenerative disease known as Huntington’s chorea in one region of Venezuela traces back to a single individual who helped found a tiny village there.

Genetic drift, like selection, tends to reduce the diversity of a population. Given a long enough time, genetic drift alone will either eliminate or fix an allele. For example, if the above simulation were carried out repeatedly from one generation to the next, then eventually the entire population would be one color.
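Repeating the dice experiment generation after generation is tedious by hand but easy in code. This sketch assumes the same rule as before (each offspring is a clone of a uniformly random parent); the function name, the seed, and the safety cap on the number of generations are choices made for this example.

```python
import random

def generations_until_fixation(population, rng, max_generations=10_000):
    """Resample the population each generation until only one color remains."""
    generation = 0
    while len(set(population)) > 1 and generation < max_generations:
        # Each offspring is a clone of a randomly chosen parent.
        population = [rng.choice(population) for _ in population]
        generation += 1
    return generation, population

rng = random.Random(1)  # fixed seed so the run is repeatable
start = ["black"] * 3 + ["white"] * 3
generations, final = generations_until_fixation(start, rng)
print(generations, final)
```

Changing the seed changes which color wins and how long it takes, but with only six individuals one color takes over quickly.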

Force 4: Gene Flow

The final way for allele frequencies to change in a population is by having new individuals enter the population from somewhere else (immigration) or by having individuals of the population leave (emigration). Populations that are fairly close to each other geographically are especially likely to swap individuals occasionally.

The amount of gene flow between two populations can cause a lot of problems for population geneticists when analyzing the populations. If two populations constantly exchange individuals, then they really are a single population, even if the two populations are in different geographical locations. On the other hand, what looks like a single population may really be two different populations with very little gene flow in between, even though they occupy the same geographical area.

Besides being a headache for population geneticists, gene flow has had some big consequences in our society. In the early 1990s, genes from a strain of genetically-modified rice found their way into supposedly “natural” rice fields, thereby contaminating them with new genes. This cost millions of dollars in damages, and resulted in suspended rice exports from the United States.

In terms of diversity, gene flow can increase or decrease the variation in a population. It can be said that gene flow redistributes variation between populations.
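A simple way to picture this redistribution is a one-way migration model, sketched below. It is an illustrative simplification, not a description of any real population: each generation a fixed fraction of the resident population is replaced by migrants from a source population whose allele frequency stays constant. The migration rate and both frequencies are hypothetical.

```python
def migrate_one_generation(p_resident, p_migrant, migration_rate):
    """Frequency of D after a fraction of residents is replaced by migrants."""
    return (1.0 - migration_rate) * p_resident + migration_rate * p_migrant

p = 1.0  # the resident population starts out fixed for D (no variation)
history = [p]
for _ in range(50):
    # ASSUMPTION: 10% migrants per generation from a source that is 20% D.
    p = migrate_one_generation(p, p_migrant=0.2, migration_rate=0.1)
    history.append(p)

print(f"after 1 generation: {history[1]:.2f}, after 50: {history[-1]:.3f}")
```

Here gene flow increases diversity in the resident population, which began with only D and ends up with both alleles. Over time its allele frequency converges on that of the source population, so the two populations become more alike.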

A summary of the four forces

For a population to evolve—to have allele frequencies change over time—there are only 4 possible ways for that to happen:

  1. Mutation: a brand new allele shows up
  2. Selection: some alleles are better at being propagated than others
  3. Genetic drift: some alleles are propagated more than others through random chance
  4. Gene flow: alleles leave the population or enter the population from elsewhere

With some thought, it should be clear that these are the only possible ways that the allele frequencies can change. Hence, these are the only 4 forces of evolution.

Mutation is the only force that brings in new variation to a species. All other forces of evolution act on existing genetic variation, and the source of all genetic variation is mutation. This becomes critical in today’s world, where human activity and anthropogenic climate change threaten many species: mutation cannot introduce new variation as fast as the rapidly changing environment destroys it.

Note that our discussion on evolution was for a particular locus with some alleles. In fact, each locus evolves somewhat separately, and we can track the allele frequencies for any locus independently in the population. A population evolves if any locus changes allele frequency over time.

Another note is that our discussion here applies to both haploid and diploid organisms. Although our examples may have been tailored to a particular ploidy, all of our observations here hold for any ploidy.

Finally, as a reminder, a locus may have more than two alleles in a population. While this is relatively rare, these four forces act the same regardless of how many alleles there are.

Hardy-Weinberg equilibrium

Let us now consider only diploid organisms, such as humans.

Allele frequency vs genotype frequency

Recall that an individual’s genotype for a locus is the set of alleles that the individual possesses at that locus. Consider the following population that we saw earlier:

There are 5 individuals: 2 DD, 2 Dd, and 1 dd. The genotype frequencies, then, are 40% DD, 40% Dd, and 20% dd.

If you knew the genotype frequencies, you could very easily compute the allele frequencies. You could always draw out the individual genotypes and then count the alleles.
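There is also a shortcut that avoids drawing anything: every DD individual carries two copies of D and every Dd individual carries one, so the frequency of D is the DD frequency plus half the Dd frequency. A minimal sketch (the function name is made up for this example):

```python
def allele_freqs_from_genotype_freqs(f_DD, f_Dd, f_dd):
    """Return (p, q), the frequencies of D and d, from genotype frequencies."""
    # Homozygotes contribute all of their alleles, heterozygotes contribute half.
    p = f_DD + 0.5 * f_Dd
    q = f_dd + 0.5 * f_Dd
    return p, q

# Genotype frequencies from the population above: 40% DD, 40% Dd, 20% dd.
p, q = allele_freqs_from_genotype_freqs(0.4, 0.4, 0.2)
print(f"D: {p:.2f}, d: {q:.2f}")
```

This gives 60% D and 40% d, matching the count of 6 D and 4 d alleles made earlier.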

Now if you knew the allele frequencies for a locus in a population, would you know the genotype frequencies? In general, the answer is no. For example, all three populations below have allele frequencies of 50% D and 50% d, but have different genotype frequencies:

So even if you knew the allele frequencies, you would have no idea what the genotype frequencies were (unless there were only one allele at this locus, or there were two alleles and only one copy of one of them).
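To make this concrete, the sketch below builds three hypothetical populations of four individuals each (chosen for illustration, not necessarily the ones pictured) and confirms that they share the same allele frequencies while differing in genotype makeup.

```python
from collections import Counter

def summarize(genotypes):
    """Return (frequency of D, genotype counts) for a list like ["DD", "Dd"]."""
    alleles = Counter(allele for genotype in genotypes for allele in genotype)
    freq_D = alleles["D"] / sum(alleles.values())
    return freq_D, dict(Counter(genotypes))

# ASSUMPTION: hypothetical populations, each with 4 D alleles and 4 d alleles.
populations = {
    "only heterozygotes": ["Dd", "Dd", "Dd", "Dd"],
    "only homozygotes": ["DD", "DD", "dd", "dd"],
    "a mixture": ["DD", "Dd", "Dd", "dd"],
}

summaries = {name: summarize(g) for name, g in populations.items()}
for name, (freq_D, genotype_counts) in summaries.items():
    print(f"{name}: D = {freq_D:.0%}, genotypes = {genotype_counts}")
```

All three report 50% D, yet one has no homozygotes at all and another has no heterozygotes.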

Hardy-Weinberg equilibrium and allele frequency

A population at Hardy-Weinberg equilibrium (HWE) is one where the genotype frequencies depend only on the allele frequencies. In a population at HWE, knowing the allele frequencies allows computing the genotype frequencies. How can that be?

A population at HWE is a population that isn’t evolving. That is, the allele frequencies are constant over time. For simplicity let’s say that the frequency of D is p, and the frequency of d is q.

Since there’s no evolution, future generations have alleles that are very easily predicted. An offspring receives one allele from each parent. Each allele has probability p of being D, and probability q of being d. Because the two alleles are drawn independently, the probability of each genotype is the product of the probabilities of its two alleles: p × p for DD, q × q for dd, and p × q + q × p for Dd (the D can come from either parent).

If the frequency of D is p, and the frequency of d is q, then in the next generation:

  • The frequency of DD is p²
  • The frequency of Dd is 2pq
  • The frequency of dd is q²

Thus, we can predict the genotype frequencies from the allele frequencies.
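As a quick worked example, here is the prediction in code. The function name is made up for this sketch, and the input of 60% D is simply the allele frequency from the earlier example population.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies at HWE, given the frequency p of allele D."""
    q = 1.0 - p
    return {"DD": p * p, "Dd": 2 * p * q, "dd": q * q}

expected = hardy_weinberg(0.6)
for genotype, freq in expected.items():
    print(f"{genotype}: {freq:.2f}")
```

With p = 0.6 the prediction is 36% DD, 48% Dd, and 16% dd. The earlier population of five had 40%, 40%, and 20%, so it is near these values but does not match them exactly. The three predicted frequencies always add up to 1, because p² + 2pq + q² is just (p + q)² and p + q = 1.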

Make sure to distinguish genotype frequency from phenotype frequency. Assuming D is dominant over d, then DD shows the same phenotype as Dd. Then under HWE, the frequency of the dominant phenotype is p² + 2pq, and the frequency of the recessive phenotype is q².
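This relationship is often used in reverse: if only phenotypes can be observed, the recessive phenotype frequency gives q², and everything else follows. The sketch below assumes the population is at HWE and that D is completely dominant; the 16% figure is a hypothetical input and the function name is made up for this example.

```python
import math

def from_recessive_phenotype(recessive_fraction):
    """Infer allele and carrier frequencies from the recessive phenotype frequency.

    ASSUMPTION: the population is at HWE and D is completely dominant over d.
    """
    q = math.sqrt(recessive_fraction)  # recessive phenotype frequency is q squared
    p = 1.0 - q
    return {
        "p": p,
        "q": q,
        "carriers": 2 * p * q,                    # heterozygotes (Dd)
        "dominant_phenotype": p * p + 2 * p * q,  # DD plus Dd
    }

result = from_recessive_phenotype(0.16)  # hypothetical: 16% show the recessive trait
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

For a recessive phenotype frequency of 16%, this gives q = 0.4 and p = 0.6, so 48% of the population are heterozygous carriers and 84% show the dominant phenotype.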

Note that this analysis works just as well for loci with more than 2 alleles (in which case there would be more than 3 possible genotypes).
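The same reasoning extends to any number of alleles: a homozygote for an allele with frequency x has expected frequency x², and a heterozygote for alleles with frequencies x and y has expected frequency 2xy. A minimal sketch follows; the three alleles A, B, and C and their frequencies are hypothetical.

```python
from itertools import combinations_with_replacement

def hwe_genotype_frequencies(allele_freqs):
    """Expected HWE genotype frequencies for any number of alleles."""
    genotypes = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        product = allele_freqs[a] * allele_freqs[b]
        # Heterozygotes can arise two ways (a from either parent), hence the 2.
        genotypes[a + b] = product if a == b else 2 * product
    return genotypes

# ASSUMPTION: a hypothetical locus with three alleles.
genotype_freqs = hwe_genotype_frequencies({"A": 0.5, "B": 0.3, "C": 0.2})
for genotype, freq in genotype_freqs.items():
    print(f"{genotype}: {freq:.2f}")
```

Three alleles give six possible genotypes (AA, AB, AC, BB, BC, CC), and their expected frequencies still sum to 1.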

Conditions for Hardy-Weinberg equilibrium

We can only predict the genotype frequencies from the allele frequencies when a population is at HWE. This happens when there is no evolution. Equivalently, a population is at HWE when the four forces of evolution are absent:

  1. No mutation
  2. No selection
  3. Random mating in an infinite population (no genetic drift)
  4. No movement in or out of the population (no gene flow)

If the four forces of evolution are absent, then the population is at HWE.

Clearly, no real population satisfies all of these conditions (obviously, no real populations are infinite, for one). But that doesn’t mean that HWE never occurs. In fact, even though no population is at HWE, there are many loci in a population that behave as if they are at HWE.

It can be difficult to identify such loci, but in general, the loci that behave as if they are at HWE are under very weak selection in a very large population. There are also many more loci that are “close to” HWE, meaning their genotypes match the predictions fairly closely, even if they are not exactly what HWE would predict.
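One common way to quantify "close to" HWE is a chi-square comparison of observed genotype counts against the counts HWE predicts. The sketch below is just one reasonable approach, not a method prescribed by the curriculum. The genotype counts are hypothetical, and the function assumes both alleles are present (otherwise an expected count would be zero).

```python
def hwe_chi_square(n_DD, n_Dd, n_dd):
    """Chi-square statistic comparing observed genotype counts with HWE expectations.

    ASSUMPTION: both alleles are present, so no expected count is zero.
    """
    n = n_DD + n_Dd + n_dd
    p = (2 * n_DD + n_Dd) / (2 * n)  # frequency of D, counted from the genotypes
    q = 1.0 - p
    expected = (n * p * p, n * 2 * p * q, n * q * q)
    observed = (n_DD, n_Dd, n_dd)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical samples of 1000 individuals, both with 60% D and 40% d.
chi_hwe = hwe_chi_square(360, 480, 160)     # matches the HWE prediction exactly
chi_skewed = hwe_chi_square(400, 400, 200)  # fewer heterozygotes than predicted
print(f"{chi_hwe:.2f} {chi_skewed:.2f}")
```

A statistic near zero means the genotypes match the HWE prediction closely. For a two-allele locus this test has one degree of freedom, and by the usual convention a value above about 3.84 is treated as a significant departure at the 5% level. Small samples (like the five-person example above) are not suitable for this test.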


Tags: biology evolution teaching