What is the evolutionary meaning of the polymorphism?

How large was the group of bipeds that once founded the species Homo sapiens? Was it just a few individuals? Could we all even descend from the same primeval mother? Or was the founding population much larger, perhaps more than 10,000 individuals?

The view that new species, including humans, emerged from small populations has been encouraged by some recent research. But findings on the evolution of the genes for certain immune molecules call this view into question. The genes concerned form a uniquely polymorphic gene complex, which is the basis for recognizing foreign proteins of invading pathogens and which also plays a role in the rejection of transplants (Fig. 1).


More than 50 years ago, the English pathologist Peter A. Gorer of University College London discovered that most of the body's cells carry, so to speak, molecular calling cards on their surface that distinguish one individual from another. These proteins are responsible for the rejection of transplants - which is what prompted the discovery at the time - and were therefore called tissue compatibility or histocompatibility molecules.

Gorer also showed that among the multitude of these molecules one category has a particularly strong influence on rejection; its genes are grouped on a single chromosome, and there within one region, the major histocompatibility complex or MHC (for its structure see Figure 1).

Of course, MHC molecules did not evolve in order to thwart tissue or organ transplants. Rather, they ensure that the immune system recognizes invading pathogens and initiates a defense. While they are still being synthesized in the cell, they take up short peptides (pieces of protein), which they then present on the cell surface. For the most part these are fragments of degraded endogenous proteins, but sometimes they are breakdown products of intruders.

T cells, a class of lymphocytes that constantly move through the body and inspect cell surfaces, are on patrol. They do not react to presented endogenous peptides, but react all the more violently to foreign ones: with receptors that each specifically recognize a particular combination, they bind to the complex of MHC molecule and peptide. Activated in this way, the lymphocytes set in motion defense mechanisms that aim to destroy the infected cells and the intruder itself. In a transplant, the MHC molecules of the donor tissue are recognized as foreign, which is why the tissue is attacked.

The human MHC, or HLA complex (human leukocyte antigen), comprises more than 100 genes on chromosome 6 (Fig. 1) and extends over a section about four million base pairs long. (The bases of the nucleotides, the building blocks of the genetic material, encode in their sequence the sequence of amino acids in the protein.) However, only some of these genes code for peptide-presenting proteins. Some carry the information for protein-degrading enzymes or for molecules that transport peptides through membranes. Others are necessary for other key functions of the immune system, which are not explained here. Still others have nothing to do with the immune system, the purpose of some is still completely unknown, and some are not even functional (they are incomplete or cannot be read).

High polymorphism at one locus

The actual MHC genes can be divided into two classes according to their structure and function. A member of class II will serve as an example here: DRB1, a functional gene that humans share with other primates.

The fact that foreign organs are inevitably rejected without medical countermeasures (except between identical twins) already suggested that all human beings differ in their MHC molecules.

This was confirmed when the base sequences of functional MHC genes were deciphered: each of them can in fact occur in one of many different forms. This means that at a particular location (or locus) on the chromosome, every person carries one of the many versions, so-called alleles, that are present in the population. A clock with a date display illustrates the relationship: the display window would be the fixed locus, while the date shown would be one of the alleles. If you consider that there are several such functional MHC loci on the chromosome and on its partner, then the known numbers yield, in theory, at least one trillion possible allele combinations - far more than the number of humans who have ever lived on earth.

Even if only a part of these possibilities actually occurs in the population, it is still more than enough to ensure that two randomly selected, unrelated people never match in all of their MHC alleles. This is a peculiarity, because it by no means applies to most of the other, so to speak normal, genes. They either exist in only a single allelic form (the clock always shows the same date), or there are only very few forms, one of which is usually very common while the others are rare. If, however, many different alleles occur with appreciable frequency at a genetic locus in a population, one speaks of polymorphism.

The polymorphism of the MHC is also distinguished by the fact that the individual alleles sometimes differ considerably from one another in their base sequence, at 100 or more positions, while the alleles of other genes differ at a few positions at most.

Age of allele diversity

The question now is whether each newly evolving species had to build up this enormous polymorphism afresh or whether it took it over from its predecessor species. If the founding population consisted of only a few individuals, the former would inevitably apply. A simple calculation illustrates this: 100 marbles are drawn blindly from a pool of 20,000 marbles in 40 colors, with the same number of each color. The probability of having fished out all 40 colors is 0.02.
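The marble example can be checked with a short calculation. The following is a minimal sketch, assuming the draw is made without replacement and using the inclusion-exclusion principle over the colors that might be missing; the function name and parameters are illustrative and do not come from the original study. The result can be compared with the figure quoted above.

```python
from fractions import Fraction
from math import comb


def prob_all_colors(colors=40, per_color=500, draws=100):
    """Probability that a blind draw (without replacement) contains every color.

    Inclusion-exclusion: sum over k of (-1)^k * C(colors, k) times the
    probability that k particular colors are all absent from the draw.
    """
    total = colors * per_color
    all_draws = comb(total, draws)
    acc = Fraction(0)
    for k in range(colors + 1):
        # Number of draws that avoid k chosen colors entirely
        # (math.comb returns 0 when fewer marbles remain than are drawn).
        avoiding = comb(total - k * per_color, draws)
        term = Fraction(comb(colors, k) * avoiding, all_draws)
        acc += -term if k % 2 else term
    return float(acc)


if __name__ == "__main__":
    # 20,000 marbles, 40 colors, 500 of each, 100 drawn.
    print(round(prob_all_colors(), 3))
```

Exact fractions are used because the alternating terms are individually much larger than the final result, so rounding errors are best avoided until the very end.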

The polymorphism would certainly have to be rebuilt if all later generations were descendants of a single pregnant female. The current polymorphism would then be younger than the species.

Indications of a relatively young age come from the coalescence theory of population genetics (coalescence means the merging of lineages when looking back in time): if one randomly picks two neutral alleles, i.e. alleles not subject to selection, from a population, their ancestry can in theory be traced back in a family tree to their common ancestor. The number of generations that have passed since then is on average twice the so-called effective population size, which roughly corresponds to the number of individuals reproducing in each generation. Figure 2 illustrates this for a very small idealized population of only five individuals. (For any one gene, at most twice as many different alleles as individuals can persist: two per individual, because the chromosomes are present in pairs.)

Assuming that the effective size of the former population was 10,000 individuals and the average generation time was 20 years, the common origin of two neutral alleles of a given gene present today would lie, as a statistical mean, 400,000 years in the past. Current estimates place the roots of archaic humans about half a million years ago. Consequently, most of today's genetic polymorphism would have emerged only after that time.
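The arithmetic behind this estimate can be written out in a few lines. The sketch below is illustrative only: the first function applies the rule stated above (mean coalescence time of two neutral alleles equals twice the effective population size, in generations), and the second is a simple Monte Carlo check under the assumption of an idealized, randomly mating population of constant size, in which each gene copy picks its parent copy at random. All names are hypothetical.

```python
import random


def mean_coalescence_years(effective_size, generation_years):
    """Mean time back to the common ancestor of two neutral alleles."""
    return 2 * effective_size * generation_years


def simulate_pair_coalescence(n_individuals, rng):
    """Trace two gene copies backwards until they share a parent copy.

    Assumes an idealized population with 2 * n_individuals gene copies
    per generation, each choosing its parent copy uniformly at random.
    """
    copies = 2 * n_individuals
    generations = 0
    while True:
        generations += 1
        if rng.randrange(copies) == rng.randrange(copies):
            return generations


if __name__ == "__main__":
    print(mean_coalescence_years(10_000, 20))  # 400000

    rng = random.Random(0)
    trials = 20_000
    mean = sum(simulate_pair_coalescence(5, rng) for _ in range(trials)) / trials
    # For five individuals the average should lie close to 2 * 5 generations.
    print(mean)
```

The simulation uses five individuals, as in the idealized population of Figure 2, only because small populations make the check fast; the rule itself does not depend on the size chosen.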

If this also applied to MHC genes, they would have to mutate much faster than other genes do (it is assumed that genetic sequences mutate at certain, more or less constant rates over sufficiently long periods). Under the assumptions made, this would be the only way to explain why their polymorphism exceeds that of other genes to such an extent. In fact, such a high mutation rate in the MHC was assumed 15 years ago. Only one of us (Klein) pointed out that this could not be reconciled with the available data.

In the late 1970s, together with other scientists at the Max Planck Institute for Biology in Tübingen, Bernhard Arden and Edward K. Wakeland, he discovered identical MHC alleles in two species of mice, even though the evolutionary lines of the two species had separated two million years earlier. This finding was all the more surprising because the MHC polymorphism in these animals is similar to that in humans. It suggested that MHC genes evolve only as slowly as their less variable counterparts.

In 1980 Klein formulated the trans-species hypothesis of MHC polymorphism. According to it, the pronounced diversity in the base sequences of MHC alleles does not result from an increased mutation rate but is passed on from species to species whenever a new species evolves. The coalescence time, i.e. the time that has elapsed since two MHC alleles diverged from their ancestral allele, can therefore far exceed the lifespan of a species.

In 1988 the hypothesis was directly confirmed for the first time. The evolutionary lines of the house mouse and the brown rat separated more than ten million years ago. Klein and his institute colleague Felipe Figueroa, together with Eberhard Günther of the University of Göttingen, found MHC alleles in them whose common origin must lie well before this time: some mouse alleles are much more similar to certain rat alleles than to other mouse alleles, and vice versa. The lineages of these alleles seem to have separated long before those of the species and are still represented in both species.

Species comparison

The presumed great age of the MHC polymorphism was soon confirmed in numerous other studies on rodents and primates. The analyses were carried out by Wakeland, who now works at the University of Florida in Gainesville, Werner E. Mayer of the Max Planck Institute for Biology in Tübingen, Peter Parham of Stanford University (California), Henry A. Erlich, who at the time worked for the company Cetus in Emeryville (California), and Ronald E. Bontrop of ITRI-TNO in Rijswijk (Netherlands).

An inter-species comparison of selected alleles of the DRB1 locus mentioned above is particularly striking. As shown in the box on page 60, certain allelic forms found in humans have close counterparts in chimpanzees. Notably, the two human alleles selected here differ more from one another than from their respective chimpanzee counterparts.

This can be quantified by calculating the genetic distance, defined as the number of exchanged bases (base substitutions) per length of the DNA sequences under consideration. From these values a family tree can be constructed for the four DRB1 sequences: the length of the branches is proportional to the genetic distance, and the alleles with the smallest distance end up on neighboring branch tips. The branching diagram shows that the lines of the two present-day human alleles had already separated before the stem lines of humans and chimpanzees diverged - and that was at least four million years ago.
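To make the definition concrete, here is a minimal sketch of the distance calculation. The four sequences are invented toy strings, not real DRB1 data, and the function names are illustrative; the sketch only counts differing positions per sequence length and finds each allele's nearest neighbor, which is the information a tree-building method would start from.

```python
def genetic_distance(seq_a, seq_b):
    """Number of differing bases per length of two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


def nearest_neighbor(name, sequences):
    """Return the name of the sequence with the smallest distance to `name`."""
    others = (other for other in sequences if other != name)
    return min(others, key=lambda o: genetic_distance(sequences[name], sequences[o]))


# Hypothetical toy alleles (20 bases each), chosen so that each human
# allele is closer to one chimpanzee allele than to the other human allele.
TOY_ALLELES = {
    "human_1": "ACGTACGTACGTACGTACGT",
    "chimp_1": "ACGTACGTACGAACGTACGT",
    "human_2": "TCGAACCTACGTTCGTAGGT",
    "chimp_2": "TCGAACCTACGTTCGTAGCT",
}

if __name__ == "__main__":
    for name in TOY_ALLELES:
        print(name, "->", nearest_neighbor(name, TOY_ALLELES))
    print(genetic_distance(TOY_ALLELES["human_1"], TOY_ALLELES["human_2"]))
```

In this toy data set the two human alleles differ at five of twenty positions, whereas each differs from its chimpanzee counterpart at only one, which reproduces the pattern described above in miniature.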

Many human MHC alleles, or rather their lineages, appear to be considerably older. According to investigations by Klein, Figueroa and Colm O'Huigin in Tübingen, quite a few began to diverge before the line of the prosimians branched off from that of the other primates, about 65 million years ago. All of the younger primate species have emerged in the time since then, and each time the existing MHC polymorphism must have been passed on from the older species to the younger.

Selection pressure on MHC genes

How can this great age of MHC alleles be reconciled with the conclusion from coalescence theory that the origin of all human alleles lies no further back than about 400,000 years? As one of us (Takahata) pointed out in 1990, one of the main premises of the theory does not hold, least of all for the MHC: that the genes are selectively neutral. Rather, they are subject to balancing selection (with an advantage for heterozygotes, who inherit two different alleles), which means that different alleles can persist in a population longer than they would under neutrality. By extending the theory accordingly, Takahata was able to show that the common origin of two alleles must lie further back the more strongly selection acts on them. For MHC alleles, mean coalescence times of several million years were found.

Austin L. Hughes and Masatoshi Nei, who were working at the University of Texas in Houston at the time, provided indirect evidence that balancing selection actually acts on functional MHC alleles. Because of the redundancy of the genetic code, only certain base substitutions in the DNA change the amino acid sequence of the corresponding protein. (Such a substitution is called non-synonymous; if the amino acid remains unchanged, it is synonymous.) The ratio of the frequency of non-synonymous substitutions to that of synonymous ones, called gamma, should be one for neutral genes that are not subject to selection. Unlike synonymous substitutions, non-synonymous substitutions can bring an evolutionary advantage and thus be positively selected (gamma greater than one), or an evolutionary disadvantage and then be negatively selected (gamma less than one). Gamma is therefore an indicator of the selection pressure at work.

Hughes and Nei determined separate gamma values for functional MHC genes: one for the section that codes for the peptide-binding region of the MHC molecule, and one for the rest. Only for the binding region did they find positive selection (gamma greater than one); for all other parts it was negative. In this region, they conclude, amino acid substitutions offer an evolutionary advantage. Indeed, most of the differences between the alleles are found in the segment coding for the binding region.
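The logic of the gamma test can be sketched as follows. This is a simplified illustration, not the procedure actually used by Hughes and Nei: it assumes that "frequency" means substitutions per available site of each kind, and the counts for the two gene sections are invented purely to show how the comparison works.

```python
def gamma(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Ratio of the non-synonymous to the synonymous substitution frequency.

    Each frequency is taken as substitutions per site of that kind
    (an assumption of this sketch); the ratio is computed in one step
    to avoid unnecessary rounding.
    """
    return (nonsyn_subs * syn_sites) / (syn_subs * nonsyn_sites)


def interpret(value):
    """Map a gamma value to the kind of selection it suggests."""
    if value > 1:
        return "positive selection"
    if value < 1:
        return "negative selection"
    return "neutral"


# Hypothetical counts: (non-synonymous substitutions, non-synonymous sites,
# synonymous substitutions, synonymous sites) for two parts of a gene.
HYPOTHETICAL_REGIONS = {
    "peptide-binding region": (30, 100, 5, 50),
    "rest of the gene": (10, 400, 20, 200),
}

if __name__ == "__main__":
    for region, counts in HYPOTHETICAL_REGIONS.items():
        value = gamma(*counts)
        print(f"{region}: gamma = {value:.2f} ({interpret(value)})")
```

With these made-up numbers the binding region yields a gamma of 3 and the rest of the gene a gamma of 0.25, mirroring the qualitative contrast described above.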

In 1991 the group of Adrian V. S. Hill at Oxford University provided direct evidence of positive selection on MHC genes: it showed that certain human MHC alleles provide better immune protection than others against the most dangerous malaria pathogen, Plasmodium falciparum.

Evidently, the peptide-binding region of the MHC molecule is under constant evolutionary pressure to exist in as many different forms as possible, so as to be armed against the multitude of pathogens. As a result, MHC alleles persist in the gene pool of a population for a long time - and are ultimately passed on as a vital dowry to a new species that emerges from it.

Size of founder populations

As we have seen, the theoretical considerations about the number and persistence of MHC alleles always involve the effective population size. The central question is, for example, how large the population was from which humans emerged. Small founder populations, as often postulated for speciation, probably play a role only in the colonization of islands. They can certainly explain the origin of island species such as Darwin's finches on the Galápagos (Spektrum der Wissenschaft, December 1991, p. 64). In general, of course, we have no direct information about the size of founder populations. Fortunately, the MHC polymorphism offers a unique opportunity to infer the size indirectly: from the number of MHC alleles inherited from the predecessor species, the mutation rate and the intensity of selection.

Let us imagine a population of a certain size with a certain number of MHC alleles. If we trace their family tree backwards, the lineages (i.e. branches) of the many alleles merge one after the other, the oldest finally in the allele of common origin. Suppose we are at a branch point where two younger lineages converge. How far back in the past is the next merger with further branches most likely to occur? The answer depends on the size of the population and the number of alleles at the time. Conversely, if we know the number of these alleles and the coalescence time, we can infer the effective population size.

How are such coalescence times obtained? One compares the base sequences of the alleles of a gene locus, calculates the genetic distances from the differences, and uses them to construct a family tree that can be related to the true chronological sequence. Yoko Satta of the Max Planck Institute in Tübingen has shown that synonymous substitutions in MHC alleles accumulate with great regularity over long periods of time, like the ticking of a clock; she was then able to calibrate this clock using paleontological data. It is thus now possible to calculate how far back the common origin of two alleles lies, i.e. when their lineages separated. Furthermore, one can determine for each point in time how many alleles a given gene locus had.

We have now applied this to the DRB1 locus. From the number of its alleles known so far in today's population we conclude that the long-term effective population size over the last 500,000 years must have been around 100,000 individuals. Similar figures result for other MHC loci in humans. And the MHC polymorphism of other primate species, which has been studied more or less thoroughly, likewise indicates large populations during their evolution.

Evolutionary bottlenecks

However, these calculations cannot rule out the possibility that the population collapsed dramatically from time to time, for example because of a famine or an epidemic. In such an event some alleles are lost to future generations. Such a genetic constriction is known in technical jargon as a bottleneck. The same phenomenon occurs when a small group leaves an established population, successfully colonizes a new habitat and founds a new species. (As we shall show, the length of the bottleneck matters as well as its narrowness.)

How genes behave during passage through a bottleneck cannot easily be determined with mathematical precision, because the random fluctuations can be very large. One therefore has to rely on computer simulation. In the initial stage there are 100,000 individuals; for the locus under consideration there are 40 equally frequent alleles, each occurring 5,000 times in the pool of 200,000 genes. The computer program now randomly draws 500 individuals (1,000 genes) from the pool - the survivors of a disaster, so to speak. The population is now in the bottleneck. From its 1,000 genes, 1,000 genes are again drawn for the next generation, but this time each gene drawn is replaced by an identical one before the next draw (a gene that is inherited is not lost from the parental pool). This procedure is repeated for ten generations, and at the end one counts how many of the original 40 alleles have been preserved. The whole simulation is then repeated several times.

With these settings, all 40 alleles survived the passage through the bottleneck in 60 percent of the runs. Things look different, however, if the bottleneck is made much narrower, so that the population is reduced to a few individuals, or if it persists over many more generations. Then there is practically no chance that the original polymorphism will be fully preserved. (Figure 3 gives an impression of this.)
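The simulation procedure described above can be sketched in a few lines. This is a minimal reconstruction from the description in the text, not the authors' original program: the function names, the default number of runs and the random seed are assumptions, and the fraction obtained will fluctuate from one set of runs to the next.

```python
import random


def surviving_alleles(rng, n_alleles=40, copies_per_allele=5000,
                      bottleneck_genes=1000, generations=10):
    """Run one bottleneck passage and return the number of alleles left."""
    pool_size = n_alleles * copies_per_allele
    # The disaster: draw the survivors' genes from the large pool without
    # replacement. Gene i carries allele number i // copies_per_allele.
    drawn = rng.sample(range(pool_size), bottleneck_genes)
    genes = [i // copies_per_allele for i in drawn]
    # Inside the bottleneck: each generation is drawn from the previous one
    # with replacement, so a parental gene can be inherited more than once.
    for _ in range(generations):
        genes = rng.choices(genes, k=bottleneck_genes)
    return len(set(genes))


def fraction_fully_preserved(runs=200, seed=1, n_alleles=40, **settings):
    """Fraction of runs in which every original allele survives."""
    rng = random.Random(seed)
    preserved = sum(
        surviving_alleles(rng, n_alleles=n_alleles, **settings) == n_alleles
        for _ in range(runs)
    )
    return preserved / runs


if __name__ == "__main__":
    print("500 individuals, 10 generations:", fraction_fully_preserved())
    print("10 individuals, 10 generations:",
          fraction_fully_preserved(bottleneck_genes=20))
    print("500 individuals, 100 generations:",
          fraction_fully_preserved(runs=50, generations=100))
```

Changing `bottleneck_genes` varies the narrowness of the bottleneck and `generations` its length, so the two effects discussed in the text can be explored separately.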

We conclude from such simulations that founder populations must have contained at least 500 reproducing individuals. In fact the number was probably considerably larger, because we included only the alleles of a single locus, and only those whose proteins differ in the peptide-binding region. This is in sharp contradiction to the view that speciation takes place in small founder populations, a view based on the argument that in small groups gene frequencies fluctuate more strongly by chance and selection therefore has greater effects, so that evolution can proceed faster. But the MHC polymorphism rules out the possibility that humanity descends from a tiny number of individuals, let alone from a single primordial mother.

The idea of a kind of modern Eve sprang from a study published in 1987 by the group of Allan C. Wilson and Rebecca L. Cann at the University of California at Berkeley (Spektrum der Wissenschaft, June 1992, p. 72). They compared base sequences of mitochondrial DNA from a number of present-day human populations and constructed a family tree. (The mitochondria lie in the cell outside the nucleus and are passed on to the next generation only by the mother, with the egg cell.)

This study, although neither the first nor the last of its kind, caused quite a stir, especially because of the expression "mitochondrial Eve" coined by the authors. Many reporters and commentators, especially in the popular media, interpreted it as meaning that all people living today descend from a single woman - which the study by no means said. Rather, its authors believed they could show (although scientists have not yet reached a consensus on this) that all the variants of mitochondrial DNA occurring in humans today can be traced back to a single molecule from a woman who lived in Africa about 200,000 years ago.

Even if that were really so, it would not mean that a single primeval mother founded humanity. It would mean only that the lineages of today's mitochondrial DNA variants derive from a single original molecule that existed at that time. Because mitochondria are inherited as a whole, the DNA enclosed in them can be regarded as a unit, more or less like one of the 40,000 or more human genes. However, if one traced the other genes back to their respective origins, which is theoretically possible, these would lie at different depths in the past: for some MHC alleles, as we have seen, more than 65 million years.

The expression "mitochondrial Eve" all too easily evokes family trees of individuals, but only family trees of genes are meant. The results of the Berkeley group do not contradict the insights gained from the analysis of the MHC polymorphism. Nor do they in any way suggest that there was a genetic bottleneck with an extremely small population size in human evolution.

The human line split early into at least two separate lines, one of which led to present-day Homo sapiens. The population that gave rise to it must, as the MHC data suggest, have had at least 500, but more probably 10,000, reproducing individuals who already carried most of the present-day MHC alleles or MHC allele lineages.

It cannot be ruled out that the total population was divided into smaller groups that remained in contact with one another. The exchange of genes through partners from other groups would then have protected the valuable MHC polymorphism from being eroded by random changes in gene frequency.

New approaches

Research into the MHC polymorphism also promises information about recent movements of the earth's population, for example about the size of the groups that populated America, Australia, Polynesia or Japan. And it will undoubtedly allow conclusions to be drawn about the process of speciation itself.

This opens a new approach to the controversial question of whether evolutionary changes occur abruptly, in short bursts at the beginning of speciation, or whether they accumulate more continuously for as long as a species exists (see "Mechanisms of Evolution" by Francisco J. Ayala, Spektrum der Wissenschaft, May 1979, page 8). The same applies to the question of whether new species emerge as offshoots of old ones, which in turn continue to exist, or whether one species slowly changes into another.

The same methods and considerations can even be applied to completely different genetic polymorphisms. Certain plant species, for example, are genetically protected against self-fertilization: the pollen cannot grow into a tube that penetrates to the ovary if it and the pollinated individual carry the same allele at a certain gene location for incompatibility. These self-incompatibility genes come in a number of allelic forms.

Thomas R. Ioerger, who now works at the University of Illinois, together with Andrew G. Clark and Teh-Hui Kao of Pennsylvania State University in University Park, determined base sequences of such alleles from three nightshade species: an ornamental tobacco, a wild petunia and a wild potato. As they found, here too some alleles within a species differ more from each other than from the corresponding allele of another species. These allele lineages seem to have separated before the three species diverged - around 27 to 36 million years ago. Here too, a relatively large founder population can be assumed.

Such comparative studies of allele diversity show that the same principles are at work in these plants as in the MHC of primates. They allow us to make statements about populations that existed millions of years ago. We are witnessing the emergence of a new branch of science: population paleogenetics.


- Evolution of the Major Histocompatibility Complex. By Jan Klein and Felipe Figueroa in: CRC Critical Reviews in Immunology, Volume 6, Issue 4, pages 295 to 386, 1986.

- Natural History of the Major Histocompatibility Complex. By Jan Klein. John Wiley & Sons, 1986.

- A Simple Genealogical Structure of Strongly Balanced Allelic Lines and Trans-Species Evolution of Polymorphism. By Naoyuki Takahata in: Proceedings of the National Academy of Sciences, Volume 87, Issue 7, pages 2419 to 2423, April 1990.

- The Major Histocompatibility Complex and Human Evolution. By Jan Klein, Jutta Gutknecht and Norbert Fischer in: Trends in Genetics, Volume 6, Issue 1, pages 7 to 11, 1990.

- Trans-Specific MHC Polymorphism and the Origin of Species in Primates. By Jan Klein, Yoko Satta, Naoyuki Takahata and Colm O'Huigin in: Journal of Medical Primatology, Volume 22, pages 57 to 64, 1993.

- The major histocompatibility complex and the distinction between self and foreign by the immune system. By Jan Klein, Hans-Georg Rammensee and Zoltan A. Nagy in: Die Naturwissenschaften, Volume 70, Issue 6, pages 265 to 271, June 1983.

- Immunology. By Jan Klein. Verlag Chemie, Weinheim 1991.

- Evolution. The development from the first traces of life to humans. Spektrum der Wissenschaft Verlagsgesellschaft, Heidelberg 1988.

- The immune system. Spektrum der Wissenschaft Special 2, 1993.

From: Spectrum of Science 2/1994, page 56
© Spektrum der Wissenschaft Verlagsgesellschaft mbH