Is there a measure of human genetic variation, where human genetic variation can differ more than 1%?

According to this

Neanderthal genetics

The proportion of Neanderthal-inherited genetic material is about 1 to 4 percent[12] [later refined to 1.5 to 2.1 percent[11]] and is found in all non-African populations

And according to this

Interbreeding between archaic and modern humans

Neanderthal-derived ancestry is absent from most modern populations in sub-Saharan Africa

Now, human genetic variation is usually said to be between 0.1% and 0.5%


While the genetic difference between individual humans today is minuscule - about 0.1%, on average

Why are the genomes of humans 99.5% the same

Human DNA sequences are said to be roughly 99.5% identical

But here,

Human genetic variation

There are at least 6 different ways to measure human genetic variation and

human-to-human genetic variation is estimated to be at least 0.5%

Do this last statement and the first two articles tell us that human genetic variation, according to some measures of variation, is actually more than 1%? (That would be larger than some measures of our genetic difference from other animals, such as chimps, which are usually mentioned to highlight our similarities.) Or has something been misinterpreted in all this?

The question makes it clear that the poster is aware of the different ways of estimating human genetic variation, and he is also no doubt aware of the fact that Wikipedia is written by “people like you” who may not be experts in the field or have updated an old entry based on limited data. Let me address what appears to be the cause of confusion in this question - certainly in other questions on this topic.

For simplicity let us take a suitable round number for the Human/Neanderthal relatedness. Then the poster's first quotation can be simplified as:

  1. The proportion of Neanderthal-inherited genetic material in modern Europeans is 2%.

and his second quotation simplified as:

  1. Neanderthal-inherited genetic material is absent from modern Africans.

A common conclusion drawn from this is:

The genetic variation between modern Europeans and Africans must therefore exceed 2%, rather than being only ca. 0.5%.

Why is this wrong?

This conclusion is wrong because the 2% and 0.5% are measures of different things.

For simplicity, a value of human genetic variation of 0.5% can be regarded as the extent by which the DNA of the coding regions of the genes of two humans differ because of single nucleotide polymorphisms (SNPs) - single base changes in the same gene (that may or may not affect the gene function). This assumes all humans have the same 20,000 or so genes.

Now let us make an overall comparison of the genes of humans and Neanderthals. Since they diverged from a common ancestor their genomes underwent changes due to point mutations, but, as far as I am aware, they maintained the same set of genes. Even if I am wrong on this latter point, the vast majority of genes, including the ones that we are concerned with, are common to both: each has (or had) a gene for myosin, for hexokinase 1, for chymotrypsin etc.

Examination of the majority of such homologous genes reveals single base differences, which must have arisen since the divergence. A combination of such Neanderthal variants in one gene - never seen as human SNPs - and the converse situation allow one to designate a chymotrypsin gene, for example, as human or Neanderthal, even though the overall DNA sequence of the chymotrypsin gene may be 95% identical (to conjure a figure out of the air).

Now consider the European/Neanderthal comparison. It turns out that a small number of genes show no (or almost no) variation between the two, but that most of these same genes show the standard ca. 5% difference (i.e. ca. 95% identity) between Africans and Neanderthals (and hence between Africans and Europeans). These, then, are the 2% of genes that arose from interbreeding of Europeans and Neanderthals, after the exodus from Africa.

But 2% of genes of different origin which still retain 95% sequence identity would only add 0.1% to the modern human variation, cited at ca. 0.5%.
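The arithmetic behind this estimate can be sketched in a few lines of Python. The 2% and 95% figures are the round, illustrative numbers used in this answer, not measured values, and the function name is hypothetical.

```python
# Illustrative round numbers from the argument above, not measured values.
introgressed_fraction = 0.02  # assumed share of genes of Neanderthal origin
sequence_identity = 0.95      # assumed identity between the two versions of a gene


def added_variation(fraction, identity):
    """Extra overall sequence difference contributed by genes of different origin."""
    return fraction * (1.0 - identity)


extra = added_variation(introgressed_fraction, sequence_identity)
print(f"Added variation: {extra:.2%}")
```

With these inputs the product is about 0.1%, which is why the Neanderthal contribution does not push the overall figure anywhere near 2%.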


I do not know the actual extent of variation between most human and Neanderthal genes or the details of the non-African genes of Neanderthal origin. I will be happy to edit this answer if anyone has that detail. I apologize for the geographical generalizations not present in the question, but my concern was to convey the argument without burdening it with continued qualifications.


The concept of race

The concept of "race" as a classification system of humans based on visible physical characteristics emerged over the last five centuries, influenced by European colonialism. [8] [9] The concept has manifested in different forms based on social conditions of a particular group, often used to justify unequal treatment. Early influential attempts to classify humans into discrete races include 4 races in Carl Linnaeus's Systema Naturae (Homo europaeus, asiaticus, americanus, and afer) [10] [11] and 5 races in Johann Friedrich Blumenbach's On the Natural Variety of Mankind. [12] Notably, over the next centuries, scholars argued for anywhere from 3 to more than 60 race categories. [13] Race concepts have changed within a society over time; for example, in the United States social and legal designations of "White" have been inconsistently applied to Native Americans, Arab Americans, and Asian Americans, among other groups (See main article: Definitions of whiteness in the United States). Race categories also vary worldwide; for example, the same person might be perceived as belonging to a different category in the United States versus Brazil. [14] Because of the arbitrariness inherent in the concept of race, it is difficult to relate it to biology in a straightforward way.

Race and human genetic variation

There is broad consensus across the biological and social sciences that race is a social construct, not an accurate representation of human genetic variation. [15] [16] [17] Humans are remarkably genetically similar, sharing approximately 99.9% of their genetic code with one another. We nonetheless see wide individual variation in phenotype, which arises from both genetic differences and complex gene-environment interactions. The vast majority of this genetic variation occurs within groups; very little genetic variation differentiates between groups. [18] Crucially, the between-group genetic differences that do exist do not map onto socially recognized categories of race. Furthermore, although human populations show some genetic clustering across geographic space, human genetic variation is "clinal", or continuous. [19] [20] This, in addition to the fact that different traits vary on different clines, makes it impossible to draw discrete genetic boundaries around human groups. Finally, insights from ancient DNA are revealing that no human population is "pure" – all populations represent a long history of migration and mixing. [21] For example, the genetic makeup of European populations was massively transformed by waves of migrations of farmers from the Near East between 8,500 and 5,000 years ago and Yamnaya pastoralists from the Eurasian steppe beginning around 4,500 years ago. [22] [23]

Genetic variation arises from mutations, natural selection, migration between populations (gene flow) and the reshuffling of genes through sexual reproduction. [24] Mutations change the DNA sequence by altering the order of the bases. As a result, different proteins may be encoded. Some mutations may be beneficial and can help the individual survive more effectively in their environment. Variation is counteracted by natural selection and by genetic drift; note too the founder effect, when a small number of initial founders establish a population which hence starts with a correspondingly small degree of genetic variation. [25] Epigenetic inheritance involves heritable changes in phenotype (appearance) or gene expression caused by mechanisms other than changes in the DNA sequence. [26]

Human phenotypes are highly polygenic (dependent on interaction by many genes) and are influenced by environment as well as by genetics.

Nucleotide diversity is based on single mutations, single nucleotide polymorphisms (SNPs). The nucleotide diversity between humans is about 0.1 percent (one difference per one thousand nucleotides between two humans chosen at random). This amounts to approximately three million SNPs (since the human genome has about three billion nucleotides). There are an estimated ten million SNPs in the human population. [27]
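As a quick check of the figures above, the expected number of single-nucleotide differences between two randomly chosen genomes follows directly from the genome size and the nucleotide diversity. The sketch below uses only the approximate values quoted in this paragraph; the variable names are illustrative.

```python
# Approximate figures quoted in the paragraph above.
genome_size = 3_000_000_000   # nucleotides in the human genome
differences_per_thousand = 1  # nucleotide diversity of about 0.1 percent

# Integer arithmetic avoids floating-point rounding in the product.
expected_differences = genome_size * differences_per_thousand // 1000
print(expected_differences)
```

This count of roughly three million refers to differences between one pair of individuals; the ten million figure refers to variable sites catalogued across the whole population, so the two numbers measure different things.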

Research has shown that non-SNP (structural) variation accounts for more human genetic variation than single nucleotide diversity. Structural variation includes copy-number variation and results from deletions, inversions, insertions and duplications. It is estimated that approximately 0.4 to 0.6 percent of the genomes of unrelated people differ. [28] [29]

Much scientific research has been organized around the question of whether or not there is genetic basis for race. According to Luigi Luca Cavalli-Sforza, "From a scientific point of view, the concept of race has failed to obtain any consensus; none is likely, given the gradual variation in existence. It may be objected that the racial stereotypes have a consistency that allows even the layman to classify individuals. However, the major stereotypes, all based on skin color, hair color and form, and facial traits, reflect superficial differences that are not confirmed by deeper analysis with more reliable genetic traits and whose origin dates from recent evolution mostly under the effect of climate and perhaps sexual selection". [30] [31] [32] [33] [34] [35]

Scientists investigating human variation have used a series of methods to characterize how different populations vary.

Early studies of traits, proteins, and genes

Early racial classification attempts measured surface traits, particularly skin color, hair color and texture, eye color, and head size and shape. (Measurements of the latter through craniometry were repeatedly discredited in the late 19th and mid-20th centuries due to a lack of correlation of phenotypic traits with racial categorization. [36] ) In actuality, biological adaptation plays the biggest role in these bodily features and skin type. A relative handful of genes accounts for the inherited factors shaping a person's appearance. [37] [38] Humans have an estimated 19,000–20,000 protein-coding genes. [39] Richard Sturm and David Duffy describe 11 genes that affect skin pigmentation and explain most variations in human skin color, the most significant of which are MC1R, ASIP, OCA2, and TYR. [40] There is evidence that as many as 16 different genes could be responsible for eye color in humans; however, the main two genes associated with eye color variation are OCA2 and HERC2, and both are located on chromosome 15. [41]

Analysis of blood proteins and between-group genetics

Before the discovery of DNA, scientists used blood proteins (the human blood group systems) to study human genetic variation. Research by Ludwik and Hanka Herschfeld during World War I found that the incidence of blood groups A and B differed by region; for example, among Europeans 15 percent were group B and 40 percent group A. Eastern Europeans and Russians had a higher incidence of group B; people from India had the greatest incidence. The Herschfelds concluded that humans comprised two "biochemical races", originating separately. It was hypothesized that these two races later mixed, resulting in the patterns of groups A and B. This was one of the first theories of racial differences to include the idea that human variation did not correlate with genetic variation. It was expected that groups with similar proportions of blood groups would be more closely related, but instead it was often found that groups separated by great distances (such as those from Madagascar and Russia), had similar incidences. [42] It was later discovered that the ABO blood group system is not just common to humans, but shared with other primates, [43] and likely predates all human groups. [44]

In 1972, Richard Lewontin performed an FST statistical analysis using 17 markers (including blood-group proteins). He found that the majority of genetic differences between humans (85.4 percent) were found within a population, 8.3 percent were found between populations within a race and 6.3 percent were found to differentiate races (Caucasian, African, Mongoloid, South Asian Aborigines, Amerinds, Oceanians, and Australian Aborigines in his study). Since then, other analyses have found FST values of 6–10 percent between continental human groups, 5–15 percent between different populations on the same continent and 75–85 percent within populations. [45] [46] [47] [48] [49] This view has since been affirmed by the American Anthropological Association and the American Association of Physical Anthropologists. [50]

Critiques of blood protein analysis

While acknowledging Lewontin's observation that humans are genetically homogeneous, A. W. F. Edwards in his 2003 paper "Human Genetic Diversity: Lewontin's Fallacy" argued that information distinguishing populations from each other is hidden in the correlation structure of allele frequencies, making it possible to classify individuals using mathematical techniques. Edwards argued that even if the probability of misclassifying an individual based on a single genetic marker is as high as 30 percent (as Lewontin reported in 1972), the misclassification probability nears zero if enough genetic markers are studied simultaneously. Edwards saw Lewontin's argument as based on a political stance, denying biological differences to argue for social equality. [51] Edwards' paper is reprinted, commented upon by experts such as Noah Rosenberg, and given further context in an interview with philosopher of science Rasmus Grønfeldt Winther in a recent anthology. [52]
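The way misclassification shrinks as markers are combined can be illustrated with a deliberately simplified model. The sketch below assumes that every marker misclassifies independently with the same probability and that the individual is assigned by majority vote; these assumptions, and the function name, are illustrative and are not taken from Edwards' paper.

```python
from math import comb


def majority_vote_error(n_markers, single_marker_error):
    """Probability that more than half of n independent markers misclassify.

    Assumes an odd number of markers so that ties cannot occur.
    """
    if n_markers % 2 == 0:
        raise ValueError("use an odd number of markers to avoid ties")
    e = single_marker_error
    return sum(
        comb(n_markers, k) * e**k * (1 - e) ** (n_markers - k)
        for k in range(n_markers // 2 + 1, n_markers + 1)
    )


# Single-marker error of 30 percent, as in the figure quoted above.
for n in (1, 3, 11, 101):
    print(n, majority_vote_error(n, 0.30))
```

Under these assumptions the error is 30 percent for one marker and falls steadily as more are added. Real markers are correlated and differ in informativeness, so the sketch shows only the direction of the effect, not its actual size.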

As noted above, Edwards criticises Lewontin's paper because it took 17 different traits and analysed them independently, without looking at them in conjunction with one another. Analysed in this way, Edwards argues, it was fairly convenient for Lewontin to conclude that racial naturalism is not tenable. [53] Sesardic also supported Edwards' view, using an illustration referring to squares and triangles to show that a single trait looked at in isolation will most likely be a bad predictor of which group an individual belongs to. [54] In contrast, in a 2014 paper, reprinted in the 2018 Edwards Cambridge University Press volume, Rasmus Grønfeldt Winther argues that "Lewontin's Fallacy" is effectively a misnomer, as there really are two different sets of methods and questions at play in studying the genomic population structure of our species: "variance partitioning" and "clustering analysis." According to Winther, they are "two sides of the same mathematics coin" and neither "necessarily implies anything about the reality of human groups." [55]

Current studies of population genetics

Researchers currently use genetic testing, which may involve hundreds (or thousands) of genetic markers or the entire genome.

Structure

Several methods to examine and quantify genetic subgroups exist, including cluster and principal components analysis. Genetic markers from individuals are examined to find a population's genetic structure. While subgroups overlap when examining variants of one marker only, when a number of markers are examined different subgroups have different average genetic structure. An individual may be described as belonging to several subgroups. These subgroups may be more or less distinct, depending on how much overlap there is with other subgroups. [56]

In cluster analysis, the number of clusters to search for (K) is determined in advance; how distinct the clusters are varies.
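The role of K as an input can be seen in a toy example. The sketch below uses plain k-means on made-up one-dimensional scores, which is a much simpler stand-in for the model-based clustering used in the studies discussed here; the data, the function name and the deterministic starting centroids are all assumptions chosen for illustration.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain k-means on 1-D data with deterministic, evenly spread starting centroids."""
    ordered = sorted(values)
    # Hypothetical initialisation: spread the k starting centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Keep the old centroid if a group ends up empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]


# Made-up scores along a single axis of variation.
scores = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1, 2.5, 2.6, 2.7]
labels_k2 = kmeans_1d(scores, 2)
labels_k3 = kmeans_1d(scores, 3)
print(labels_k2)
print(labels_k3)
```

The same data yield two groups when K=2 and three groups when K=3. The algorithm returns as many clusters as it is asked for, so the number of clusters in the output reflects the analyst's choice as well as the data.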

The results obtained from cluster analyses depend on several factors:

  • A large number of genetic markers studied facilitates finding distinct clusters. [57]
  • Some genetic markers vary more than others, so fewer are required to find distinct clusters. [58] Ancestry-informative markers (AIMs) exhibit substantially different frequencies between populations from different geographical regions. Using AIMs, scientists can determine a person's ancestral continent of origin based solely on their DNA. AIMs can also be used to determine someone's admixture proportions. [59]
  • The more individuals studied, the easier it becomes to detect distinct clusters (statistical noise is reduced). [58]
  • Low genetic variation makes it more difficult to find distinct clusters. [58] Greater geographic distance generally increases genetic variation, making identifying clusters easier. [60]
  • A similar cluster structure is seen with different genetic markers when the number of genetic markers included is sufficiently large. The clustering structure obtained with different statistical techniques is similar. A similar cluster structure is found in the original sample with a subsample of the original sample. [61]

Recent studies have been published using an increasing number of genetic markers. [58] [61] [62] [63] [64] [65]

Distance

Genetic distance is genetic divergence between species or populations of a species. It may compare the genetic similarity of related species, such as humans and chimpanzees. Within a species, genetic distance measures divergence between subgroups. Genetic distance significantly correlates to geographic distance between populations, a phenomenon sometimes known as "isolation by distance". [66] Genetic distance may be the result of physical boundaries restricting gene flow such as islands, deserts, mountains or forests. Genetic distance is measured by the fixation index (FST). FST is the correlation of randomly chosen alleles in a subgroup to a larger population. It is often expressed as a proportion of genetic diversity. This comparison of genetic variability within (and between) populations is used in population genetics. The values range from 0 to 1; zero indicates the two populations are freely interbreeding, and one would indicate that two populations are separate.
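One common textbook form of the fixation index compares expected heterozygosity within subpopulations (H_S) with that of the pooled population (H_T): FST = (H_T - H_S) / H_T. The sketch below applies it to a single biallelic locus, assuming equally sized subpopulations; the function names and the example allele frequencies are illustrative, and published studies use more elaborate estimators across many loci.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(allele_frequencies):
    """F_ST = (H_T - H_S) / H_T, assuming equally sized subpopulations."""
    n = len(allele_frequencies)
    mean_p = sum(allele_frequencies) / n
    h_total = heterozygosity(mean_p)
    h_within = sum(heterozygosity(p) for p in allele_frequencies) / n
    if h_total == 0.0:
        # The locus does not vary at all; F_ST is undefined, reported as 0 here.
        return 0.0
    return (h_total - h_within) / h_total


print(fst([0.5, 0.5]))  # identical allele frequencies
print(fst([0.6, 0.4]))  # modest difference in frequencies
print(fst([1.0, 0.0]))  # each population fixed for a different allele
```

Identical frequencies give 0, populations fixed for different alleles give 1, and a modest frequency difference such as 0.6 versus 0.4 gives a value of about 0.04.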

Many studies place the average FST distance between human races at about 0.125. Henry Harpending argued that this value implies on a world scale a "kinship between two individuals of the same human population is equivalent to kinship between grandparent and grandchild or between half siblings". In fact, the formulas derived in Harpending's paper in the "Kinship in a subdivided population" section imply that two unrelated individuals of the same race have a higher coefficient of kinship (0.125) than an individual and their mixed race half-sibling (0.109). [67]

Critiques of FST

While acknowledging that FST remains useful, a number of scientists have written about other approaches to characterizing human genetic variation. [68] [69] [70] Long & Kittles (2009) stated that FST failed to identify important variation and that when the analysis includes only humans, FST = 0.119, but adding chimpanzees increases it only to FST = 0.183. [68] Mountain & Risch (2004) argued that an FST estimate of 0.10–0.15 does not rule out a genetic basis for phenotypic differences between groups and that a low FST estimate implies little about the degree to which genes contribute to between-group differences. [69] Pearse & Crandall 2004 wrote that FST figures cannot distinguish between a situation of high migration between populations with a long divergence time, and one of a relatively recent shared history but no ongoing gene flow. [70] In their 2015 article, Keith Hunley, Graciela Cabana, and Jeffrey Long (who had previously criticized Lewontin's statistical methodology with Rick Kittles [50] ) recalculate the apportionment of human diversity using a more complex model than Lewontin and his successors. They conclude: "In sum, we concur with Lewontin's conclusion that Western-based racial classifications have no taxonomic significance, and we hope that this research, which takes into account our current understanding of the structure of human diversity, places his seminal finding on firmer evolutionary footing." [71]

Anthropologists (such as C. Loring Brace), [72] philosopher Jonathan Kaplan and geneticist Joseph Graves [73] have argued that while it is possible to find biological and genetic variation roughly corresponding to race, this is true for almost all geographically distinct populations: the cluster structure of genetic data is dependent on the initial hypotheses of the researcher and the populations sampled. When one samples continental groups, the clusters become continental; with other sampling patterns, the clusters would be different. Weiss and Fullerton note that if one sampled only Icelanders, Mayans and Maoris, three distinct clusters would form; all other populations would be composed of genetic admixtures of Maori, Icelandic and Mayan material. [74] Kaplan therefore concludes that, while differences in particular allele frequencies can be used to identify populations that loosely correspond to the racial categories common in Western social discourse, the differences are of no more biological significance than the differences found between any human populations (e.g., the Spanish and Portuguese). [75]

Historical and geographical analyses

Current-population genetic structure does not imply that differing clusters or components indicate only one ancestral home per group; for example, a genetic cluster in the US comprises Hispanics with European, Native American and African ancestry. [57]

Geographic analyses attempt to identify places of origin, their relative importance and possible causes of genetic variation in an area. The results can be presented as maps showing genetic variation. Cavalli-Sforza and colleagues argue that if genetic variations are investigated, they often correspond to population migrations due to new sources of food, improved transportation or shifts in political power. For example, in Europe the most significant direction of genetic variation corresponds to the spread of agriculture from the Middle East to Europe between 10,000 and 6,000 years ago. [76] Such geographic analysis works best in the absence of recent large-scale, rapid migrations.

Historic analyses use differences in genetic variation (measured by genetic distance) as a molecular clock indicating the evolutionary relation of species or groups, and can be used to create evolutionary trees reconstructing population separations. [76]

Results of genetic-ancestry research are supported if they agree with research results from other fields, such as linguistics or archeology. [76] Cavalli-Sforza and colleagues have argued that there is a correspondence between language families found in linguistic research and the population tree they found in their 1994 study. There are generally shorter genetic distances between populations using languages from the same language family. Exceptions to this rule are also found, for example the Sami, who are genetically associated with populations speaking languages from other language families. The Sami speak a Uralic language, but are genetically primarily European. This is argued to have resulted from migration (and interbreeding) with Europeans while retaining their original language. Agreement also exists between research dates in archeology and those calculated using genetic distance. [58] [76]

Self-identification studies

Jorde and Wooding found that while clusters from genetic markers were correlated with some traditional concepts of race, the correlations were imperfect and imprecise due to the continuous and overlapping nature of genetic variation, noting that ancestry, which can be accurately determined, is not equivalent to the concept of race. [77]

A 2005 study by Tang and colleagues used 326 genetic markers to determine genetic clusters. The 3,636 subjects, from the United States and Taiwan, self-identified as belonging to white, African American, East Asian or Hispanic ethnic groups. The study found "nearly perfect correspondence between genetic cluster and SIRE for major ethnic groups living in the United States, with a discrepancy rate of only 0.14 percent". [57] Paschou et al. found "essentially perfect" agreement between 51 self-identified populations of origin and the population's genetic structure, using 650,000 genetic markers. Selecting for informative genetic markers allowed a reduction to less than 650, while retaining near-total accuracy. [78]

Correspondence between genetic clusters in a population (such as the current US population) and self-identified race or ethnic groups does not mean that such a cluster (or group) corresponds to only one ethnic group. African Americans have an estimated 20–25 percent European genetic admixture; Hispanics have European, Native American and African ancestry. [57] In Brazil there has been extensive admixture between Europeans, Amerindians and Africans. As a result, skin color differences within the population are not gradual, and there are relatively weak associations between self-reported race and African ancestry. [79] [80] Ethnoracial self-classification in Brazilians is certainly not random with respect to individual genomic ancestry, but the strength of the association between the phenotype and median proportion of African ancestry varies widely across populations. [81]

Critique of genetic-distance studies and clusters

Genetic distances generally increase continually with geographic distance, which makes a dividing line arbitrary. Any two neighboring settlements will exhibit some genetic difference from each other, which could be defined as a race. Therefore, attempts to classify races impose an artificial discontinuity on a naturally occurring phenomenon. This explains why studies on population genetic structure yield varying results, depending on methodology. [82]

Rosenberg and colleagues (2005) have argued, based on cluster analysis of the 52 populations in the Human Genetic Diversity Panel, that populations do not always vary continuously and a population's genetic structure is consistent if enough genetic markers (and subjects) are included.

Examination of the relationship between genetic and geographic distance supports a view in which the clusters arise not as an artifact of the sampling scheme, but from small discontinuous jumps in genetic distance for most population pairs on opposite sides of geographic barriers, in comparison with genetic distance for pairs on the same side. Thus, analysis of the 993-locus dataset corroborates our earlier results: if enough markers are used with a sufficiently large worldwide sample, individuals can be partitioned into genetic clusters that match major geographic subdivisions of the globe, with some individuals from intermediate geographic locations having mixed membership in the clusters that correspond to neighboring regions.

They also wrote, regarding a model with five clusters corresponding to Africa, Eurasia (Europe, Middle East, and Central/South Asia), East Asia, Oceania, and the Americas:

For population pairs from the same cluster, as geographic distance increases, genetic distance increases in a linear manner, consistent with a clinal population structure. However, for pairs from different clusters, genetic distance is generally larger than that between intracluster pairs that have the same geographic distance. For example, genetic distances for population pairs with one population in Eurasia and the other in East Asia are greater than those for pairs at equivalent geographic distance within Eurasia or within East Asia. Loosely speaking, it is these small discontinuous jumps in genetic distance—across oceans, the Himalayas, and the Sahara—that provide the basis for the ability of STRUCTURE to identify clusters that correspond to geographic regions. [61]

This applies to populations in their ancestral homes when migrations and gene flow were slow; large, rapid migrations exhibit different characteristics. Tang and colleagues (2004) wrote, "we detected only modest genetic differentiation between different current geographic locales within each race/ethnicity group. Thus, ancient geographic ancestry, which is highly correlated with self-identified race/ethnicity – as opposed to current residence – is the major determinant of genetic structure in the U.S. population". [57]

Cluster analysis has been criticized because the number of clusters to search for is decided in advance, with different values possible (although with varying degrees of probability). [83] Principal component analysis does not decide in advance how many components to search for. [84]

The 2002 study by Rosenberg et al. [85] exemplifies why meanings of these clusterings are disputable. The study shows that at the K=5 cluster analysis, genetic clusterings roughly map onto each of the five major geographical regions. Similar results were gathered in further studies in 2005. [86]

However, in addition to the five main supposedly geographical clusters, a sixth group, the Kalash, a minority ethnic group in Pakistan, began to appear starting at K=6. The racial naturalist Nicholas Wade considers that the results "make no genetic or geographic sense". They are therefore omitted in his book A Troublesome Inheritance in favour of the K=5 cluster analysis.

This bias, however, is reflective of how the research is inherently flawed. The sample population is chosen with geographical representation and folk concepts of race in mind, instead of accounting for the genetic diversity within the different geographical regions. The Kalash did not fit into the general pattern for it had been a genetically isolated population that happened to be reflected in this study. Potentially numerous genetically drifted groups, such as the uncontacted Sentinelese, are not represented in the study. [ citation needed ]

Critique of ancestry-informative markers

Ancestry-informative markers (AIMs) are a genealogy-tracing technology that has come under much criticism because of its reliance on reference populations. In a 2015 article, Troy Duster outlines how contemporary technology allows the tracing of ancestral lineage, but only along one maternal and one paternal line. That is, of 64 total great-great-great-great-grandparents, only one on each parent's side is identified, so the other 62 ancestors are ignored in tracing efforts. [87] Furthermore, the 'reference populations' used as markers for membership of a particular group are designated arbitrarily and on the basis of present-day populations. In other words, using populations who currently reside in given places as references for certain races and ethnic groups is unreliable because of the demographic changes which have occurred over many centuries in those places. Moreover, because ancestry-informative markers are widely shared among the whole human population, it is their frequency which is tested, not their mere absence or presence. A threshold of relative frequency therefore has to be set. According to Duster, the criteria for setting such thresholds are a trade secret of the companies marketing the tests. Thus, we cannot say anything conclusive on whether they are appropriate. Results of AIMs are extremely sensitive to where this bar is set. [88] Given that many genetic traits occur at very similar frequencies in many different populations, the frequency that is taken as sufficient for membership of a reference population is very important. This can also lead to mistakes, given that many populations may share the same patterns, if not exactly the same genes. "This means that someone from Bulgaria whose ancestors go back to the fifteenth century could (and sometime does) map as partly 'Native American'". [87] This happens because AIMs rely on a '100% purity' assumption of reference populations. That is, they assume that a pattern of traits would ideally be a necessary and sufficient condition for assigning an individual to an ancestral reference population.

There are certain statistical differences between racial groups in susceptibility to certain diseases. [89] Genes change in response to local diseases; for example, people who are Duffy-negative tend to have a higher resistance to malaria. The Duffy-negative phenotype is highly frequent in Central Africa, and the frequency decreases with distance from Central Africa, with higher frequencies in global populations with high degrees of recent African immigration. This suggests that the Duffy-negative genotype evolved in Sub-Saharan Africa and was subsequently positively selected for in the malaria-endemic zone. [90] A number of genetic conditions prevalent in malaria-endemic areas may provide genetic resistance to malaria, including sickle cell disease, thalassaemias and glucose-6-phosphate dehydrogenase deficiency. Cystic fibrosis is the most common life-limiting autosomal recessive disease among people of European ancestry; a hypothesized heterozygote advantage, providing resistance to diseases earlier common in Europe, has been challenged. [91] Scientists Michael Yudell, Dorothy Roberts, Rob DeSalle, and Sarah Tishkoff argue that using these associations in the practice of medicine has led doctors to overlook or misidentify disease: "For example, hemoglobinopathies can be misdiagnosed because of the identification of sickle-cell as a 'Black' disease and thalassemia as a 'Mediterranean' disease. Cystic fibrosis is underdiagnosed in populations of African ancestry, because it is thought of as a 'White' disease." [92]

Information about a person's population of origin may aid in diagnosis, and adverse drug responses may vary by group. [58] Because of the correlation between self-identified race and genetic clusters, medical treatments influenced by genetics have varying rates of success between self-defined racial groups. [93] For this reason, some physicians consider a patient's race in choosing the most effective treatment, [94] and some drugs are marketed with race-specific instructions. [95] Jorde and Wooding (2004) have argued that because of genetic variation within racial groups, when "it finally becomes feasible and available, individual genetic assessment of relevant genes will probably prove more useful than race in medical decision making". However, race continues to be a factor when examining groups (such as epidemiologic research). [77] Some doctors and scientists such as geneticist Neil Risch argue that using self-identified race as a proxy for ancestry is necessary to be able to get a sufficiently broad sample of different ancestral populations, and in turn to be able to provide health care that is tailored to the needs of minority groups. [96]

Usage in scientific journals

Some scientific journals have addressed previous methodological errors by requiring more rigorous scrutiny of population variables. Since 2000, Nature Genetics requires its authors to "explain why they make use of particular ethnic groups or populations, and how classification was achieved". Editors of Nature Genetics say that "[they] hope that this will raise awareness and inspire more rigorous designs of genetic and epidemiological studies". [97]

Gene-environment interactions

Lorusso and Bacchini [98] argue that self-identified race is of greater use in medicine as it correlates strongly with risk-related exposomes that are potentially heritable when they become embodied in the epigenome. They summarise evidence of the link between racial discrimination and health outcomes due to poorer food quality, access to healthcare, housing conditions, education, access to information, exposure to infectious agents and toxic substances, and material scarcity. They also cite evidence that this process can work positively: for example, the psychological advantage of perceiving oneself at the top of a social hierarchy is linked to improved health. However, they caution that the effects of discrimination do not offer a complete explanation for differential rates of disease and risk factors between racial groups, and that the employment of self-identified race has the potential to reinforce racial inequalities.

Racial naturalism is the view that racial classifications are grounded in objective patterns of genetic similarities and differences. Proponents of this view have justified it using the scientific evidence described above. However, this view is controversial and philosophers [99] of race have put forward four main objections to it.

Semantic objections, such as the discreteness objection, argue that the human populations picked out in population-genetic research are not races and do not correspond to what "race" means in the United States. "The discreteness objection does not require there to be no genetic admixture in the human species in order for there to be US 'racial groups' ... rather ... what the objection claims is that membership in US racial groups is different from membership in continental populations. ... Thus, strictly speaking, Blacks are not identical to Africans, Whites are not identical to Eurasians, Asians are not identical to East Asians and so forth." [100] Therefore, it could be argued that scientific research is not really about race.

The next two objections are metaphysical objections, which argue that even if the semantic objections fail, human genetic clustering results do not support the biological reality of race. The 'very important objection' stipulates that races in the US definition fail to be important to biology, in the sense that continental populations do not form biological subspecies. The 'objectively real objection' states that "US racial groups are not biologically real because they are not objectively real in the sense of existing independently of human interest, belief, or some other mental state of humans." [101] Racial naturalists, such as Quayshawn Spencer, have responded to each of these objections with counter-arguments. There are also methodological critics who reject racial naturalism because of concerns relating to the experimental design, execution, or interpretation of the relevant population-genetic research. [102]

Another semantic objection is the visibility objection which refutes the claim that there are US racial groups in human population structures. Philosophers such as Joshua Glasgow and Naomi Zack believe that US racial groups cannot be defined by visible traits, such as skin colour and physical attributes: "The ancestral genetic tracking material has no effect on phenotypes, or biological traits of organisms, which would include the traits deemed racial, because the ancestral tracking genetic material plays no role in the production of proteins; it is not the kind of material that 'codes' for protein production." [103] Spencer contends that certain racial discourses require visible groups, but disagrees that this is a requirement in all US racial discourse.

A different objection states that US racial groups are not biologically real because they are not objectively real in the sense of existing independently of some mental state of humans. Proponents of this second metaphysical objection include Naomi Zack and Ron Sundstrom. [103] [104] Spencer argues that an entity can be both biologically real and socially constructed. Spencer states that in order to accurately capture real biological entities, social factors must also be considered.

It has been argued that knowledge of a person's race is limited in value, since people of the same race vary from one another. [77] David J. Witherspoon and colleagues have argued that when individuals are assigned to population groups, two randomly chosen individuals from different populations can resemble each other more than a randomly chosen member of their own group. They found that many thousands of genetic markers had to be used for the answer to "How often is a pair of individuals from one population genetically more dissimilar than two individuals chosen from two different populations?" to be "never". This assumed three population groups, separated by large geographic distances (European, African and East Asian). The global human population is more complex, and studying a large number of groups would require an increased number of markers for the same answer. They conclude that "caution should be used when using geographic or genetic ancestry to make inferences about individual phenotypes", [105] and "The fact that, given enough genetic data, individuals can be correctly assigned to their populations of origin is compatible with the observation that most human genetic variation is found within populations, not between them. It is also compatible with our finding that, even when the most distinct populations are considered and hundreds of loci are used, individuals are frequently more similar to members of other populations than to members of their own population". [106]
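The dependence on the number of markers can be illustrated with a toy simulation. The sketch below is not the method or the data of Witherspoon and colleagues: it assumes two hypothetical populations of haploid individuals in which every locus has an allele frequency of 0.4 in one population and 0.6 in the other, and it estimates how often an individual is more similar to someone from the other population than to someone from their own. The function names and all parameter values are illustrative assumptions.

```python
import random

def random_genome(freq, n_loci, rng):
    """Draw one haploid genome: 1 where the individual carries the allele."""
    return [1 if rng.random() < freq else 0 for _ in range(n_loci)]

def distance(a, b):
    """Number of loci at which two genomes differ."""
    return sum(x != y for x, y in zip(a, b))

def dissimilarity_fraction(n_loci, trials=400, freq_a=0.4, freq_b=0.6, seed=1):
    """Estimate how often a cross-population pair is strictly more similar
    than a same-population pair (ties count as 'not more similar')."""
    rng = random.Random(seed)
    closer_to_other = 0
    for _ in range(trials):
        focal = random_genome(freq_a, n_loci, rng)      # individual from population A
        same_pop = random_genome(freq_a, n_loci, rng)   # second individual from A
        other_pop = random_genome(freq_b, n_loci, rng)  # individual from population B
        if distance(focal, other_pop) < distance(focal, same_pop):
            closer_to_other += 1
    return closer_to_other / trials

few_loci = dissimilarity_fraction(n_loci=10)
many_loci = dissimilarity_fraction(n_loci=1000)
print(f"10 loci:   {few_loci:.2f}")
print(f"1000 loci: {many_loci:.2f}")
```

In this toy setup the fraction should fall as loci are added, which mirrors the qualitative point in the text; the actual values depend entirely on the assumed allele frequencies.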

This is similar to the conclusion reached by anthropologist Norman Sauer in a 1992 article on the ability of forensic anthropologists to assign "race" to a skeleton, based on craniofacial features and limb morphology. Sauer said, "the successful assignment of race to a skeletal specimen is not a vindication of the race concept, but rather a prediction that an individual, while alive was assigned to a particular socially constructed 'racial' category. A specimen may display features that point to African ancestry. In this country that person is likely to have been labeled Black regardless of whether or not such a race actually exists in nature". [107]

Criticism of race-based medicines

Troy Duster points out that genetics is often not the predominant determinant of disease susceptibilities, even though such susceptibilities might correlate with specific socially defined categories. This is because this research often fails to control for a multiplicity of socio-economic factors. He cites data collected by King and Rewers indicating that dietary differences play a significant role in explaining variations of diabetes prevalence between populations.

Duster elaborates by putting forward the example of the Pima of Arizona, a population suffering from disproportionately high rates of diabetes. The cause, he argues, was not necessarily the prevalence of the FABP2 gene, which is associated with insulin resistance. Rather, he argues that scientists often discount the effects of lifestyle in specific socio-historical contexts. For instance, near the end of the 19th century, the Pima economy was predominantly agriculture-based. However, as the European American population settled into traditionally Pima territory, the Pima lifestyle became heavily Westernised. Within three decades, the incidence of diabetes increased severalfold. Governmental provision of free, relatively high-fat food to alleviate poverty in the population is noted as an explanation of this phenomenon. [108]

Lorusso and Bacchini argue against the assumption that "self-identified race is a good proxy for a specific genetic ancestry" [98] on the basis that self-identified race is complex: it depends on a range of psychological, cultural and social factors, and is therefore "not a robust proxy for genetic ancestry". [109] Furthermore, they explain that an individual's self-identified race is made up of further, collectively arbitrary factors: personal opinions about what race is and the extent to which it should be taken into consideration in everyday life. Furthermore, individuals who share a genetic ancestry may differ in their racial self-identification across historical or socioeconomic contexts. From this, Lorusso and Bacchini conclude that the accuracy in the prediction of genetic ancestry on the basis of self-identification is low, specifically in racially admixed populations born out of complex ancestral histories.

Activity 1: Genetic Variation in Populations

The growing ability to detect and measure human genetic variation allows us to study similarities and differences among individuals. In this activity, you will analyze data on genetic variation and address a series of questions about variation within and between populations.

You should understand the following concepts before you begin this activity:

  • the relationship between genes and proteins
  • the relationship between gene and phenotype
  • the difference between a gene and an allele
  • statements of frequency, for example, 0.45 (45 percent).

Variation in Populations

How do we ordinarily identify a person as a member of a particular racial or ethnic group?

To what extent do we focus on external characteristics such as skin color, hair texture, or characteristic facial structures?

Are there drawbacks to relying on external physical characteristics?

Are there any other biological indicators that could provide insights into group similarities and differences?

Would examining the genetic similarities and differences provide reliable guidance to the classification of groups?

Look at allele frequencies for three different genes in populations around the world. You will see several maps that contain a subset of the actual data collected by scientists.

What is the range of frequencies for each allele shown?

Which allele varies most in frequency?

Which allele varies least?

Propose some hypotheses that explain the variation in the frequency of FY-O.

Hint 1: Think of hypotheses based on evolution and natural selection.

Hint 2: Natural selection is a function of environmental variations acting on naturally occurring genetic variations and their phenotypic effects.

See the following map to examine additional data to help you refine your hypothesis.

What is the relationship between the incidence of Plasmodium vivax malaria and the FY-O allele?

Review the information below about what scientists know about each of the alleles used in this activity. If you wish, rework your hypothesis.

The GC gene (which has two major alleles, 1 and 2) codes for a blood protein that attaches to vitamin D and regulates its distribution within the body.

The HP gene codes for another blood protein (haptoglobin). This protein attaches to the hemoglobin released by the red blood cells when they decay at the end of their natural life or when they are destroyed by a disease, such as malaria.

The FY gene codes for a protein that is normally found on the surface of red blood cells. The protein makes it easier for a particular malarial parasite, Plasmodium vivax, to get into the red blood cell. Once in the red blood cell, P. vivax, like all malarial parasites, multiplies. The FY-O allele results in the absence of the protein. That makes it hard for the parasite to gain entry to the red blood cells and multiply. Therefore, the FY-O allele provides a certain amount of protection against this type of malaria.

The FY-O allele provides a selective advantage in regions where Plasmodium vivax malaria is common. That advantage accounts for the increased frequency of the FY-O allele in those regions. We know of only a few such examples where there is a clear relationship between an environmental variable and differences in allele frequency.

Scientists have examined DNA from chromosomal regions that have a lot of detectable differences in sequence.


STRP: autosomal short-tandem-repeat polymorphisms. These are variable segments of DNA that are 3, 4 or 5 bases long, repeated over and over.

RSP: autosomal restriction-site polymorphism. These are DNA sequence variations at restriction enzyme recognition sequences; they result in variations (polymorphisms) in the length of the DNA fragments obtained by cutting DNA with a particular restriction enzyme.

Alu: Alu-insertion polymorphism. Alu sequences are repeat sequences that are about 300 bases long. There are many thousands of Alu repeats in the genome, and they appear within genes and between genes. They have no known function.

HVS1 and HVS2: hypervariable sequence 1 and 2. This is DNA from mitochondria taken from regions that have a lot of differences in DNA sequence.

Y-STRP: Y chromosome short-tandem-repeat polymorphisms.

Now examine the data above from a study of worldwide genetic variation in 255 individuals. The individuals included 72 Africans, 63 Asians and 120 Europeans.

Based on the data you have seen, could you draw boundaries that would separate populations clearly on the basis of genetic differences?

The study of genetic variations in Homo sapiens shows that there is more genetic variation within populations than between populations. This means that two random individuals from any one group are almost as different as any two random individuals from the entire world.

Although it may be easy to observe distinct external differences between groups of people, it is more difficult to distinguish such groups genetically, since most genetic variation is found within all groups.


1. Allele: an alternate form of a gene at a given locus. For example, the gene for ABO blood group has three alleles, A, B, and O.

2. Allele frequency: the commonness of a particular allele in a given population, stated as a number, from 0 to 1, or as a percentage, from 0 to 100.

3. What is Fst?

To understand Lewontin’s 1972 study, you need a basic grasp of some entry-level genetics stuff. Don’t worry, it isn’t too complicated.

Lewontin’s study was the first in history to measure the Fst (aka “Fixation Index”) of humanity as a whole. In layman’s terms, Fst is a way to measure how much genetic material is shared by population groups. In science terms, Fst is the proportion of the total genetic variation in the total population (t) that is due to differences between subpopulations (s) rather than differences within them. An Fst of zero means total sharing of genetic material (the subpopulations have identical allele frequencies). An Fst of one means that there is no sharing whatsoever (each subpopulation carries only variants that the others lack).

Lower Fst = more shared genes and less genetic differentiation between subpopulations.
Higher Fst = fewer shared genes and more genetic differentiation between subpopulations.

A basic example using two subpopulations (the minimum number):
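As a minimal sketch of such an example, assume a single gene with two alleles and two equally sized subpopulations. Fst can then be computed from expected heterozygosity as (H_T - H_S) / H_T, where H_S is the average heterozygosity within the subpopulations and H_T is the heterozygosity of the pooled population. The allele frequencies below are made up for illustration and are not taken from Lewontin's data.

```python
def heterozygosity(p):
    """Expected heterozygosity for a two-allele gene with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def fst_two_subpopulations(p1, p2):
    """Fst for two equally sized subpopulations with allele frequencies p1 and p2."""
    p_total = (p1 + p2) / 2.0                                # pooled allele frequency
    h_t = heterozygosity(p_total)                            # diversity of the total population
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0    # mean diversity within subpopulations
    if h_t == 0.0:
        return 0.0  # no variation anywhere, so nothing to apportion
    return (h_t - h_s) / h_t

# Identical frequencies, fully distinct subpopulations, and a modest difference.
for p1, p2 in [(0.5, 0.5), (0.0, 1.0), (0.4, 0.6)]:
    print(p1, p2, f"{fst_two_subpopulations(p1, p2):.2f}")
```

With these inputs the three cases give Fst values of 0.00, 1.00 and 0.04: even a visible difference in allele frequency (40 percent versus 60 percent) leaves most of the variation within each subpopulation.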


DNA samples: The 10 Africans used were 1 Biaka Pygmy, 1 Mbuti Pygmy, 1 Ghanaian, 1 Kikuyu, 1 !Kung, 1 Luo, 2 Nigerians (Yoruba and Rivers), 1 South African Bantu speaker, and 1 Zulu (also a South African Bantu speaker); the 10 Europeans were 1 Finnish, 1 French, 1 German, 1 Hungarian, 1 Italian, 1 Portuguese, 1 Russian, 1 Spanish, 1 Swedish, and 1 Ukrainian; and the 10 Asians were 1 Cambodian, 2 Chinese (North and South), 1 Han Taiwanese, 2 Indians (Punjab and Bengal), 1 Japanese, 1 Mongolian, 1 Vietnamese, and 1 Yakut. As every segment studied is autosomal, the number of sequences studied for each segment is 60 (20 for each continent studied).

Selection of DNA segments: Fifty noncoding, nonrepetitive genomic segments (each ∼1 kb), which covered almost all autosomes, were selected randomly with reference to the Ge[…]; see Chen and Li (2001) for details. All of them were chosen to avoid coding or close linkage to any coding regions. In each segment and its nearby regions there was no registered gene in GenBank and no potential coding region was detected by either GenScan or GRAIL-EXP.

PCR amplification and DNA sequencing: Touchdown PCR (Don et al. 1991) was used and the reactions were carried out following the conditions described (Zhao et al. 2000). The PCR products were purified by the Wizard PCR Preps DNA purification resin kit (Promega, Madison, WI). Sequencing reaction was performed according to the protocol of ABI Prism BigDye Terminator sequencing kits (Perkin-Elmer, Norwalk, CT) modified by quarter reaction. The extension products were purified by Sephadex G-50 (DNA grade; Pharmacia, Piscataway, NJ) and run on an ABI 377XL DNA sequencer using 4.25% gels (Sooner Scientific). About 500 bp of each segment was sequenced.

ABI DNA Sequence Analysis 3.0 was used for lane tracking and base calling. The data were then proofread manually and heterozygous sites were detected as double peaks. The forward and reverse sequences were assembled automatically in each individual using SeqMan in DNASTAR. The assembled files were carefully checked by eye. Fluorescent traces for each variant site were rechecked again in all individuals. All singletons, which were variants that appear only once in the total sample, were verified by PCR reamplification and resequencing the PCR products in both directions.

Data analysis: The sequences were aligned by SeqMan in the DNASTAR or the DAMBE package (Xia 2000). Nucleotide diversity values were calculated using DNASP (Rozas and Rozas 1999), DAMBE, and our own programs.
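Nucleotide diversity (π) is usually defined as the average number of differences per site between two sequences drawn from the sample. The sketch below shows that calculation on made-up aligned sequences. It is not the DNASP or DAMBE implementation, and it assumes equal-length sequences with no gaps, missing data, or heterozygous calls.

```python
from itertools import combinations

def nucleotide_diversity(sequences):
    """Average pairwise differences per site across all pairs of aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(seq1, seq2)) for seq1, seq2 in pairs
    )
    return total_differences / (len(pairs) * length)

# Hypothetical 4-base alignment: only the third sequence differs, at one site.
example = ["ACGT", "ACGT", "ACGA"]
print(nucleotide_diversity(example))
```

Here two of the three pairs differ at one of four sites, so π is 2 / (3 × 4), or about 0.17 differences per site.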


Celera Genomics Completes the First Assembly of the Human Genome (press release).

Angier N: Do races differ? Not really, genes show. New York Times. 2000, F1-F5.

Bowcock AM, Hebert JM, Mountain JL, Kidd JR, Rogers J, Kidd KK, Cavalli-Sforza LL: Study of an additional 48 DNA markers in five human populations from four continents. Gene Geogr. 1991, 5: 151-173.

Goldstein DB, Linares AR, Cavalli-Sforza LL, Feldman MW: Genetic absolute dating based on microsatellites and the origin of modern humans. Proc Natl Acad Sci. 1995, 92: 6723-6727.

Weiss KM: In search of human variation. Genome Res. 1998, 8: 691-697.

Kruglyak L: The use of a genetic map of biallelic markers in linkage studies. Nature Genet. 1997, 17: 21-24.

dbSNP: A database of human single nucleotide polymorphisms.

Smigielski EM, Sirotkin K, Minghong W, Sherry ST: dbSNP: a database of single nucleotide polymorphisms. Nucl Acids Res. 2000, 28: 352-355. 10.1093/nar/28.1.352.

Snyder LH: Human blood groups: their inheritance and racial significance. Am J Phys Anthrop. 1926, 9: 233-263.

Livingstone FB: Anthropological implications of sickle-cell gene distribution in West Africa. Amer Anthrop. 1958, 60: 533-562.

Lewontin RC: The apportionment of human diversity. In Evolutionary Biology, vol 6. Edited by Dobzhansky TH, Hecht MK, Steere WC. New York: Appleton-Century-Crofts. 1972

Barbujani G, Magagni A, Minch E, Cavalli-Sforza LL: An apportionment of human DNA diversity. Proc Natl Acad Sci. 1997, 94: 4516-4519. 10.1073/pnas.94.9.4516.

Cann RL, Stoneking M, Wilson AC: Mitochondrial DNA and human evolution. Nature. 1987, 325: 31-36. 10.1038/325031a0.

Disotell TR: Sex-specific contributions to genome variation. Curr Biol. 1999, 9: R29-R31. 10.1016/S0960-9822(99)80039-6.

Hammer MF: A recent common ancestry for human Y chromosomes. Nature. 1995, 378: 376-378. 10.1038/378376a0.

Jorde LB, Watkins WS, Bamshad MJ, Dixon ME, Ricker CE, Seielstad MT, Batzer MA: The distribution of human genetic diversity: a comparison of mitochondrial, autosomal, and Y-chromosome data. Am J Hum Genet. 2000, 66: 979-988. 10.1086/302825.

Disotell TR: The southern route to Asia. Curr Biol. 1999, 9: R925-R928. 10.1016/S0960-9822(00)80106-2.

Stringer CB, Andrews P: Genetic and fossil evidence for the origin of modern humans. Science. 1988, 239: 1263-1268.

Cavalli-Sforza LL, Wilson AC, Cantor CR, Cook-Deegan RM, King MC: Call for a worldwide survey of human genetic diversity: a vanishing opportunity for the Human Genome Project. Genomics. 1991, 11: 490-491.

Marks J: The trouble with the Human Genome Diversity Project. Mol Med Today. 1998, 4: 243. 10.1016/S1357-4310(98)01279-9.

Wallace RW: The Human Genome Diversity Project: medical benefits versus ethical concerns. Mol Med Today. 1998, 4: 59-62. 10.1016/S1357-4310(97)01206-9.

Goldstein JR: Kinship networks that cross racial lines: the exception or the rule?. Demography. 1999, 36: 399-407.

Parra EJ, Marcini A, Akey J, Martinson J, Batzer MA, Cooper R, Forrester T, Allison DB, Deka R, Ferrell RE, Shriver MD: Estimating African American admixture proportions by use of population-specific alleles. Am J Hum Genet. 1998, 63: 1839-1851. 10.1086/302148.

Long JC, Williams RC, McAuley JE, Medis R, Partel R, Tregellas WM, South SF, Rea AE, McCormick SB, Iwaniec U: Genetic variation in Arizona Mexican Americans: estimation and interpretation of admixture proportions. Am J Phys Anthropol. 1991, 84: 141-157.

Modern Human Diversity - Genetics

People today look remarkably diverse on the outside. But how much of this diversity is genetically encoded? How deep are these differences between human groups? First, compared with many other mammalian species, humans are genetically far less diverse – a counterintuitive finding, given our large population and worldwide distribution. For example, the subspecies of the chimpanzee that lives just in central Africa, Pan troglodytes troglodytes, has higher levels of diversity than do humans globally, and the genetic differentiation between the western (P. t. verus) and central (P. t. troglodytes) subspecies of chimpanzees is much greater than that between human populations.

Early studies of human diversity showed that most genetic diversity was found between individuals rather than between populations or continents and that variation in human diversity is best described by geographic gradients, or clines. A wide-ranging study published in 2004 found that 87.6% of the total modern human genetic diversity is accounted for by the differences between individuals, and only 9.2% between continents. In general, 5%–15% of genetic variation occurs between large groups living on different continents, with the remaining majority of the variation occurring within such groups (Lewontin 1972; Jorde et al. 2000a; Hinds et al. 2005). These results show that when individuals are sampled from around the globe, the pattern seen is not a matter of discrete clusters but rather of gradients in genetic variation (gradual geographic variations in allele frequencies) that extend over the entire world. Therefore, there is no reason to assume that major genetic discontinuities exist between peoples on different continents or "races." The authors of the 2004 study say that they ‘see no reason to assume that "races" represent any units of relevance for understanding human genetic history. An exception may be genes where different selection regimes have acted in different geographical regions. However, even in those cases, the genetic discontinuities seen are generally not "racial" or continental in nature but depend on historical and cultural factors that are more local in nature’ (Serre and Pääbo 2004: 1683-1684).

'Rare' genetic variants are surprisingly common, life scientists report

A large survey of human genetic variation, just published in the online version of the journal Science, shows that rare genetic variants are not so rare after all and offers insights into human diseases.

"I knew there would be rare variation but had no idea there would be so much of it," said the senior author of the research, John Novembre, an assistant professor of ecology and evolutionary biology and of bioinformatics at UCLA.

A team of life scientists studied 202 genes in 14,002 people. The human genome contains some 3 billion base pairs; the scientists studied 864,000 of these pairs. While this is only a small part of the genome, the sample size of 14,002 people is one of the largest ever in a sequencing study in humans.

"Our results suggest there are many, many places in the genome where one individual, or a few individuals, have something different," Novembre said. "Overall, it is surprisingly common that there is a rare variant in the population.

"This study doesn't tell us how to cure a particular disease but suggests that disease in general may be caused by rare variants, and if you're trying to find the genetic basis of disease, it's important to focus on those variants. Understanding the genetic basis of disease provides clues to how the diseases work and clues about how to treat them."

The scientists discovered one genetic variant every 17 bases, which was a dramatically higher rate than they expected, said Novembre, a population geneticist who is a member of UCLA's interdepartmental program in bioinformatics.

Most of the time, only one person has the genetic variant and the other 14,001 do not.

"We saw lots of that," he said. "We discovered there are many places in these 202 genes where there is variation and only a few individuals differ from the whole group, or only one differs. We also see evidence that a substantial fraction of these rare genetic variants appear to be deleterious in a long-term evolutionary sense and might impact disease."

The research team included Daniel Wegmann, a former UCLA postdoctoral scholar in Novembre's laboratory and a co-first author of the study; Darren Kessner, a UCLA graduate student in the bioinformatics interdepartmental Ph.D. program; colleagues from the University of Michigan, Ann Arbor (in fields including human genetics and biostatistics); and geneticists from international health care company GlaxoSmithKline, including project leader Matthew Nelson. The UCLA life scientists were involved in the population genetic analysis of the data.

In the study, 10,621 people had one of 12 diseases, including coronary artery disease, multiple sclerosis, bipolar disorder, schizophrenia, osteoarthritis and Alzheimer's disease; 3,381 did not have any of the diseases.

"The large sample size allows us to see patterns with more clarity than ever before," Novembre said. "If rare variants are like distant stars, this kind of large sample size is like having the Hubble Telescope; it's allowing us to see more than before. We see a ton of rare variation, and these rare variants more often make changes to proteins than not. In that way, this study has important implications for the genetic basis of disease in humans. It's consistent with the idea that many diseases may be partly caused by rare variants."

Human population growth helps to explain the large number of genetic variants, the scientists said.

"The fact that we see so many rare variants is in part due to the fact that human populations have been growing very rapidly," Novembre said. "Because the human population has grown so much, the opportunity for mutations to occur has also grown. Some of the variants we are seeing are very young, dating to population growth since the invention of agriculture and even the Industrial Revolution; this growth has created many opportunities for mutation in the genome because there are so many transmissions of chromosomes from parent to child in large populations."

The scientists isolated and sequenced the pieces of DNA from the 202 genes.

They estimated mutation rates from population genetic data, which has only rarely been done before.

"We have been able to estimate mutation rates for each of the genes, which has been difficult to do with smaller sample sizes," Novembre said. "In future research, we can study mutation rates not just in these 202 genes, but genome-wide."

Sequencing technologies are advancing rapidly, he said. "What seemed like science fiction in the past is science today."

Rare genetic variants would not have been detectable in most previous studies, whose samples usually had fewer than 1,000 people.
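One way to see why sample size matters: if a variant is carried on a fraction f of chromosomes, the chance that it shows up at least once among n people is 1 - (1 - f)^(2n), assuming each person contributes two independently sampled chromosomes. The sketch below applies that formula. The frequency of 1 in 10,000 is a hypothetical value chosen for illustration, not a figure from the study, and the function name is likewise an assumption.

```python
def detection_probability(variant_freq, n_people):
    """Chance that a variant appears at least once in a sample of diploid people."""
    n_chromosomes = 2 * n_people
    # Probability that every sampled chromosome lacks the variant, subtracted from 1.
    return 1.0 - (1.0 - variant_freq) ** n_chromosomes

hypothetical_freq = 1e-4  # assumed: carried on 1 in 10,000 chromosomes
for n_people in (1_000, 14_002):
    p = detection_probability(hypothetical_freq, n_people)
    print(f"{n_people:>6} people: {p:.2f}")
```

Under these assumptions a study of 1,000 people would usually miss such a variant, while a sample the size of this study would usually include at least one carrier.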

Typically, in population genetics, it is difficult to estimate mutation rates separately from population sizes, but when you get to very large sample sizes, you can estimate the two separately, Novembre said.

"We estimate 202 mutation rates, one for each gene," he said. "We show that the mutation rate varies from gene to gene. Follow-up studies may be able to reveal more about what factors affect mutation rates."

Rare genetic variants are frequently geographically localized to small pockets around the globe rather than being widespread, Novembre said.

In the image accompanying this release, each vertical line represents one of the 202 genes. For each gene, the scientists plotted, at the top of the image, the number of genetic variants that have a frequency greater than 0.5 percent. When variants are greater than 0.5 percent, previous studies have been able to find most of them.

"With our large sample size, we can detect variants at a frequency less than 0.5 percent, and we see all of these, which have never been seen before," Novembre said. "Previous studies have examined the tip of the iceberg of genetic variation, but there is all this rare variation that has been below the surface, below our threshold of detection. Now, with large sample sizes, we can see a more complete picture of human genetic diversity."

Changes in the genetic code can be "nonsynonymous" (they change the meaning of a protein) or "synonymous" (they don't change the meaning of a protein).

"We see many nonsynonymous changes amongst the rare variants, and these are plausibly affecting disease in humans, though in ways that are not yet well understood," Novembre said.

Lewontin in context

Population and human genetics had already begun to cast doubt on the validity of prior racial classification schemes well before the 1970’s. In fact, Edwards’ 1964 co-author, the eminent biological anthropologist Luca Cavalli-Sforza, wrote a section in History and Geography of Human Genes (1994) entitled: “Scientific Failure of the Concept of Human Races” (section 1.6). It stated that individual genetic variation accumulated over long periods of time, that most polymorphisms antedate the separation of humans into continents, that the same polymorphisms are found in all populations within the species, and that the difference between groups is small when compared with that within major groups. It also stated that no single gene was sufficient for classifying humans into systematic categories. Yet, using the very same cluster analysis Edwards claims is required to discriminate humans into racial groups, they rejected the notion of their existence.

In addition, Edwards and Leroi unfairly single out Richard Lewontin as the architect of the statistical fallacy at the center of “the dominance of social construction.” In the same year that Lewontin published his paper, Masatoshi Nei and Arun Roychoudhury published similar results [10]. Population geneticists of that period were concerned with the general measurement of natural genetic variation, specifically to test classical theories of natural selection. For this reason, a series of papers was published, all returning the same results with similar analyses. It therefore seems problematic to focus a discussion of the fallacy of dismissing racial classification solely on Lewontin’s work. Nei and Roychoudhury were not considered radical scientists, so the singling out of Lewontin seems to have more to do with criticism of his political ideas than with any statistical fallacy he might have committed.

Finally, and most importantly, the legitimacy of social construction theory is entirely unrelated to the mathematical issues that Leroi raises in his discussion. Saying that legitimate ways to structure human populations exist does not say that the history of racial thinking applied to humans in biology and anthropology applied those same methods. Most galling was his claim of the ubiquity of the “social awareness” displayed by scientists studying human variation. Scientists differ in motive and ability just like people in any profession. Historically, scientists studying human racial variation have varied in political motivation from outright fascists to socialist revolutionaries. No scientist at any part of the spectrum was immune from socialized racial ideology. In the main, these scientists were from the socially dominant populations in their respective countries. The role that racism has played in American systems of social dominance need not be documented here. However, to think that social agendas and individuals’ socialized racial ideologies have not influenced, and are not presently influencing, research on human racial variation is at best hopelessly naïve, at worst dishonest. Ample past examples of this exist, such as the activities of the scientists associated with the Eugenics Record Office in the 1920’s and 30’s. To understand how social dominance is impacting modern research requires more sophistication and will be discussed below.

Genetics has proven that you’re unique—just like everyone else

It’s often said that humans are 99.9% identical, and that what makes us unique is a measly 0.1% of our genome. This may seem insignificant. But what these declarations fail to point out is that the human genome is made up of three billion base pairs, which means 0.1% is still equal to three million base pairs.

In those three million differences lie the changes that give you red hair instead of blonde, or green eyes instead of blue. You can find changes that increase your risk of obesity, or others that decrease your risk of heart disease; differences that make you taller or lactose intolerant, or allow you to run faster.

When I first started learning about genetic variation, I assumed these changes (the 0.1% that make us unique) only appeared in certain places, such as genes for height or inherited diseases like diabetes. I thought the rest of the genome, the other 99.9%, was fixed, and that the 0.1% that was different in me was more or less the same 0.1% that was different in you. But, as it turns out, the 0.1% of DNA that is different between people is not always the same 0.1%: Variation can happen anywhere in our genomes.

In fact, one group of scientists looking at 10,000 people found variants at 146 million unique positions, or about 4.8% of the genome. Another group collected the DNA from 15,000 people and found 254 million variants, roughly 8% of the genome. And as we continue to sequence 100,000, 100 million, or all seven billion people on the planet, we will find a lot more variation. This means that humans have many more differences than we first thought.
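
The arithmetic behind these figures can be checked with a short sketch. It assumes the rounded genome length of three billion base pairs quoted above; the constant and function names are hypothetical. With that rounded denominator the shares come out slightly higher than the quoted "about 4.8%" and "roughly 8%", which probably reflects a somewhat different genome length or rounding in the original estimates.

```python
GENOME_SIZE_BP = 3_000_000_000  # rounded "three billion base pairs" from the text


def share_of_genome(variant_positions: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Fraction of genome positions at which at least one variant was observed."""
    return variant_positions / genome_size


# 0.1% of the genome: the typical difference between two individuals.
pairwise_differences = round(0.001 * GENOME_SIZE_BP)
print(f"0.1% of the genome = {pairwise_differences:,} base pairs")

# Variant positions reported for the two cohorts mentioned in the text.
for people, positions in ((10_000, 146_000_000), (15_000, 254_000_000)):
    print(f"{people:,} people: {share_of_genome(positions):.1%} of positions vary")
```

The contrast is between two different measures: how much any two people differ (about 0.1%) and how many positions vary somewhere in a large sample (several percent, and growing with sample size).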

Imagine that your DNA is a car. There are certain obvious variants you can have: blue or white, two-door or four-door, convertible or sedan. These changes represent the 0.1%. Because the other 99.9%—the engine, the seats, the steering wheel, the tires—has to be there for the car to work, we assume they are fixed.

But electric cars have shown us that we don’t need the gas cap, the gas tank, or even a gas engine anymore; we can replace those things with variants like batteries and charging ports. And maybe one day we’ll develop cars that have boosters instead of tires so we can hover over the ground.

In other words, what we believe is static may actually be variable. More than 0.1% of the car can change and it can still be a car, just like the human genome.

With the rise of services that offer to sequence your DNA, more and more people are talking about the value of personal genomics and what you might uncover about yourself. These kinds of mail-in tests are an easy way to point to something tangible—like your blue eyes or the waddle you and your grandmother share—and say “It runs in the family.” You might even say, “There’s a gene for that!”

But those examples of straightforward, visible evidence are just starting points in the immense and only partially explored field of personal genomics. There are also many variations in our genomes that are invisible to the naked eye, like the way we metabolize caffeine or a distaste for cilantro, or more serious examples such as predispositions toward certain types of cancers and diseases like Alzheimer’s and Parkinson’s.

There are also all sorts of other gene variants we haven’t discovered yet. Because our data is limited by the amount of sequenced DNA available for study, scientists like myself have only explored a small portion of the genetic variation that exists in the world.

As access to personal genomics becomes a more practical option and more people opt in to research, this data pool grows every day. This means our theories will become much less theoretical in the months and years to come, and it soon won’t be surprising to discover there’s a gene for almost every trait.

So what does all this variation actually mean? What do we learn by cataloging all this information?

The consequences of sequencing millions of people’s DNA and identifying new genetic variants are at once predictable and unknown. On the predictable side, we are going to learn a lot more about human health and disease: Individual genetic variants and groups of genetic variants will be found to play a role in obesity, heart disease, and cancer, among other conditions. We are going to find genetic variants responsible for rare diseases that have gone undiagnosed.

But it’s the unknown findings that get me excited. We don’t know how many unique variants we will find. And while our current understanding of biology suggests some positions in DNA are not variable (because any change in these genes disrupts the basic function of being human), we may discover that these positions actually are variable and can change. We’re also getting to a point where we will be able to better study the role of environment (what you are exposed to, the things you choose to eat, the activities you decide to engage in) and how it interacts with your DNA. With this information, we will be able to make better predictions about you as an individual.

There is still so much for us to discover about human genetic variation. A variant that increases risk for a disease today might turn out to be protective for another disease tomorrow. The more people who get their DNA sequenced—whether for personal or research purposes—the more we will discover.

We each carry three billion base pairs of information inside us with the potential to unravel a piece of the mystery that makes us all so fundamentally human. At the end of the day, we are all still more similar than we are different—but we are just beginning to understand how important our differences are.

