MBE Advance Access originally published online on August 4, 2006
Molecular Biology and Evolution 2006 23(11):1995-1996; doi:10.1093/molbev/msl078
Letter
The Hidden Value of Missing Genotypes
Department of Zoology, Downing Street, Cambridge, UK
E-mail: wa100{at}hermes.cam.ac.uk.
Abstract
Robotic systems allow vast genetic data sets to be generated automatically with little manual input, raising questions about accuracy. To test whether errors occur randomly, I used the frequencies of missing genotypes in a large human data set to construct a population tree. Remarkably, the gaps appear to carry as strong a phylogenetic signal as the actual data themselves.
Key Words: human population, microsatellite, null alleles, genetic distance, genotyping error
Microsatellite markers are widely used but prone to many misleading artifacts. For example, phenotypic homozygotes may result from allele dropouts, nonamplifying (null) alleles, or loss of heterozygosity in cell culture (Callen et al. 1993). Similarly, failed samples may arise either stochastically or as null allele homozygotes. To test whether these possible errors occur randomly, I examined the published data set of Rosenberg et al. (2005), comprising 1,048 samples from the human diversity cell line panel genotyped for 783 microsatellite markers (http://rosenberglab.bioinformatics.med.umich.edu/).
Overall, an average of 14 loci per individual (1.8%) are missing. If these represent null allele homozygotes, they should occur at similar frequencies in related populations. Consequently, I calculated pairwise Pearson correlations, r, between populations for missing data frequency across loci. Then 0.7 − r was used as a genetic distance measure to construct a Neighbor-Joining tree (fig. 1; for the full matrix see supplementary table 1, Supplementary Material online) that appears remarkably consistent with prior expectations. All but 7 populations group in the clusters identified by Rosenberg et al. (2002, 2005). Of these, the Palestinians cluster with Central/Southern Asia, the Yakut with the Americans, and the Japanese and Cambodians with Oceania. Only the Colombians and Han appear far removed from where they should be.
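The distance calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analysis: the function names `missing_frequency` and `correlation_distance` are hypothetical, missing genotypes are assumed to be coded as NaN, and the distance is read as the constant 0.7 minus the Pearson correlation r.

```python
import numpy as np


def missing_frequency(genotypes, populations):
    """Fraction of missing genotypes per population at each locus.

    genotypes: (n_individuals, n_loci) array; NaN marks a missing genotype
               (an assumed coding, not taken from the data set itself).
    populations: one population label per individual.
    Returns (labels, freq), where freq has shape (n_populations, n_loci).
    """
    populations = np.asarray(populations)
    labels = sorted(set(populations.tolist()))
    missing = np.isnan(np.asarray(genotypes, dtype=float))
    freq = np.vstack([missing[populations == p].mean(axis=0) for p in labels])
    return labels, freq


def correlation_distance(freq, offset=0.7):
    """Pairwise distance between populations, taken as offset - r.

    r is the Pearson correlation of missing-data frequencies across loci.
    A population whose frequency is constant across loci has an undefined
    correlation and yields NaN entries.
    """
    r = np.corrcoef(freq)        # rows are populations, columns are loci
    dist = offset - r
    np.fill_diagonal(dist, 0.0)  # self-distance is set to zero by convention
    return dist
```

The resulting square matrix could then be passed to any Neighbor-Joining implementation. Note that a distance defined this way becomes negative whenever r exceeds the offset, so the choice of constant matters for the data at hand.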
For comparison, I generated 100 random subsets of 28 loci, the same average number of alleles available as missing data, and constructed a consensus Neighbor-Joining tree based on Nei's standard genetic distance (Takezaki and Nei 1996
Thus, missing genotypes carry a phylogenetic signal of similar or greater strength than all alleles scored at a similar number of loci. The likely explanation is that, in this data set, rigorous genotyping standards and high-quality DNA have eliminated most gaps due to experimental failure, leaving mainly gaps that are null allele homozygotes. Null alleles are thought to arise mainly through mutations in polymerase chain reaction primer sites (Callen et al. 1993
Supplementary Material
Supplementary tables 1 and 2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Footnotes
Dan Graur, Associate Editor
References
Angers B and Bernatchez L. (1997) Complex evolution of a salmonid microsatellite locus and its consequences in inferring allelic divergence from size information. Mol Biol Evol 14:230-8.
Callen DF, Thompson AD, Shen Y, Phillips HA, Richards RI, Mulley JC, Sutherland GR. (1993) Incidence and origins of "null" alleles in (AC)n microsatellite markers. Am J Hum Genet 52:922-7.
Felsenstein J. (1991) PHYLIP (phylogeny inference package) (University of Washington, Seattle, WA).
Garza JC and Freimer NB. (1996) Homoplasy for size at microsatellite loci in humans and chimpanzees. Genome Res 6:211-7.
Koorey DJ, Bishop GA, McCaughan W. (1993) Allele non-amplification: a source of confusion in linkage studies employing microsatellite polymorphisms. Hum Mol Genet 2:289-91.
Rosenberg NA, Mahajan S, Ramachandran S, Zhao C, Pritchard JK, Feldman MW. (2005) Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genet 1:660-71.
Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, Feldman MW. (2002) Genetic structure of human populations. Science 298:2381-5.
Takezaki N and Nei M. (1996) Genetic distances and reconstruction of phylogenetic trees from microsatellite data. Genetics 144:389-99.