

MBE Advance Access originally published online on August 4, 2006
Molecular Biology and Evolution 2006 23(11):1995-1996; doi:10.1093/molbev/msl078

© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Letter

The Hidden Value of Missing Genotypes

William Amos

Department of Zoology, Downing Street, Cambridge, UK

E-mail: wa100{at}hermes.cam.ac.uk.


    Abstract
Robotic systems allow vast genetic data sets to be generated automatically with little manual input, raising questions about accuracy. To test whether errors occur randomly, I used the frequencies of missing genotypes in a large human data set to construct a population tree. Remarkably, the gaps appear to carry as strong a phylogenetic signal as the actual data themselves.

Key Words: human population • microsatellite • null alleles • genetic distance • genotyping error

Microsatellite markers are widely used but prone to many misleading artifacts. For example, phenotypic homozygotes may result from allele dropouts, nonamplifying (null) alleles, or loss of heterozygosity in cell culture (Callen et al. 1993). Similarly, failed samples may arise either stochastically or as null allele homozygotes. To test whether these possible errors occur randomly, I examined the published data set of Rosenberg et al. (2005), comprising 1,048 samples from the human diversity cell line panel genotyped for 783 microsatellite markers (http://rosenberglab.bioinformatics.med.umich.edu/).

Overall, an average of 14 loci per individual (1.8%) are missing. If these represent null allele homozygotes, they should occur at similar frequencies in related populations. Consequently, I calculated pairwise Pearson correlations, r, between populations for missing data frequency across loci. Then 1 – r was used as a genetic distance measure to construct a Neighbor-Joining tree (fig. 1; for the full matrix see supplementary table 1, Supplementary Material online) that appears remarkably consistent with prior expectations. All but 6 populations group in the clusters identified by Rosenberg et al. (2002, 2005). Of these 6, the Palestinians cluster with Central/Southern Asia, the Yakut with the Americans, and the Japanese and Cambodians with Oceania. Only the Colombians and Han appear far removed from where they should be.
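The distance calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analysis: the toy genotype table, the population labels, and the function names are all hypothetical, and a missing genotype is assumed to be encoded as None. The value from which r is subtracted is exposed as the parameter offset, defaulting to 1, the usual choice for a correlation-based distance.

```python
from math import sqrt

def missing_frequencies(genotypes, populations):
    """Return {population: [fraction of individuals missing at each locus]}.

    genotypes: one list per individual, None marking a missing genotype.
    populations: one population label per individual (same order).
    """
    n_loci = len(genotypes[0])
    missing, sizes = {}, {}
    for row, pop in zip(genotypes, populations):
        tally = missing.setdefault(pop, [0] * n_loci)
        sizes[pop] = sizes.get(pop, 0) + 1
        for locus, genotype in enumerate(row):
            if genotype is None:
                tally[locus] += 1
    return {pop: [m / sizes[pop] for m in tally] for pop, tally in missing.items()}

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)

def distance_matrix(freqs, offset=1.0):
    """Pairwise distances (offset - r) between populations, zero on the diagonal."""
    pops = sorted(freqs)
    matrix = [[0.0 if a == b else offset - pearson(freqs[a], freqs[b])
               for b in pops] for a in pops]
    return pops, matrix

# Hypothetical toy data: 6 individuals, 3 populations, 4 loci.
G = (1, 2)  # stands for any successfully scored genotype
genotypes = [
    [None, G, None, G],   # population A
    [G, G, G, G],         # population A
    [None, G, None, G],   # population B
    [None, G, G, G],      # population B
    [G, None, G, G],      # population C
    [G, G, G, None],      # population C
]
populations = ["A", "A", "B", "B", "C", "C"]

freqs = missing_frequencies(genotypes, populations)
pops, dist = distance_matrix(freqs)
for name, row in zip(pops, dist):
    print(name, [round(d, 3) for d in row])
```

In this toy table, populations A and B are missing data at the same loci and so end up close together, whereas C is missing data at the complementary loci and lies far from both. The resulting matrix could then be passed to any Neighbor-Joining implementation.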


FIG. 1. Neighbor-Joining tree of 53 human populations based on the correlation in frequencies of missing data across 783 microsatellite loci. Major population groups are defined as in Rosenberg et al. (2002), with major clusters identified by gray shading.

 
For comparison, I generated 100 random subsets of 28 loci, the same average number of alleles available as missing data, and constructed a consensus Neighbor-Joining tree based on Nei's standard genetic distance (Takezaki and Nei 1996), calculated using the alleles scored and implemented within the package PHYLIP (Felsenstein 1991). The resulting consensus tree appears no better and arguably worse than the one generated from missing data (fig. 2). Only the African and American clades are recovered with high confidence (bootstrap > 95%), whereas the European/Central South Asian and Middle Eastern populations fail to yield any bootstrap values greater than 30%.
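As a rough illustration of the comparison step, the sketch below draws a random subset of loci and computes Nei's standard genetic distance, D = -ln(Jxy / sqrt(Jx Jy)), between two populations from their allele frequencies. It is not the PHYLIP implementation used in the analysis: the allele frequencies, locus names, and function name are hypothetical, and no correction for sample size is applied.

```python
import math
import random

def nei_standard_distance(freqs_x, freqs_y, loci):
    """Nei's standard distance over the given loci.

    freqs_x, freqs_y: {locus: {allele: frequency}} for each population.
    Sums over loci are used instead of means; the ratio is unchanged
    because all three terms cover the same number of loci.
    """
    jx = jy = jxy = 0.0
    for locus in loci:
        px, py = freqs_x[locus], freqs_y[locus]
        jx += sum(p * p for p in px.values())
        jy += sum(p * p for p in py.values())
        jxy += sum(p * py.get(allele, 0.0) for allele, p in px.items())
    if jxy == 0.0:
        return math.inf  # no shared alleles at any sampled locus
    return -math.log(jxy / math.sqrt(jx * jy))

# Hypothetical allele frequencies at three loci (alleles keyed by size).
pop_x = {
    "L1": {100: 0.5, 102: 0.5},
    "L2": {200: 1.0},
    "L3": {300: 0.25, 304: 0.75},
}
pop_y = {
    "L1": {100: 0.25, 102: 0.75},
    "L2": {200: 0.5, 202: 0.5},
    "L3": {300: 0.75, 304: 0.25},
}

rng = random.Random(1)                    # fixed seed for repeatability
subset = rng.sample(sorted(pop_x), 2)     # one random subset of 2 loci
d_xy = nei_standard_distance(pop_x, pop_y, subset)
d_xx = nei_standard_distance(pop_x, pop_x, subset)
d_all = nei_standard_distance(pop_x, pop_y, sorted(pop_x))
print(subset, round(d_xy, 4), round(d_all, 4))
```

Repeating the sampling step many times, building a tree from each distance matrix, and taking the consensus would mirror the procedure described in the text.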


FIG. 2. Consensus Neighbor-Joining tree constructed from 100 bootstrap replicates of 28 loci selected at random from the full set of 783. All clades supported by bootstrap values >50% are labeled.

 
Thus, missing genotypes carry a phylogenetic signal as strong as, or stronger than, that carried by all alleles scored at a similar number of loci. The likely explanation is that, in this data set, rigorous genotyping standards and high-quality DNA have eliminated most gaps due to experimental failure, leaving mainly gaps that are null allele homozygotes. Null alleles are thought to arise mainly through mutations in polymerase chain reaction primer sites (Callen et al. 1993; Koorey et al. 1993), implying that back mutations will be rare. Consequently, null alleles may be individually more informative than "normal" microsatellite alleles because the latter are prone to considerable homoplasy (Garza and Freimer 1996; Angers and Bernatchez 1997). Although this finding is itself rather amusing, it also carries important implications. If, as seems likely, most missing data are null homozygotes, this would imply a much larger number of phenotypic homozygotes that are in reality null/normal heterozygotes. Indeed, assuming all gaps are homozygote nulls and that the data are in Hardy–Weinberg equilibrium, an average of 14.3% of genotypes would be null-visible heterozygotes, and these would comprise a remarkable 50.5% of all phenotypic homozygotes (supplementary table 2, Supplementary Material online). This would prove problematic for studies such as linkage analysis and estimates of inbreeding, where the accurate scoring of homozygotes is critical, and it is likely to distort estimates of allele frequencies, allele sharing, and Fst.
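The Hardy–Weinberg reasoning can be made concrete for a single locus. If every missing genotype is a null homozygote, the null allele frequency q satisfies q² = (fraction missing), and null/visible heterozygotes, which are scored as homozygotes, occur at frequency 2q(1 – q). The sketch below applies this to hypothetical input values; the function name and both inputs are assumptions chosen for illustration, and the averages quoted in the text come from supplementary table 2 rather than from this calculation.

```python
from math import sqrt

def null_allele_estimates(missing_fraction, apparent_homozygosity):
    """Single-locus estimates under Hardy-Weinberg equilibrium.

    missing_fraction: fraction of all individuals with no genotype,
        assumed here to be null allele homozygotes.
    apparent_homozygosity: fraction of SCORED genotypes that look homozygous.
    Returns (q, null_het, share), where
        q        = estimated null allele frequency,
        null_het = fraction of all genotypes that are null/visible heterozygotes,
        share    = fraction of phenotypic homozygotes that are really null/visible.
    """
    q = sqrt(missing_fraction)
    null_het = 2.0 * q * (1.0 - q)
    # Phenotypic homozygotes expressed as a fraction of all genotypes.
    phenotypic_hom = apparent_homozygosity * (1.0 - missing_fraction)
    share = null_het / phenotypic_hom
    return q, null_het, share

# Hypothetical locus: 1% of genotypes missing, 30% of scored genotypes homozygous.
q, null_het, share = null_allele_estimates(0.01, 0.30)
print(f"q = {q:.3f}, null heterozygotes = {null_het:.3f}, share = {share:.3f}")
```

Even a low rate of missing genotypes therefore implies a much higher rate of hidden null heterozygotes, because q is the square root of the missing fraction.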


    Supplementary Material
Supplementary tables 1 and 2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Footnotes
 
Dan Graur, Associate Editor


    References

    Angers B and Bernatchez L. (1997) Complex evolution of a salmonid microsatellite locus and its consequences in inferring allelic divergence from size information. Mol Biol Evol 14:230–8.

    Callen DF, Thompson AD, Shen Y, Phillips HA, Richards RI, Mulley JC, Sutherland GR. (1993) Incidence and origins of "null" alleles in (AC)n microsatellite markers. Am J Hum Genet 52:922–7.

    Felsenstein J. (1991) PHYLIP (phylogeny inference package) (University of Washington, Seattle, WA).

    Garza JC and Freimer NB. (1996) Homoplasy for size at microsatellite loci in humans and chimpanzees. Genome Res 6:211–7.

    Koorey DJ, Bishop GA, McCaughan W. (1993) Allele non-amplification: a source of confusion in linkage studies employing microsatellite polymorphisms. Hum Mol Genet 2:289–91.

    Rosenberg NA, Mahajan S, Ramachandran S, Zhao C, Pritchard JK, Feldman MW. (2005) Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genet 1:660–71.

    Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, Feldman MW. (2002) Genetic structure of human populations. Science 298:2381–5.

    Takezaki N and Nei M. (1996) Genetic distances and reconstruction of phylogenetic trees from microsatellite data. Genetics 144:389–99.

Accepted for publication July 31, 2006.

