

MBE Advance Access originally published online on July 13, 2007
Molecular Biology and Evolution 2007 24(9):2049-2058; doi:10.1093/molbev/msm135

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Dissecting Linkage Disequilibrium in African-American Genomes: Roles of Markers and Individuals

Shuhua Xu*,†, Wei Huang‡,§, Haifeng Wang‡, Yungang He*,‡, Ying Wang‡, Yi Wang*, Ji Qian*, Momiao Xiong*,|| and Li Jin*,†,1

* MOE Key Laboratory of Contemporary Anthropology and Center for Evolutionary Biology, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
† Chinese Academy of Science–Max-Planck-Gesellschaft Partner Institute for Computational Biology, Shanghai Institutes for Biological Science, CAS, Shanghai, China
‡ Chinese National Human Genome Center at Shanghai, China
§ Rui Jin Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
|| Human Genetics Center, School of Public Health, University of Texas-Health Science Center at Houston

E-mail: ljin007{at}gmail.com.


    Abstract
Substantial increases of linkage disequilibrium (LD), in both magnitude and range, have been observed in recently admixed populations such as African-Americans (AfA). On the other hand, LD in AfAs has also been shown to be very similar to that in Africans. In this study, we attempted to resolve these contradictory observations by systematically examining the LD structure in AfAs, genotyping a sample of AfA individuals at 24,341 single nucleotide polymorphisms (SNPs) spanning almost the entire chromosome 21, with an average density of 1.5 kb/SNP. The overall LD in AfAs is similar to that in African populations and much weaker than that in European populations. Even when ancestry-informative markers (AIMs) were used, extended LD in AfA was limited to a certain magnitude range (0.2 ≤ r² ≤ 0.8) and a certain distance range, namely between-marker distances of more than 200 kb. Furthermore, including AfA individuals of predominantly African ancestry reduced the overall magnitude of LD. Elevation of LD in the AfA population, compared with its parental populations, can be observed only at markers with large allele frequency differences between the 2 parental populations, and only under limited conditions. AfA individuals of wholly African ancestry contribute little to the extended LD in the AfA population, and further genotyping or association analysis conducted using only admixed individuals may lead to higher statistical power and possibly reduced cost.

Key Words: African-American • linkage disequilibrium • single nucleotide polymorphism • ancestry-informative markers • admixture population


    Introduction
Mapping by admixture linkage disequilibrium (MALD) has received much attention recently (McKeigue 2005; Smith and O'Brien 2005). The initial attraction of MALD was its use of significantly extended LD in admixed populations, which requires far fewer markers for mapping disease genes. Previous studies predicted that MALD typically requires only 2,000–3,000 ancestry-informative markers (AIMs) for a genome-wide search (McKeigue 1998; Johnson et al. 2001). Of course, the utility of MALD depends upon how far linkage disequilibrium (LD) actually extends over a chromosomal interval, which, in turn, dictates the spacing and exact number of markers required for a genome-wide scan.

Substantial increases of LD, not only in magnitude but also in range, have been observed in recently admixed populations such as African-American (AfA) and Mexican-American populations (McKeigue et al. 2000; Pfaff et al. 2001; Collins-Schramm et al. 2003; Salari et al. 2005; Smith and O'Brien 2005; Zhu et al. 2005). Stephens et al. (1994) showed by simulation that admixture linkage disequilibrium (ALD) up to 10 cM could exist even after 9 generations in AfAs. Parra et al. (1998) reported strong LD between the FY-null and AT3 loci, which are 22 cM apart on chromosome 1q22. Examples of substantially extended LD in AfA populations were discovered in several regions and chromosomes, primarily using short tandem repeat markers (Lautenberger et al. 2000; Rybicki et al. 2002; Collins-Schramm et al. 2003). Patterson et al. (2004) showed that strong admixture LD extends, on average, to 17 cM in AfA.

In contrast, Gabriel et al. (2002) pointed out that LD in AfA and in Africans was very similar. In addition, a number of recent studies on LD using single nucleotide polymorphisms (SNPs) have treated the AfA population as representative of African populations (Ke et al. 2004; De La Vega et al. 2005; Hinds et al. 2005; Takeuchi et al. 2005; Huang et al. 2006). The contradictory observations may have arisen from selection of the markers based on their allele frequency difference between the parental populations, as shown theoretically (Chakraborty and Weiss 1988). Alternatively, the conflict might be due to a unique genetic structure of admixed populations. In this study, we conducted a systematic dissection of the LD structure of AfA by genotyping a sample of 48 AfA individuals at 24,341 SNPs spanning almost the entire chromosome 21 with a density of 1.5 kb/SNP. We found that elevation of LD in AfA is dictated by the specific choice of markers and the inclusion of specific individuals. Our results implied that both allele frequency and admixture of individuals contribute to the LD architecture of AfA.


    Materials and Methods
Populations and Samples
DNA from 48 AfAs (37 females and 11 males) was obtained from Coriell Cell Repositories. We used 60 CEU (European American samples, which were collected in 1980 from Utah residents with northern and western European ancestry by the Centre d'Etude du Polymorphisme Humain) parents and 60 YRI (African samples, Yoruba people in Ibadan, Nigeria) parents described in The International HapMap Project (2003) as representative parental populations for ALD studies.

Markers and Their Positions
A set of 29,177 SNPs was genotyped in 48 AfAs. Illumina Beadlab technology was used in genotyping, and the genotyping method has been described previously (Huang et al. 2006). Genotyped SNPs on chromosome 21 of 60 CEU and 60 YRI samples were obtained from the Web site of the International HapMap Project (HapMap public release #19, 2005-10-24, http://www.hapmap.org). After data filtration (e.g., removing markers with missing data in >5% of samples), we obtained 24,341 SNPs that were genotyped successfully in all 3 populations. SNPs that showed deviation from Hardy–Weinberg equilibrium were excluded using Fisher's exact test (P < 0.05), where P was estimated using Arlequin 3.01 (Schneider et al. 2000) with 100,000 permutations.

The physical positions of SNPs were based on the Homo sapiens genome build 36. The total chromosome region studied was 36.87 Mb, from position 10,047,705 (the first marker, rs2989064) to 46,914,854 (the last marker, rs8132215). The average spacing between adjacent markers was 1.5 kb, with a minimum of 2 bp and a maximum of 3.1 Mb. The genetic map positions of SNPs were based on the Rutgers combined linkage-physical maps (Kong et al. 2004), which incorporate the latest human genome assembly build 36. We determined the genetic map positions of SNPs in centimorgans using a Web-based linkage-mapping server (http://actin.ucd.ie/cgi-bin/rs2cm.cgi), which carried out a smoothing calculation to estimate genetic map positions, including those of markers that have not been mapped directly. The total recombination distance is 68.17 cM (from 0 to 68.17 cM), the average intermarker distance is 0.003 cM, and the maximum is 0.38 cM.

Measures of LD
Several statistics have been used to measure the LD between a pair of loci (Jorde 1995). The 2 most common measures are the absolute value of D' (denoted by |D'| hereafter) and r², both derived from Lewontin's D (Lewontin 1964). It was shown that in indirect association studies, the sample size must be increased by a factor of roughly 1/r² when compared with the sample size for detecting association with the susceptibility locus itself (Kruglyak 1999; Pritchard and Przeworski 2001). The r² has a relatively clear interpretation in terms of the power to detect an association, and intermediate values of r² are also easily interpretable (Kruglyak 1999; Ardlie et al. 2002). It is therefore more sensible to use r² to study LD. In addition, r² also shows much less inflation in small samples than does |D'| (Ardlie et al. 2002; Weiss and Clark 2002).
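
As a worked illustration of the 1/r² rule described above, the following minimal sketch scales a sample size that would suffice at the susceptibility locus itself. The function name and the example numbers are hypothetical and are not taken from this study.

```python
import math


def indirect_sample_size(direct_n: int, r2: float) -> int:
    """Approximate sample size needed at a marker whose LD with the
    susceptibility locus is r2, given the size direct_n that would
    suffice at the susceptibility locus itself (the 1/r^2 rule)."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in (0, 1]")
    return math.ceil(direct_n / r2)


# Hypothetical numbers: 1,000 samples suffice at the causal locus.
print(indirect_sample_size(1000, 0.5))   # marker with r^2 = 0.5
print(indirect_sample_size(1000, 0.25))  # marker with r^2 = 0.25
```

Under this rule, halving r² doubles the required sample size, which is why intermediate r² values are easy to interpret in terms of power.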

In this study, we used r² to measure LD between 2 SNPs. Consider 2 loci, A and B, each locus having 2 alleles (denoted $A_1$, $A_2$; $B_1$, $B_2$, respectively). We denote $p_{11}$, $p_{12}$, $p_{21}$, and $p_{22}$ as the frequencies of the haplotypes $A_1B_1$, $A_1B_2$, $A_2B_1$, and $A_2B_2$, respectively; $p_{1+}$, $p_{2+}$, $p_{+1}$, and $p_{+2}$ are the frequencies of $A_1$, $A_2$, $B_1$, and $B_2$, respectively. Following Hill and Weir (1994),

$$r^2 = \frac{(p_{11}p_{22} - p_{12}p_{21})^2}{p_{1+}\,p_{2+}\,p_{+1}\,p_{+2}}$$
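
The r² statistic can be computed directly from the four haplotype frequencies defined above. The sketch below assumes the standard form, the squared Lewontin's D divided by the product of the four marginal allele frequencies. The function name is illustrative.

```python
def r_squared(p11: float, p12: float, p21: float, p22: float) -> float:
    """r^2 between two biallelic loci from the frequencies of the
    haplotypes A1B1, A1B2, A2B1, and A2B2."""
    if abs(p11 + p12 + p21 + p22 - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    pa1, pa2 = p11 + p12, p21 + p22  # marginal frequencies of A1, A2
    pb1, pb2 = p11 + p21, p12 + p22  # marginal frequencies of B1, B2
    denom = pa1 * pa2 * pb1 * pb2
    if denom == 0.0:
        return float("nan")          # undefined if either locus is monomorphic
    d = p11 * p22 - p12 * p21        # Lewontin's D
    return d * d / denom


print(r_squared(0.5, 0.0, 0.0, 0.5))      # complete LD
print(r_squared(0.25, 0.25, 0.25, 0.25))  # linkage equilibrium
```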

Marker Information Content for Ancestry
A simple measure that is informative of ancestry at a single marker is the δ value (Shriver et al. 1997), defined as the sum of the absolute values of the differences in all n allelic frequencies between 2 parental populations, divided by 2. For biallelic SNPs, n = 2:

$$\delta = \frac{1}{2}\sum_{i=1}^{n}\left|p_{i1} - p_{i2}\right|$$
where $p_{i1}$ and $p_{i2}$ are the frequencies of allele $i$ in population 1 and population 2, respectively.
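
A minimal sketch of the δ calculation as defined in words above (half the sum of absolute allele frequency differences). The function name and the example frequencies are hypothetical.

```python
def delta(freqs_pop1, freqs_pop2):
    """Half the summed absolute differences of allele frequencies
    between two parental populations (one entry per allele)."""
    if len(freqs_pop1) != len(freqs_pop2):
        raise ValueError("both populations need one frequency per allele")
    return 0.5 * sum(abs(a - b) for a, b in zip(freqs_pop1, freqs_pop2))


# Hypothetical biallelic SNP: allele 1 at 0.9 in one population, 0.2 in the other.
print(delta([0.9, 0.1], [0.2, 0.8]))
```

For a biallelic SNP the two terms in the sum are equal, so δ reduces to the absolute difference in the frequency of either allele.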

Another measure, f (McKeigue 1998), originally defined by Wahlund (1928), is the standardized variance of allele frequencies and ranges from 0 (noninformative) to 1 (completely informative). The f value was calculated by the following formula,

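$$f = \frac{(p_1 - p_2)^2}{4\,\bar{p}\,(1 - \bar{p})}, \qquad \bar{p} = \frac{p_1 + p_2}{2}$$
where $p_1$ and $p_2$ are the frequencies of a given allele in the 2 parental populations and $\bar{p}$ is their mean.
In this study, both δ and f of each marker were calculated using the allele frequencies of CEU and YRI samples described above.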
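
The standardized variance f can be sketched as follows. This assumes the two-population, equal-weight form f = (p1 − p2)² / (4 p̄ (1 − p̄)), with p̄ the mean of the two parental allele frequencies. That form is consistent with the stated range of 0 to 1 but is an assumption here, as are the function name and example values.

```python
def standardized_variance_f(p1: float, p2: float) -> float:
    """Standardized variance of an allele frequency across two parental
    populations, assuming equal weights for the two populations."""
    p_bar = (p1 + p2) / 2.0
    if p_bar in (0.0, 1.0):
        return 0.0  # allele fixed or absent in both populations: noninformative
    return (p1 - p2) ** 2 / (4.0 * p_bar * (1.0 - p_bar))


print(standardized_variance_f(1.0, 0.0))  # completely informative marker
print(standardized_variance_f(0.3, 0.3))  # noninformative marker
```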

Ancestry-Informative Markers
SNPs that have a large allele frequency difference between CEU and YRI were selected as AIMs according to their f value (McKeigue 1998). The SNPs in this study cover a 36.87-Mb (68.17 cM) segment of chromosome 21, which can be divided into 2 regions based on marker information. In the first region, which encompasses the majority of the chromosome (35.21 Mb or 63.73 cM), we selected markers with f ≥ 0.40 from 24,341 SNPs and obtained 338 AIMs. This criterion was determined based on the analyses described in the Results of this paper. However, in the second region containing 1.66 Mb (4.44 cM), the f values of all 1,092 SNPs are below 0.35 and therefore do not meet the aforementioned criterion (f ≥ 0.40). To include this region in further analysis, we selected 6 additional SNPs with δ > 0.48 (minimum f = 0.26). The final length of the chromosomal region that was covered by AIMs was 33.42 Mb (67.80 cM).

Altogether, we obtained a panel of 344 SNPs that are informative for ancestry, with mean f = 0.49 and mean δ = 0.68. The average spacing between adjacent markers was 97.5 kb (0.19 cM), with a minimum of 69 bp (0.0 cM) and a maximum of 2.45 Mb (4.74 cM). The median distance between adjacent markers was 7.4 kb (0.01 cM).

The allele frequency distributions of the 344 AIMs in the 3 populations are shown in figure 1. Note that most alleles in AfA have moderate frequencies, whereas their corresponding allele frequencies in CEU and YRI are relatively extreme. The marker information of these 344 AIMs, measured by the standardized variance (f), is shown in figure 2.


FIG. 1. Allele frequency distribution of 344 AIMs in 3 samples. The alleles of AfA (diamond) are largely at moderate frequencies, whereas those of CEU (circle) and YRI (triangle) are at relatively extreme frequencies, with large differences between them.

 

FIG. 2. Marker information of 344 AIMs and genetic map of 6 marker panels. The top portion shows marker information (measured by f) of the 344 AIMs. The bottom portion shows marker positions in each marker panel. The first marker was used in all 6 marker panels; the last marker differed among the 6 panels but had the same genetic position (68.10 cM). The positions of the 145 nonoverlapping AIMs selected for estimating individual admixture and inferring ancestries of chromosomal origins, and of all 344 AIMs, are depicted in the last 2 lines.

 
Haplotype Inference
The LD analyses in this study were based on haplotypes generated from the data using PHASE 2.1 software (Stephens et al. 2001; Stephens and Donnelly 2003; http://www.stat.washington.edu/stephens/software.html). PHASE implements a Bayesian statistical method for reconstructing haplotypes from population genotype data, which has been shown to be superior to the expectation-maximization algorithm for haplotype reconstruction at the individual level (Stephens et al. 2001). PHASE was run with the recombination model, 10,000 iterations, a thinning interval of 100, and 10,000 burn-in iterations. The other parameters were left at their defaults.

Overall, the 24,341 SNPs were broken up into sections of 40 SNPs, with 20 overlapping SNPs between each 2 consecutive sections. Haplotypes were inferred by PHASE from each such section independently. Finally, the 2 haplotypes of each individual were reconstructed by combining all haplotypes section by section according to the inferred haplotypes of the 20 overlapping sites between consecutive sections. When the overlapping SNPs were inconsistent, we arbitrarily kept the results of the former section. On average, 0.75% of overlapping SNPs showed inconsistent phasing.
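
The sectioning and recombining steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the rule for orienting each new section (choose the labeling with fewer mismatches over the overlap) is an assumption. Keeping the former section's alleles at overlapping sites follows the text.

```python
def overlapping_windows(n_snps, size=40, overlap=20):
    """Return (start, end) index pairs, end exclusive, that tile n_snps
    markers with windows of `size` sharing `overlap` markers."""
    if not 0 < overlap < size:
        raise ValueError("need 0 < overlap < size")
    if n_snps <= 0:
        return []
    step = size - overlap
    windows, start = [], 0
    while True:
        end = min(start + size, n_snps)
        windows.append((start, end))
        if end == n_snps:
            return windows
        start += step


def stitch_haplotypes(window_haps, overlap=20):
    """Join per-window haplotype pairs into two long haplotypes.

    window_haps: list of (hap_a, hap_b), one pair of allele sequences per
    window, in window order. The pair labeling within a window is arbitrary,
    so each new window is oriented to best match the tail of the first
    accumulated haplotype (assumed rule). At overlapping sites the alleles
    of the former window are kept.
    """
    hap1, hap2 = list(window_haps[0][0]), list(window_haps[0][1])
    for a, b in window_haps[1:]:
        tail = hap1[-overlap:]
        keep = sum(x != y for x, y in zip(tail, a[:overlap]))
        swap = sum(x != y for x, y in zip(tail, b[:overlap]))
        if swap < keep:
            a, b = b, a
        hap1.extend(a[overlap:])
        hap2.extend(b[overlap:])
    return hap1, hap2


wins = overlapping_windows(24341)
print(len(wins), wins[0], wins[-1])
```

With 24,341 markers, this tiling yields windows that each share 20 markers with the preceding one, including the final, shorter window.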

The individual haplotypes for the panel of 344 AIMs that were informative for ancestry were reinferred, independent of the results of 24,341 SNPs.

STRUCTURE Analysis
The program STRUCTURE (Pritchard et al. 2000; http://pritch.bsd.uchicago.edu/software/structure2_beta.html) implements a model-based clustering method for inferring population structure using genotype data. In Version 2, the program implemented a model that allows for "ALD." In Caucasian populations, the extent of LD is generally limited to regions smaller than 100 kb (Bodmer 1986; Laan and Paabo 1997; Huttley et al. 1999; Reich et al. 2001; Gabriel et al. 2002), and it would be shorter still in Africans; therefore, the correlations that arise between linked markers in AfAs are modeled as the result of admixture or hybridization (Falush et al. 2003). Because the program was not designed to model the LD that occurs between nearby markers (so-called "background LD") within populations (i.e., the model is best suited to data on markers that are linked but not tightly linked), we screened the markers from the inferred haplotype data before examining ancestral population structure. Haplotypes of the 344 AIMs were inferred using the program PHASE as described above.

For STRUCTURE analysis, marker panels were selected from the 344 AIMs such that adjacent SNPs were separated by adequate genetic distances (>1 cM). Overall, 6 panels of AIMs comprising 145 SNPs in total were selected for STRUCTURE analysis following the aforementioned criterion (between-marker distance [BMD] > 1 cM, see supplementary table S5 [Supplementary Material online]). Note that the AIMs are not evenly distributed on the chromosome, as depicted in figure 2. There were 10 "AIM deserts" with intermarker distance over 2 cM (in order from the start, 2.54, 2.63, 4.22, 2.04, 4.74, 2.17, 4.18, 3.50, 3.98, and 3.95 cM). In all 6 panels, average f and δ were >0.45 and >0.66, respectively, and for all 145 SNPs, average f = 0.47 and δ = 0.67.
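
The text does not state how markers were assigned to panels. One simple way to obtain a single panel satisfying the BMD > 1 cM criterion is a greedy left-to-right scan, sketched below. The function name and the example map positions are hypothetical, and this procedure is an assumption rather than the method used in the study.

```python
def greedy_spaced_panel(positions_cm, min_gap_cm=1.0):
    """Scan markers in map order and keep each marker lying more than
    min_gap_cm beyond the previously kept one. Returns the indices of
    the kept markers in the input list."""
    order = sorted(range(len(positions_cm)), key=positions_cm.__getitem__)
    chosen, last = [], None
    for i in order:
        if last is None or positions_cm[i] - last > min_gap_cm:
            chosen.append(i)
            last = positions_cm[i]
    return chosen


# Hypothetical genetic map positions in cM.
print(greedy_spaced_panel([0.0, 0.5, 1.2, 1.9, 2.3, 4.0]))
```

Further panels could be built by repeating the scan on the markers not yet chosen, although the study's panels also share their first marker, which this sketch does not reproduce.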

The haplotype data inferred by PHASE based on the 6 panels of AIMs were subjected to STRUCTURE analysis. We used the linkage model option and assumed that allele frequencies are correlated. We used distances between the markers determined by both genomic sequence and recombination-based data (Rutgers combined linkage-physical maps, Kong et al. 2004) as map distances. The program was run with 100,000 iterations, 50,000 burn-ins, and 1,000 admixture burn-ins.


    Results
Overall LD in AfA and Its Parental Populations
The extent of LD was examined across chromosome 21 in AfA, CEU, and YRI for all alleles (minor allele frequency [MAF] ≥ 0; fig. 3a; supplementary tables S1 and S2 [Supplementary Material online]) and for common alleles (MAF ≥ 0.15; fig. 3b; supplementary tables S3 and S4 [Supplementary Material online]). Proportions of marker pairs with LD at different levels of r² (<0.1, ≥0.1, ≥0.2, ≥1/3, ≥0.5, and ≥0.8) were plotted against BMD. Interestingly, the admixed population AfA did not show stronger LD than CEU and YRI. In fact, for both groups of SNPs, CEU showed stronger and more extended LD at each level of r². For example, when r² ≥ 0.8, for common alleles and BMD ≤ 200 kb, the proportion of marker pairs in CEU was 1.97- to 6.49-fold that in AfA and 1.72- to 5.69-fold that in YRI. Furthermore, the extent of LD in marker pairs of AfA is very similar to that of YRI, except that the proportion of marker pairs with higher-level LD (r² ≥ 1/3) in YRI is somewhat larger than that in AfA for common alleles, as shown in figure 3b and supplementary table S4 (Supplementary Material online). The aforementioned comparison is only meaningful if the magnitude of LD is correlated across all 3 populations at each genomic segment of the entire chromosome 21. Such a correlation is indeed supported by a plot of the average value of LD between each SNP and its nearby markers within a 50-kb window (supplementary fig. S1, Supplementary Material online). We observed a strong positional correlation of the magnitude of LD among all 3 populations not only for the 50-kb window but also for other window sizes up to 1,000 kb (data not shown).
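
The proportions compared above can be tabulated with a short routine such as the following sketch. It takes precomputed (BMD, r²) values for marker pairs and reports, for each r² threshold, the fraction of pairs within a distance cap that reach it. The function name, the single distance cap (rather than a series of distance bins), and the toy input are assumptions for illustration.

```python
def ld_proportions(pairs, thresholds=(0.1, 0.2, 1 / 3, 0.5, 0.8), max_bmd_kb=200.0):
    """pairs: iterable of (bmd_kb, r2) for marker pairs.
    Returns {threshold: fraction of pairs with bmd_kb <= max_bmd_kb
    whose r2 is at least the threshold}."""
    in_range = [r2 for bmd, r2 in pairs if bmd <= max_bmd_kb]
    if not in_range:
        return {t: float("nan") for t in thresholds}
    n = len(in_range)
    return {t: sum(r2 >= t for r2 in in_range) / n for t in thresholds}


# Toy input: four marker pairs, one of them beyond the 200-kb cap.
toy_pairs = [(10.0, 0.9), (50.0, 0.4), (150.0, 0.05), (300.0, 0.95)]
for threshold, fraction in ld_proportions(toy_pairs).items():
    print(f"r2 >= {threshold:.2f}: {fraction:.2f}")
```

Fold differences between populations, such as those quoted for CEU versus AfA, are then ratios of these fractions at the same threshold and distance cap.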


FIG. 3. Comparison of the fraction of marker pairs with different r² levels (≥0.1, ≥0.2, ≥1/3, ≥0.5, and ≥0.8, depicted by different colors) in 3 populations for all 24,341 SNPs. (a) Marker pairs of all 24,341 SNPs with MAF ≥ 0; (b) only marker pairs with MAF ≥ 0.15.

 
LD and Allele Frequency Difference between Parental Populations
Previous studies reported that extended LD in AfA was concealed by millions of unselected markers and that increased LD in AfA was correlated with markers' increasing allele frequency differences between Europeans and Africans (Rybicki et al. 2002; Collins-Schramm et al. 2003). Allele frequency difference between the 2 parental populations is measured by the standardized variance f, defined following McKeigue (1998; see Materials and Methods). We calculated LD (measured by r²) using nearly 300 million pairs of SNPs with various BMD (≤10, ≤20, ≤100, ≤200, ≤400, and ≤1,000 kb) at different f levels (≥0.0, ≥0.1, ≥0.2, ≥0.3, ≥0.4, ≥0.5, ≥0.6, ≥0.7, and ≥0.8). The LD in AfA and YRI, but not CEU, increases with f values (fig. 4). For example, LD in AfA correlates positively with f (e.g., Pearson ρ² = 0.84, 0.97, 0.84 at 200, 400, and 1,000 kb, respectively). When f < 0.4 and for BMD ≤ 200 kb, average LD of AfA was comparable to that of YRI but smaller than that of CEU. For BMD > 200 kb, average LD in AfA was consistently stronger than that in YRI and CEU when f ≥ 0.4. Out of 24,341 SNPs, the numbers of SNPs with f larger than 0.4 and 0.5 are 338 SNPs (1.29%) and 121 SNPs (0.46%), respectively (see supplementary fig. S2, Supplementary Material online). Based on this analysis, the markers with f ≥ 0.4 are referred to as AIMs.


FIG. 4. Relationship of LD with f value of markers. The average LD in AfA of each SNP with its nearby markers within a given distance interval (10, 20, 100, 200, 400, and 1,000 kb) is plotted against the f value of the SNP.

 
We also compared r² at different δ levels (≥0.0; ≥0.1; ≥0.2; ≥0.3; ≥0.4; ≥0.5; ≥0.6; ≥0.7; and ≥0.8). Markers at different δ levels showed patterns similar to those observed when f was used. The strength of LD in AfA depends on the δ values of markers. LD in AfA is slightly higher than that in CEU and YRI for markers with δ ≥ 0.6, but when δ ≥ 0.7, extended LD in AfA is more pronounced. The similarity of these results is consistent with the tight relationship between f and δ, as shown in supplementary figure S3 (Supplementary Material online). A value of δ ≥ 0.6 roughly corresponds to f ≥ 0.4, and δ ≥ 0.7 roughly corresponds to f ≥ 0.5. In the following analysis, we use only the f values as a measure of allele frequency differences.

Magnitude and Extension of LD in AfA Using AIMs
We selected 344 SNPs with large f (mean f = 0.49) between CEU and YRI (see Materials and Methods) as AIMs and compared the magnitude and extension of LD in all 3 populations using these AIMs. The r² was calculated for each of the marker pairs (58,996 pairs in total) using the haplotypes inferred by the program PHASE (see Materials and Methods). The results are shown in supplementary figure S4 (Supplementary Material online). When only the 344 AIMs are used, LD in AfA is, in general, much stronger than that in CEU and YRI, as shown in the top row of supplementary figure S4 (Supplementary Material online). The difference in LD between AfA and its 2 parental populations becomes far more pronounced when BMD is capped at 5 Mb, as shown in the bottom row of supplementary figure S4 (Supplementary Material online).

To investigate the extension of LD, we compared the distributions of r² of the 58,996 marker pairs in the 3 populations. The LD in AfA extends much further than that in CEU and YRI, especially when 0.1 < r² < 0.8 (fig. 5a). For example, in AfA, LD extends to 3,000 kb at r² ≥ 0.5, to 4,000 kb at r² ≥ 1/3 (the "useful LD" of Ardlie et al. 2002), and to 20,000 kb at r² ≥ 0.2. In contrast, LD of r² ≥ 0.2 extends to no more than 200 kb in both CEU and YRI. Moreover, LD of r² ≥ 0.1 (corresponding to Kruglyak's useful LD; Kruglyak 1999) can often be observed (0.73% of all marker pairs) at BMD > 20,000 kb in AfA, much more often than at the same distance in CEU (0.17%) and YRI (0.16%). In all cases, LD in YRI is the weakest among all 3 populations. Therefore, when AIMs were used, elevated LD in AfA was mostly observed in the range of 0.1 ≤ r² < 0.8, but at BMD > 200 kb, LD in AfA increased at all levels of r² ≥ 0.1. However, the proportion of marker pairs with LD as high as r² ≥ 0.8 in AfA (<70% on average) is smaller than that in CEU, although high LD (r² ≥ 0.8) does not extend beyond 200 kb in any of the 3 populations.


FIG. 5. Comparison of the proportion of marker pairs at different r² levels in (a) AfA and its parental populations and (b) AfA and its 2 subsamples for 344 AIMs.

 
Admixture LD and Genetic Structure of AfA
Because admixture LD is limited to recently admixed populations, it must stem from the special genetic structure of these populations. To investigate the relationship between admixture LD and genetic structure of AfA, we inferred ancestral origins of chromosomal segments in AfA (see Materials and Methods for details and supplementary data file, Supplementary Material online). The estimated haplotypes from the 48 AfA individuals were examined together with the estimated haplotype data from the 60 CEU and 60 YRI subjects under the condition of 1, 2, 3, and 4 major populations. STRUCTURE analysis (Falush et al. 2003) showed that a 2-population model best fitted the data (supplementary table S5, Supplementary Material online). Under the assumption that African and European were the only 2 parental populations, STRUCTURE provided the probability of an allele being derived from either the African cluster or the European cluster, and information on the ancestry of the chromosome segments for each individual was then obtained. As expected, the AfA haplotypes showed contributions from both parental populations (supplementary fig. S5, Supplementary Material online). The contribution from African was much greater than that from European in AfA. On average, 80.2% of ancestry was African and 19.8% was European (see supplementary fig. S8 and S9, Supplementary Material online). There were 23 AfA individuals whose putative ancestry was nearly purely African (African-American individual with pure African ancestry [PAA]), and the other 25 AfA individuals had various proportions of both ancestries (African-American individual with mixed ancestries [MA]). As shown in figure 6, we examined the pairwise LD (measured by r²) separately in the 2 groups, using the 344 AIMs. This result was compared with that obtained from all 48 AfA individuals (original AfA sample).

The level of LD in the 3 groups was compared at different r² levels (≥0.1, ≥0.2, ≥1/3, ≥0.5, and ≥0.8) (fig. 5b). Interestingly, the MA group showed higher LD than the AfA group. For example, for r² ≥ 0.8, the proportion of marker pairs with BMD ≤ 200 kb (the distance where all 3 groups can be effectively compared) in MA is 1.75-fold (1.14- to 3.25-fold) that in AfA. For r² ≥ 0.5, the proportion of marker pairs with BMD ≤ 300 kb in MA is 3.07-fold (1.06- to 14-fold) that in AfA. For r² ≥ 1/3, the proportion of marker pairs with BMD ≤ 4,000 kb in MA is 2.45-fold (0.81- to 20-fold) that in AfA.


FIG. 6. Inferred ancestral origins of a 64.65-cM segment of chromosome 21. Chromosome pairs are depicted with spaces between individual subjects. Forty-eight AfA individuals (left plot) are grouped into 2 subgroups according to their African ancestry proportions. The 23 AfA individuals whose identified ancestry is almost solely from the African population form the PAA group (right plot); the remaining 25 AfA individuals, with various proportions of both ancestries, form the MA group (middle plot).

 
The sample of 25 individuals with mixed ancestry showed stronger and more extended LD than the original AfA group and the PAA group. In contrast, the LD in the PAA group was similar to that of YRI, and it showed the least extended LD among the 3 groups compared (i.e., PAA, AfA, and MA). The extended LD observed in the 48 AfA individuals was evidently contributed mainly by the 25 individuals with mixed ancestry, whereas the 23 PAA individuals contributed little to it. We therefore concluded that individuals with different proportions of African ancestry contributed differently to the overall LD of AfA. To further demonstrate this effect and examine its impact on the LD of AfA, we conducted 2 additional experiments. In the first experiment, we began by randomly sampling one individual from the MA group with replacement and adding the sampled individual to the original AfA sample. This process was repeated 100 times. Then, for each of several BMD, we estimated the overall average r² and plotted it in figure 7. We repeated the same experiment, each time increasing the number of sampled MA individuals by one, until 25 samples were reached. In the second experiment, we followed the procedures described above, sampling from the PAA group instead of the MA group until 23 samples were reached (fig. 7). The results of these 2 experiments showed that the 2 groups of individuals (MA vs. PAA) contribute differently to the extended LD in AfA. In particular, LD decreases when individuals from the PAA group are added to the original AfA group (fig. 7, the left plot). In contrast, LD increases when individuals from the MA group are added to the original AfA group (fig. 7, the right plot).
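
The two resampling experiments can be sketched roughly as follows. The sketch assumes phased haplotypes coded as 0/1 lists, two per individual, and averages r² over all marker pairs without the stratification by BMD described in the text. All function names, the seed, and the toy data are hypothetical.

```python
import random
from itertools import combinations


def pair_r2(haps, i, j):
    """r^2 between markers i and j from a list of 0/1 haplotypes;
    None if either marker is monomorphic."""
    n = len(haps)
    pa = sum(h[i] for h in haps) / n
    pb = sum(h[j] for h in haps) / n
    pab = sum(h[i] * h[j] for h in haps) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return None
    d = pab - pa * pb
    return d * d / denom


def mean_r2(haps):
    """Average r^2 over all marker pairs where it is defined."""
    vals = []
    for i, j in combinations(range(len(haps[0])), 2):
        v = pair_r2(haps, i, j)
        if v is not None:
            vals.append(v)
    return sum(vals) / len(vals) if vals else float("nan")


def augmentation_curve(base, donors, max_added, n_rep=100, seed=0):
    """For k = 1..max_added, add k donor individuals (sampled with
    replacement) to the base sample, n_rep times, and return the
    average mean r^2 for each k. Each individual is a pair of haplotypes."""
    rng = random.Random(seed)
    base_haps = [h for ind in base for h in ind]
    curve = []
    for k in range(1, max_added + 1):
        total = 0.0
        for _ in range(n_rep):
            extra = [h for ind in rng.choices(donors, k=k) for h in ind]
            total += mean_r2(base_haps + extra)
        curve.append(total / n_rep)
    return curve


# Toy data: a base sample in equilibrium and one donor carrying 00/11 haplotypes.
base = [([0, 0], [0, 1]), ([1, 0], [1, 1])]
donors = [([0, 0], [1, 1])]
print(augmentation_curve(base, donors, max_added=3, n_rep=5))
```

In this toy case the donor's haplotypes are perfectly correlated across the two markers, so the averaged r² rises with each added donor, mirroring the direction reported for the MA group.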


FIG. 7. Individuals with different proportions of African ancestry have clearly different effects on LD. The effects were examined by gradually adding individuals from each of the 2 groups to the original AfA sample; for each number of added individuals, random sampling was repeated 100 times. One cycle of sampling and adding was run for each possible number of added individuals in a group; r² of each marker pair was calculated each time and averaged at the end of each cycle. For the group of individuals with predominantly African ancestry, 23 cycles were run; for the group of individuals with mixed ancestries, 25 cycles were run. LD consistently decreases as individuals with predominantly African ancestry are added to the original AfA sample (the left plot); in contrast, LD consistently increases as individuals with MA are added to the original AfA sample (the right plot).

 

    Discussion
In this report, we presented an analysis of LD at 24,341 markers densely distributed across the entire chromosome 21 in samples of AfAs, European Americans, and Africans. Using all markers, we showed that LD in AfA is indeed similar to that in Africans and that both are weaker than that in European Americans. The strength of LD in AfA (measured by r²) depends on allele frequency differences between Europeans and Africans, as measured by f or δ. Therefore, an admixed population such as AfA manifests extended LD only when AIMs are used but not when all SNPs are included.

Our results showed that restricting marker screening sets to markers with high African/European f or δ levels is important in performing ALD mapping. For SNPs, an allele frequency difference of f ≥ 0.4 should be used as a criterion for selecting AIMs, which reveals long-range (BMD > 200 kb) LD elevation and extension in AfA. But at shorter range (BMD ≤ 200 kb), AIMs do not lead to an elevation of high-level LD (r² ≥ 0.8). In fact, the inferred segment size at the individual level is far larger than 200 kb (supplementary fig. S5 and S7 [Supplementary Material online]), and regions of interest showing an association signal often span multiple centimorgans, as observed by Patterson et al. (2004). Therefore, admixed populations such as AfA offer limited utility in narrowing the size of candidate regions below 200 kb in MALD.

However, because the LD patterns below 200 kb are similar between AfA and YRI, AfA might be used as a substitute African population for LD mapping in general. When African samples are difficult to obtain for practical reasons, AfA may be advantageous for narrowing candidate regions below 200 kb for traits or diseases that are shared among all ethnic groups, because African genomes are more fragmented than those of other groups. This is probably also true for traits and diseases that are more prevalent in Africans.

However, LD with r2 ≥ 0.1 (corresponding to Kruglyak's useful LD [Kruglyak 1999]) can often be observed (0.73% of all marker pairs) at BMD > 20,000 kb in AfA. This may include background LD generated by remnant artificial association and/or stochastic fluctuation, rather than true physical linkage, in AfA populations. It would be very inefficient to use the information provided by such a level of LD, and thus it, too, may not be useful for genome-wide association or for fine-scale LD mapping. In contrast, LD with r2 ≥ 0.2 correlates better with distance and decays more quickly, disappearing when BMD > 10,000 kb. It is therefore more efficient to locate regions of interest using LD with r2 ≥ 0.2, at least in AfA. Together with the above observations, efficient extended LD in AfAs can be defined as LD at a level of 0.2 ≤ r2 ≤ 0.8 and BMD > 200 kb.
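The kind of tally behind statements such as "r2 ≥ 0.1 at BMD > 20,000 kb" can be sketched as below. This is only an illustration: the input layout (a list of (BMD in kb, r2) tuples) and the function name are hypothetical, and the choice of denominator (pairs beyond the distance cutoff rather than all pairs) is an assumption, since the text does not specify it.

```python
def fraction_in_ld(pairs, r2_min, bmd_min_kb):
    """Fraction of marker pairs farther apart than bmd_min_kb with r2 >= r2_min.

    pairs: iterable of (bmd_kb, r2) tuples, one per marker pair (assumed layout).
    """
    distant = [r for bmd, r in pairs if bmd > bmd_min_kb]
    if not distant:
        return 0.0  # no pairs at this distance, so nothing to report
    return sum(r >= r2_min for r in distant) / len(distant)
```

Evaluating this at several r2 thresholds and distance cutoffs shows how quickly each LD level decays with distance.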

Individuals with ancestry solely from one population show no crossovers between segments of different ancestry and thus contribute no additional power for admixture mapping studies, which rely on identifying the ancestral origin of chromosomal segments (Patterson et al. 2004). Our results indicated that these individuals also contribute little to the extended LD in AfA populations. More interestingly, analysis without these individuals leads to higher and more extended LD in the population. In practice, individuals of single ancestry can be identified easily by typing a small number of AIMs. In fact, our further analysis of the data from Hinds et al. (2005) showed that an AfA individual who is of PAA on one chromosome is more likely to be so across the entire genome. As a possibly novel strategy for MALD, such individuals could be excluded from further genotyping or further association analysis in order to achieve higher statistical power and possibly reduced cost. This strategy, however, still requires careful investigation before implementation in future studies.
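One simple way to flag individuals of single ancestry from a handful of AIMs is a maximum-likelihood estimate of the individual ancestry proportion, sketched below. This is not the method used in the paper; it assumes unlinked markers, Hardy-Weinberg proportions given ancestry, known parental allele frequencies, and genotypes coded as counts (0, 1, or 2) of the reference allele. The function names and the 0.05/0.95 cutoffs are hypothetical.

```python
import math

def estimate_ancestry(genotypes, p_afr, p_eur, grid=101):
    """Grid-search ML estimate of one individual's African ancestry proportion q."""
    best_q, best_ll = 0.0, -math.inf
    for step in range(grid):
        q = step / (grid - 1)
        ll = 0.0
        for g, pa, pe in zip(genotypes, p_afr, p_eur):
            p = q * pa + (1 - q) * pe          # expected allele frequency
            p = min(max(p, 1e-6), 1 - 1e-6)    # keep the logarithms finite
            ll += g * math.log(p) + (2 - g) * math.log(1 - p)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q

def admixed_only(q_estimates, low=0.05, high=0.95):
    """Indices of individuals whose estimated ancestry is clearly mixed."""
    return [i for i, q in enumerate(q_estimates) if low < q < high]
```

With only a few markers the estimate is noisy, so any cutoff would need to be chosen with the sampling error in mind.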

Methods that examine allelic association between markers and disease status are not very efficient for MALD (McKeigue 2000; Seldin et al. 2004). Instead of simply examining allelic association per se, other proposed methods (McKeigue 2000; Seldin et al. 2004) have the potential to maximize power by attempting to determine the ancestral identity of each chromosomal region. Unfortunately, both the association-based approach and the segment-based approach depend on the use of AIMs, which are too rare to provide good coverage of the genome. In this study, for instance, 1.29% of all markers had f ≥ 0.4 and only 0.46% had f ≥ 0.5. In the latest HapMap data (http://www.hapmap.org, HapMap Public Release #20, 2006-01-24), of all 43,739 SNPs on chromosome 21, 1.03% are AIMs with f ≥ 0.4 and 0.38% have f ≥ 0.5. In addition, we found that AIMs are not evenly distributed across the genome in our data set; some regions lack AIMs, whereas other regions have an excess of AIMs. This phenomenon was confirmed by screening AIMs (f ≥ 0.4) from the database of the International HapMap Project (phase II data, HapMap Public Release #20, 2006-01-24) (supplementary fig. S10, Supplementary Material online), in which many regions are AIM deserts. The AIMs in this study had an average intermarker distance of 0.4 cM or 232 kb, but 5.8% of chromosomal segments still had ambiguous ancestries after combining the results from 6 marker panels. Notably, those ambiguous segments were located in regions with very low AIM density or no AIMs at all and therefore become blind spots for mapping.
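Uneven AIM coverage of the kind described above can be checked with a simple gap scan, sketched below. The function name, the input layout (AIM positions in kb along one chromosome), and the desert threshold are hypothetical; the paper does not state a numerical definition of an AIM desert.

```python
def aim_gaps(positions_kb, desert_kb):
    """Return the mean intermarker distance and the gaps wider than desert_kb.

    positions_kb: AIM positions along one chromosome, in kb (at least 2 values).
    desert_kb: illustrative cutoff above which a gap is called a desert.
    """
    pos = sorted(positions_kb)
    if len(pos) < 2:
        raise ValueError("need at least two AIM positions")
    mean_gap = (pos[-1] - pos[0]) / (len(pos) - 1)
    deserts = [(a, b) for a, b in zip(pos, pos[1:]) if b - a > desert_kb]
    return mean_gap, deserts
```

An average spacing can look adequate while individual gaps remain far wider, which is why the list of deserts is reported alongside the mean.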


    Conclusion
 
Overall LD in the AfA population is similar to that in Africans and, for common alleles, even weaker than that in an African population such as YRI. In our study, for SNPs with a minor allele frequency higher than 0.15, LD at the level of r2 ≥ 1/3 in AfA samples is generally weaker than that in the African samples (YRI) used in the International HapMap Project.

The strength of LD in AfA depends on each locus' European/African allele frequency difference, measured by f or δ. Therefore, an admixed population such as AfA will manifest extended LD only when AIMs are used. For SNPs, we recommend that an allele frequency difference of f ≥ 0.4 be used as the criterion for selecting AIMs, which reveal long-range (BMD > 200 kb) LD elevation and extension in AfA. At shorter range (BMD ≤ 200 kb), however, AIMs do not lead to an elevation of high-level LD (r2 ≥ 0.8). Therefore, efficiently extended LD in AfAs can be defined as LD at the level of 0.2 ≤ r2 ≤ 0.8 that extends more than 200 kb.

AfAs genetically comprise a highly structured population. Individuals of single African ancestry contribute little to the extended LD in AfA populations, and analysis without these individuals leads to higher and more extended LD in the population. As a possibly novel strategy for MALD, such individuals could be excluded from further genotyping or further association analysis in order to achieve higher statistical power with a possible reduction in cost.


    Supplementary Material
 
The following additional data are available with the online version of this paper at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/). The additional data file contains further information about the applied methods and the results, including a procedure for inferring ancestral origins of chromosomal segments in AfA; estimation of individual and population admixture proportions in AfA; estimation of admixture time of AfA; supplementary figures S1–S10; and tables S1–S7.


    Acknowledgements
 
This work was supported by grants from the Chinese High-Tech Program (863; 2002BA711A10), the Basic Research Program of China (973; 2004CB518605, 2002CB512900), the National Natural Science Foundation of China (NSFC30571060), and the Shanghai Science and Technology Committee (04DZ14003). The authors thank Mr Jun Zhang for discussions at the early stage of the project.


    Footnotes
 
1 Present address: Institute of Genetics, School of Life Sciences at Fudan University, Shanghai, China.

Naoko Takezaki, Associate Editor


    References
 

    Ardlie KG, Kruglyak L, Seielstad M. Patterns of linkage disequilibrium in the human genome. Nat Rev Genet (2002) 3:299–309.

    Bodmer WF. Human genetics: the molecular challenge. Cold Spring Harb Symp Quant Biol (1986) 51(Pt 1):1–13.

    Chakraborty R, Weiss KM. Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci. Proc Natl Acad Sci USA (1988) 85:9119–9123.

    Collins-Schramm HE, Chima B, Operario DJ, Criswell LA, Seldin MF. Markers informative for ancestry demonstrate consistent megabase-length linkage disequilibrium in the African American population. Hum Genet (2003) 113:211–219.

    De La Vega FM, Isaac H, Collins A. (29 co-authors). The linkage disequilibrium maps of three human chromosomes across four populations reflect their demographic history and a common underlying recombination pattern. Genome Res (2005) 15:454–462.

    Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics (2003) 164:1567–1587.

    Gabriel SB, Schaffner SF, Nguyen H. (18 co-authors). The structure of haplotype blocks in the human genome. Science (2002) 296:2225–2229.

    Hill WG, Weir BS. Maximum-likelihood estimation of gene location by linkage disequilibrium. Am J Hum Genet (1994) 54:705–714.

    Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR. Whole-genome patterns of common DNA variation in three human populations. Science (2005) 307:1072–1079.

    Huang W, He Y, Wang H. (21 co-authors). Linkage disequilibrium sharing and haplotype-tagged SNP portability between populations. Proc Natl Acad Sci USA (2006) 103:1418–1421.

    Huttley GA, Smith MW, Carrington M, O'Brien SJ. A scan for linkage disequilibrium across the human genome. Genetics (1999) 152:1711–1722.

    Johnson GC, Esposito L, Barratt BJ. (21 co-authors). Haplotype tagging for the identification of common disease genes. Nat Genet (2001) 29:233–237.

    Jorde LB. Linkage disequilibrium as a gene-mapping tool. Am J Hum Genet (1995) 56:11–14.

    Ke X, Hunt S, Tapper W. (12 co-authors). The impact of SNP density on fine-scale patterns of linkage disequilibrium. Hum Mol Genet (2004) 13:577–588.

    Kong X, Murphy K, Raj T, He C, White PS, Matise TC. A combined linkage-physical map of the human genome. Am J Hum Genet (2004) 75:1143–1148.

    Kruglyak L. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet (1999) 22:139–144.

    Laan M, Paabo S. Demographic history and linkage disequilibrium in human populations. Nat Genet (1997) 17:435–438.

    Lautenberger JA, Stephens JC, O'Brien SJ, Smith MW. Significant admixture linkage disequilibrium across 30 cM around the FY locus in African Americans. Am J Hum Genet (2000) 66:969–978.

    Lewontin RC. The interaction of selection and linkage. II. Optimum models. Genetics (1964) 50:757–782.

    McKeigue PM. Mapping genes that underlie ethnic differences in disease risk: methods for detecting linkage in admixed populations, by conditioning on parental admixture. Am J Hum Genet (1998) 63:241–251.

    McKeigue PM. Efficiency of estimation of haplotype frequencies: use of marker phenotypes of unrelated individuals versus counting of phase-known gametes. Am J Hum Genet (2000) 67:1626–1627.

    McKeigue PM. Prospects for admixture mapping of complex traits. Am J Hum Genet (2005) 76:1–7.

    McKeigue PM, Carpenter JR, Parra EJ, Shriver MD. Estimation of admixture and detection of linkage in admixed populations by a Bayesian approach: application to African-American populations. Ann Hum Genet (2000) 64:171–186.

    Parra EJ, Marcini A, Akey J. (11 co-authors). Estimating African American admixture proportions by use of population-specific alleles. Am J Hum Genet (1998) 63:1839–1851.

    Patterson N, Hattangadi N, Lane B. (12 co-authors). Methods for high-density admixture mapping of disease genes. Am J Hum Genet (2004) 74:979–1000.

    Pfaff CL, Parra EJ, Bonilla C, Hiester K, McKeigue PM, Kamboh MI, Hutchinson RG, Ferrell RE, Boerwinkle E, Shriver MD. Population structure in admixed populations: effect of admixture dynamics on the pattern of linkage disequilibrium. Am J Hum Genet (2001) 68:198–207.

    Pritchard JK, Przeworski M. Linkage disequilibrium in humans: models and data. Am J Hum Genet (2001) 69:1–14.

    Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics (2000) 155:945–959.

    Reich DE, Cargill M, Bolk S. (11 co-authors). Linkage disequilibrium in the human genome. Nature (2001) 411:199–204.

    Rybicki BA, Iyengar SK, Harris T, Liptak R, Elston RC, Sheffer R, Chen KM, Major M, Maliarik MJ, Iannuzzi MC. The distribution of long range admixture linkage disequilibrium in an African-American population. Hum Hered (2002) 53:187–196.

    Salari K, Choudhry S, Tang H. (23 co-authors). Genetic admixture and asthma-related phenotypes in Mexican American and Puerto Rican asthmatics. Genet Epidemiol (2005) 29:76–86.

    Schneider S, Roessli D, Excoffier L. Arlequin: a software for population genetics data analysis. Version 2.000 (2000) Geneva (Switzerland): Genetics and Biometry Lab, Department of Anthropology: University of Geneva.

    Seldin MF, Morii T, Collins-Schramm HE, Chima B, Kittles R, Criswell LA, Li H. Putative ancestral origins of chromosomal segments in individual African Americans: implications for admixture mapping. Genome Res (2004) 14:1076–1084.

    Shriver MD, Smith MW, Jin L, Marcini A, Akey JM, Deka R, Ferrell RE. Ethnic-affiliation estimation by use of population-specific DNA markers. Am J Hum Genet (1997) 60:957–964.

    Smith MW, O'Brien SJ. Mapping by admixture linkage disequilibrium: advances, limitations and guidelines. Nat Rev Genet (2005) 6:623–632.

    Stephens JC, Briscoe D, O'Brien SJ. Mapping by admixture linkage disequilibrium in human populations: limits and guidelines. Am J Hum Genet (1994) 55:809–824.

    Stephens M, Donnelly P. A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet (2003) 73:1162–1169.

    Stephens M, Smith NJ, Donnelly P. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet (2001) 68:978–989.

    Takeuchi F, Yanai K, Morii T, Ishinaga Y, Taniguchi-Yanai K, Nagano S, Kato N. Linkage disequilibrium grouping of single nucleotide polymorphisms (SNPs) reflecting haplotype phylogeny for efficient selection of tag SNPs. Genetics (2005) 170:291–304.

    The International HapMap Consortium. The International HapMap Project. Nature (2003) 426:789–796.

    Wahlund S. Zusammensetzung von Populationen und Korrelationserscheinungen von Standpunkt der Vererbungslehre aus betrachtet. Hereditas (1928) 11:65–106.

    Weiss KM, Clark AG. Linkage disequilibrium and the mapping of complex human traits. Trends Genet (2002) 18:19–24.

    Zhu X, Luke A, Cooper RS. (11 co-authors). Admixture mapping for hypertension loci with genome-scan markers. Nat Genet (2005) 37:177–181.

Accepted for publication June 25, 2007.

