MBE Advance Access originally published online on July 13, 2007
Molecular Biology and Evolution 2007 24(9):2049-2058; doi:10.1093/molbev/msm135
Research Articles |
Dissecting Linkage Disequilibrium in African-American Genomes: Roles of Markers and Individuals

* MOE Key Laboratory of Contemporary Anthropology and Center for Evolutionary Biology, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
Chinese Academy of Science–Max-Planck-Gesellschaft Partner Institute for Computational Biology, Shanghai Institutes for Biological Science, CAS, Shanghai, China
Chinese National Human Genome Center at Shanghai, China
Rui Jin Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
|| Human Genetics Center, School of Public Health, University of Texas-Health Science Center at Houston
E-mail: ljin007{at}gmail.com.
| Abstract |
|---|
Substantial increases of linkage disequilibrium (LD) both in magnitude and in range have been observed in recently admixed populations such as African-American (AfA). On the other hand, it has also been shown that LD in AfAs is very similar to that of Africans. In this study, we attempted to resolve these contradicting observations by conducting a systematic examination of the LD structure in AfAs by genotyping a sample of AfA individuals at 24,341 single nucleotide polymorphisms (SNPs) spanning almost the entire chromosome 21, with an average density of 1.5 kb/SNP. The overall LD in AfAs is similar to that in African populations and much less than that in European populations. Even when the ancestry-informative markers (AIMs) were used, extended LD in AfA was found to be limited to a certain magnitude range (0.2 ≤ r2 ≤ 0.8) and a certain distance range, that is, between-marker distance more than 200 kb. Furthermore, the inclusion of AfA individuals with predominant African ancestry was found to reduce the overall magnitude of LD. Elevation of LD in the AfA population, compared with its parental populations, can only be observed at markers with large allele frequency differences between the 2 parental populations, and only in limited scenarios. AfA individuals of wholly African ancestry contribute little to the extended LD in the AfA population, and further genotyping or association analysis conducted using only admixed individuals may lead to higher statistical power and possibly reduced cost.
Key Words: African-American; linkage disequilibrium; single nucleotide polymorphism; ancestry-informative markers; admixture population
| Introduction |
|---|
Mapping by admixture linkage disequilibrium (MALD) has received much attention recently (McKeigue 2005). Substantial increases of LD not only in magnitude but also in range have been observed in recently admixed populations such as African-American (AfA) and Mexican-American (McKeigue et al. 2000; Pfaff et al. 2001; Collins-Schramm et al. 2003; Salari et al. 2005; Smith and O'Brien 2005; Zhu et al. 2005). Stephens et al. (1994) showed by simulation that admixture linkage disequilibrium (ALD) up to 10 cM could exist even after 9 generations in AfAs. Parra et al. (1998) reported strong LD between the FY-null and AT3 loci that are 22 cM apart on chromosome 1q22. Examples of substantially extended LD in AfA populations were discovered in several regions and chromosomes primarily using short tandem repeat markers (Lautenberger et al. 2000; Rybicki et al. 2002; Collins-Schramm et al. 2003). Patterson et al. (2004) showed that strong admixture LD, on average, extends to 17 cM in AfA.
In contrast, Gabriel et al. (2002) pointed out that LD in AfAs and LD in Africans were very similar. In addition, a number of recent studies on LD using single nucleotide polymorphisms (SNPs) have treated the AfA population as representative of African populations (Ke et al. 2004; De La Vega et al. 2005; Hinds et al. 2005; Takeuchi et al. 2005; Huang et al. 2006). The contradicting observations may have arisen from selection of the markers based on their allele frequency difference between the parental populations, as shown theoretically (Chakraborty and Weiss 1988). Alternatively, the conflict might be due to a unique genetic structure of admixed populations. In this study, we conducted a systematic dissection of the LD structure of AfA by genotyping a sample of 48 AfA individuals at 24,341 SNPs spanning almost the entire chromosome 21 with a density of 1.5 kb/SNP. We revealed that elevation of LD in AfA is dictated by the specific choice of markers and the inclusion of specific individuals. Our results indicate that both allele frequency and admixture of individuals contribute to the LD architecture of AfA.
| Materials and Methods |
|---|
Populations and Samples
DNA from 48 AfAs (37 females and 11 males) was obtained from Coriell Cell Repositories. We used 60 CEU parents (European American samples, collected in 1980 from Utah residents with northern and western European ancestry by the Centre d'Etude du Polymorphisme Humain) and 60 YRI parents (African samples, Yoruba people in Ibadan, Nigeria), as described in The International HapMap Project (2003).
Markers and Their Positions
A set of 29,177 SNPs was genotyped in 48 AfAs. Illumina Beadlab technology was used in genotyping, and the genotyping method has been described previously (Huang et al. 2006). Genotyped SNPs on chromosome 21 of 60 CEU and 60 YRI samples were obtained from the Web site of the International HapMap Project (HapMap public release #19, 2005-10-24; http://www.hapmap.org). After data filtration (e.g., deleting markers with missing data in >5% of samples), we obtained 24,341 SNPs that were genotyped successfully in all 3 populations. SNPs that deviated from Hardy–Weinberg equilibrium (Fisher's exact test, P < 0.05) were excluded; P was estimated using Arlequin 3.01 (Schneider et al. 2000) with 100,000 permutations.
The physical positions of SNPs were based on the Homo sapiens genome build 36. The total chromosome region studied was 36.87 Mb, from position 10,047,705 (the first marker, rs2989064) to 46,914,854 (the last marker, rs8132215). The average spacing between adjacent markers was 1.5 kb, with a minimum of 2 bp and a maximum of 3.1 Mb. The genetic map positions of SNPs were based on the Rutgers combined linkage-physical maps (Kong et al. 2004), which incorporate the latest human genome assembly build 36. We determined the genetic map positions of SNPs in centimorgans using a Web-based linkage-mapping server (http://actin.ucd.ie/cgi-bin/rs2cm.cgi), which carried out a smoothing calculation to estimate genetic map positions, including those of markers that have not been mapped directly. The total recombination distance was 68.17 cM (from 0 to 68.17 cM), the average intermarker distance was 0.003 cM, and the maximum was 0.38 cM.
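As an illustration of how a genetic position can be assigned to a marker that is not itself on the linkage map, the sketch below uses plain linear interpolation between flanking mapped markers. This is a simplified stand-in and not the smoothing calculation performed by the mapping server; the function name `interpolate_cm` and the toy map values are hypothetical.

```python
from bisect import bisect_right

def interpolate_cm(pos_bp, map_bp, map_cm):
    """Linearly interpolate a genetic position (cM) for a physical position (bp).

    map_bp must be sorted in ascending order and have the same length as map_cm.
    Positions outside the mapped range are clamped to the nearest map end.
    """
    if pos_bp <= map_bp[0]:
        return map_cm[0]
    if pos_bp >= map_bp[-1]:
        return map_cm[-1]
    k = bisect_right(map_bp, pos_bp)  # map_bp[k-1] <= pos_bp < map_bp[k]
    x0, x1 = map_bp[k - 1], map_bp[k]
    y0, y1 = map_cm[k - 1], map_cm[k]
    return y0 + (y1 - y0) * (pos_bp - x0) / (x1 - x0)

# Toy map (hypothetical values): three mapped markers
toy_bp = [1000, 2000, 4000]
toy_cm = [0.0, 1.0, 2.0]
midpoint_cm = interpolate_cm(1500, toy_bp, toy_cm)
```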
Measures of LD
Several statistics have been used to measure the LD between a pair of loci (Jorde 1995). The 2 most common measures are the absolute value of D' (denoted by |D'| hereafter) and r2, both derived from Lewontin's D (Lewontin 1964). It was shown that in indirect association studies, the sample size must be increased by a factor of roughly 1/r2 when compared with the sample size for detecting association with the susceptibility locus itself (Kruglyak 1999; Pritchard and Przeworski 2001). The r2 has a relatively clear interpretation in terms of the power to detect an association, and intermediate values of r2 are also easily interpretable (Kruglyak 1999; Ardlie et al. 2002). It is therefore more sensible to use r2 to study LD. In addition, r2 also shows much less inflation in small samples than does |D'| (Ardlie et al. 2002; Weiss and Clark 2002).
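The 1/r2 rule of thumb can be made concrete with a small calculation. The sketch below is only an arithmetic illustration of the approximate relationship cited above; the function name `indirect_sample_size` and the example sample size of 1,000 are hypothetical.

```python
def indirect_sample_size(n_direct, r2):
    """Approximate sample size needed when testing a marker in LD (r2) with
    the susceptibility locus, given the size n_direct needed at the locus itself."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return n_direct / r2

# If 1,000 samples suffice at the causal locus, a marker with r2 = 0.5
# needs about twice as many, and a marker with r2 = 0.1 about ten times as many.
n_half = indirect_sample_size(1000, 0.5)
n_tenth = indirect_sample_size(1000, 0.1)
```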
In this study, we used r2 to measure LD between 2 SNPs. Consider 2 loci, A and B, each locus having 2 alleles (denoted A1, A2; B1, B2, respectively). We denote p11, p12, p21, and p22 as the frequencies of the haplotypes A1B1, A1B2, A2B1, and A2B2, respectively; p1+, p2+, p+1, and p+2 are the frequencies of A1, A2, B1, and B2, respectively. Following Hill and Weir (1994),

$$D = p_{11}p_{22} - p_{12}p_{21}, \qquad r^2 = \frac{D^2}{p_{1+}\,p_{2+}\,p_{+1}\,p_{+2}}.$$
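The r2 statistic can be computed directly from the four haplotype frequencies defined above. The following is a minimal sketch using the standard definition of r2; the function name `r_squared` and the example frequencies are illustrative and not taken from the study.

```python
def r_squared(p11, p12, p21, p22):
    """r2 between two biallelic loci from haplotype frequencies
    (A1B1, A1B2, A2B1, A2B2)."""
    if abs(p11 + p12 + p21 + p22 - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p1_plus, p2_plus = p11 + p12, p21 + p22   # frequencies of A1, A2
    p_plus1, p_plus2 = p11 + p21, p12 + p22   # frequencies of B1, B2
    denom = p1_plus * p2_plus * p_plus1 * p_plus2
    if denom == 0:
        raise ValueError("r2 is undefined when a locus is monomorphic")
    d = p11 * p22 - p12 * p21                 # Lewontin's D
    return d * d / denom

complete_ld = r_squared(0.5, 0.0, 0.0, 0.5)      # only 2 haplotypes present
no_ld = r_squared(0.25, 0.25, 0.25, 0.25)        # loci independent
partial_ld = r_squared(0.4, 0.1, 0.1, 0.4)
```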
Marker Information Content for Ancestry
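A simple measure that is informative of ancestry at a single marker is the δ value (Shriver et al. 1997), defined as the sum of the absolute value of differences in all n allelic frequencies between 2 parental populations divided by 2. For biallelic SNPs, n = 2:

$$\delta = \frac{1}{2}\sum_{i=1}^{n}\left|p_{i,X} - p_{i,Y}\right| = \left|p_{1,X} - p_{1,Y}\right|,$$

where p_{i,X} and p_{i,Y} denote the frequencies of allele i in the 2 parental populations X and Y.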
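Another measure, f (McKeigue 1998), originally defined by Wahlund (1928), is the standardized variance of allele frequencies and ranges from 0 (noninformative) to 1 (completely informative). The f value was calculated by the following formula,

$$f = \frac{(p_X - p_Y)^2}{4\bar{p}\,(1-\bar{p})}, \qquad \bar{p} = \frac{p_X + p_Y}{2},$$

where p_X and p_Y denote the frequencies of a given allele in the 2 parental populations. The δ and f of each marker were calculated using the allele frequencies of the CEU and YRI samples described above.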
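The two marker-informativeness measures can be sketched in a few lines. The code below assumes that f is the standardized variance of allele frequencies with the 2 parental populations weighted equally; that exact form is an assumption here, and the function names `delta` and `f_value` are hypothetical.

```python
def delta(freqs_x, freqs_y):
    """Half the sum of absolute allele-frequency differences between
    2 parental populations (one frequency per allele in each list)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(freqs_x, freqs_y))

def f_value(p_x, p_y):
    """Standardized variance of one allele's frequency across 2 populations,
    assuming equal weights (0 = noninformative, 1 = completely informative)."""
    p_bar = (p_x + p_y) / 2
    if p_bar in (0.0, 1.0):
        return 0.0  # allele absent or fixed in both populations
    return (p_x - p_y) ** 2 / (4 * p_bar * (1 - p_bar))

# A biallelic SNP with allele frequency 0.2 in one population and 0.8 in the other
d = delta([0.2, 0.8], [0.8, 0.2])
f = f_value(0.2, 0.8)
```

For a biallelic marker, f is never smaller than the square of δ under this definition, which is in line with δ of about 0.6 corresponding to f of about 0.4.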
Ancestry-Informative Markers
SNPs that have large allele frequency difference between CEU and YRI were selected as AIMs according to f value (McKeigue 1998). The SNPs in this study cover a 36.87-Mb (68.17 cM) segment of chromosome 21, which can be divided into 2 regions based on marker information. In the first region that encompasses the majority of the chromosome (35.21 Mb or 63.73 cM), we selected markers with f ≥ 0.40 from 24,341 SNPs and obtained 338 AIMs. This criterion was determined based on the analyses described in the Results of this paper. However, in the second region containing 1.66 Mb (4.44 cM), the f values of all 1,092 SNPs are below 0.35 and therefore do not meet the aforementioned criterion (f ≥ 0.40). To include this region in further analysis, we selected 6 additional SNPs with δ > 0.48 (minimum f = 0.26). The final length of the chromosomal region that was covered by AIMs was 33.42 Mb (67.80 cM).
Altogether, we obtained a panel of 344 SNPs that are informative for ancestry, with mean f = 0.49 and mean δ = 0.68. The average spacing between adjacent markers was 97.5 kb (0.19 cM), with a minimum of 69 bp (0.0 cM) and a maximum of 2.45 Mb (4.74 cM). The median distance between adjacent markers was 7.4 kb (0.01 cM).
The allele frequency distributions of the 344 AIMs in the 3 populations are shown in figure 1. Note that most alleles of AfA have moderate frequencies, whereas their corresponding allele frequencies in CEU and YRI are relatively extreme. The marker information of these 344 AIMs, measured by the standardized variance (f), is shown in figure 2.
Haplotype Inference
The LD analyses in this study were based on haplotypes generated from data using PHASE 2.1 software (Stephens et al. 2001). Overall, the 24,341 SNPs were divided into sections of 40 SNPs, with 20 SNPs overlapping between each 2 consecutive sections. Haplotypes were inferred by PHASE from each such section independently. Finally, the 2 haplotypes of each individual were reconstructed by combining all haplotypes section by section according to the inferred haplotypes of the 20 overlapping sites between consecutive sections. When the overlapping SNPs were inconsistent, we arbitrarily kept the results of the former section. On average, 0.75% of overlapping SNPs showed inconsistent phasing.
The individual haplotypes for the panel of 344 AIMs that were informative for ancestry were reinferred, independent of the results of 24,341 SNPs.
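The sectioning and stitching procedure can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: haplotypes are lists of 0/1 alleles, the per-section phasing (done by PHASE in the study) is taken as given input, and the orientation of each new section is chosen by counting mismatches in the overlap. The names `make_sections` and `stitch` are hypothetical.

```python
def make_sections(n_snps, size=40, overlap=20):
    """Half-open (start, end) index ranges of `size` SNPs, with `overlap`
    SNPs shared between consecutive sections."""
    step = size - overlap
    sections, start = [], 0
    while True:
        end = min(start + size, n_snps)
        sections.append((start, end))
        if end == n_snps:
            return sections
        start += step

def mismatches(a, b):
    return sum(x != y for x, y in zip(a, b))

def stitch(sections, phased):
    """Join per-section haplotype pairs of one individual.
    phased[k] is the (hapA, hapB) pair inferred for sections[k]."""
    h1, h2 = list(phased[0][0]), list(phased[0][1])
    for (start, _), (a, b) in zip(sections[1:], phased[1:]):
        ov = len(h1) - start  # sites already covered by former sections
        straight = mismatches(h1[start:], a[:ov]) + mismatches(h2[start:], b[:ov])
        swapped = mismatches(h1[start:], b[:ov]) + mismatches(h2[start:], a[:ov])
        if swapped < straight:
            a, b = b, a       # flip so the new section agrees with the former one
        # Only the non-overlapping tail is appended, so any inconsistent
        # overlapping sites keep the calls of the former section.
        h1.extend(a[ov:])
        h2.extend(b[ov:])
    return h1, h2

# Toy check: 100 heterozygous sites, with odd-numbered sections reported flipped
truth1 = [i % 2 for i in range(100)]
truth2 = [1 - x for x in truth1]
secs = make_sections(100)
toy_phased = [(truth2[s:e], truth1[s:e]) if k % 2 else (truth1[s:e], truth2[s:e])
              for k, (s, e) in enumerate(secs)]
joined1, joined2 = stitch(secs, toy_phased)
```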
STRUCTURE Analysis
The program STRUCTURE (Pritchard et al. 2000; http://pritch.bsd.uchicago.edu/software/structure2_beta.html) implements a model-based clustering method for inferring population structure using genotype data. In Version 2, the program implemented a model that allows for "ALD." In Caucasian populations, the extent of LD is generally limited to regions smaller than 100 kb (Bodmer 1986; Laan and Paabo 1997; Huttley et al. 1999; Reich et al. 2001; Gabriel et al. 2002), which would be still shorter in Africans; therefore, the correlations that arise between linked markers in AfAs are modeled as the result of admixture or hybridization (Falush et al. 2003). Because the program was not designed to model the LD that occurs between nearby markers (so-called "background LD") within populations (i.e., the model is best suited to data on markers that are linked but not so tightly linked), we screened the markers from the inferred haplotype data to examine for ancestral population structure. Haplotypes of 344 AIMs were inferred using the program PHASE as described above.
For STRUCTURE analysis, marker panels were selected from the 344 AIMs so that adjacent SNPs were separated by adequate genetic distances (>1 cM). Overall, 6 panels of AIMs comprising 145 SNPs in total were selected for STRUCTURE analysis following this criterion (between-marker distance [BMD] > 1 cM, see supplementary table S5 [Supplementary Material online]). Note that the AIMs are not evenly distributed on the chromosome, as depicted in figure 2. There were 10 "AIM deserts" with intermarker distance over 2 cM (in order from the start, 2.54, 2.63, 4.22, 2.04, 4.74, 2.17, 4.18, 3.50, 3.98, and 3.95 cM). In all 6 panels, average f and δ were >0.45 and >0.66, respectively, and for all 145 SNPs, average f = 0.47 and δ = 0.67.
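One simple way to build marker panels in which adjacent SNPs are more than 1 cM apart is a greedy left-to-right pass, repeated on the markers not yet used. The text does not state how the 6 panels were actually constructed, so the sketch below is an assumed procedure for illustration only; `thin_by_distance`, `build_panels`, and the toy positions are hypothetical.

```python
def thin_by_distance(positions_cm, min_gap=1.0):
    """Greedily keep markers (by index) so that each kept marker lies more
    than min_gap cM beyond the previously kept one. Input must be sorted."""
    kept, last = [], None
    for idx, pos in enumerate(positions_cm):
        if last is None or pos - last > min_gap:
            kept.append(idx)
            last = pos
    return kept

def build_panels(positions_cm, n_panels=6, min_gap=1.0):
    """Build up to n_panels non-overlapping panels by thinning the markers
    that remain after each pass."""
    remaining = list(range(len(positions_cm)))
    panels = []
    for _ in range(n_panels):
        if not remaining:
            break
        picked = thin_by_distance([positions_cm[i] for i in remaining], min_gap)
        panel = [remaining[j] for j in picked]
        panels.append(panel)
        chosen = set(panel)
        remaining = [i for i in remaining if i not in chosen]
    return panels

# Toy genetic positions in cM (hypothetical)
toy_positions = [0.0, 0.5, 1.2, 1.9, 2.3, 4.0]
toy_panels = build_panels(toy_positions, n_panels=2)
```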
The haplotype data inferred by PHASE based on the 6 panels of AIMs were subject to STRUCTURE analysis. We used a linkage model option and assumed that allele frequencies are correlated. We used distances between the markers determined by both genomic sequence and recombination-based data (Rutgers combined linkage-physical maps, Kong et al. 2004) as map distances. The program was run with 100,000 iterations, 50,000 burn-ins, and 1,000 admixture burn-ins.
| Results |
|---|
Overall LD in AfA and Its Parental Populations
The extent of LD was examined across chromosome 21 in AfA, CEU, and YRI for all alleles (minor allele frequency [MAF] ≥ 0; fig. 3a; supplementary tables S1 and S2 [Supplementary Material online]) and for common alleles (MAF ≥ 0.15; fig. 3b; supplementary tables S3 and S4 [Supplementary Material online]). Proportions of marker pairs with LD at different levels of r2 (<0.1, ≥0.1, ≥0.2, ≥1/3, ≥0.5, and ≥0.8) were plotted against BMD. Interestingly, the admixed population AfA did not show stronger LD than CEU and YRI. In fact, for both groups of SNPs, CEU showed stronger and more extended LD at each level of r2. For example, when r2 ≥ 0.8, for common alleles and BMD ≤ 200 kb, the proportion of marker pairs in CEU was 1.97- to 6.49-fold that of AfA marker pairs and 1.72- to 5.69-fold that of YRI marker pairs, respectively. Furthermore, the extent of LD in marker pairs of AfA is very similar to that of YRI, except that the proportion of marker pairs with the higher level LD (r2 ≥ 1/3) in YRI is somewhat larger than that in AfA for common alleles, as shown in figure 3b and supplementary table S4 (Supplementary Material online). The aforementioned comparison is only meaningful if there is a correlation of the magnitude of LDs across all 3 populations at each genomic segment of the entire chromosome 21. Such a correlation is indeed supported by a plot of the average value of LD between the SNP and its nearby markers within a 50-kb window (supplementary fig. S1, Supplementary Material online). We observed a strong positional correlation of the magnitude of LDs among all 3 populations not only for the 50-kb window but also for other window sizes up to 1,000 kb (data not shown).
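The summary used in this comparison, the proportion of marker pairs reaching a given r2 level within each BMD range, can be sketched as below. The bin edges, the function name `ld_proportions`, and the toy pairs are hypothetical; the binning actually used for figure 3 is not specified here.

```python
from bisect import bisect_right

def ld_proportions(pairs, bin_edges_kb, threshold):
    """For each half-open distance bin [edge[k], edge[k+1]), return the
    proportion of marker pairs with r2 >= threshold (None for empty bins).

    pairs is an iterable of (bmd_kb, r2) tuples; pairs outside the outer
    edges are ignored.
    """
    n_bins = len(bin_edges_kb) - 1
    total = [0] * n_bins
    hits = [0] * n_bins
    for bmd, r2 in pairs:
        k = bisect_right(bin_edges_kb, bmd) - 1
        if 0 <= k < n_bins:
            total[k] += 1
            hits[k] += r2 >= threshold
    return [h / t if t else None for h, t in zip(hits, total)]

# Toy marker pairs as (BMD in kb, r2); values are made up for illustration
toy_pairs = [(5, 0.9), (8, 0.4), (15, 0.85), (150, 0.1)]
toy_props = ld_proportions(toy_pairs, [0, 10, 20, 100], threshold=0.8)
```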
LD and Allele Frequency Difference between Parental Populations
Previous studies reported that extended LD in AfA was concealed by millions of unselected markers and that increased LD in AfA was correlated with markers' increasing allele frequency differences between the Europeans and Africans (Rybicki et al. 2002 […] 10, 20, 100, 200, 400, and 1,000 kb) at different f levels (≥0.0, ≥0.1, ≥0.2, ≥0.3, ≥0.4, ≥0.5, ≥0.6, ≥0.7, and ≥0.8). The LD in AfA and YRI, but not CEU, increases with f values (fig. 4). For example, LD in AfA correlates positively with f (e.g., Pearson r2 = 0.84, 0.97, and 0.84 at 200, 400, and 1,000 kb, respectively). When f < 0.4 and for BMD ≤ 200 kb, average LD of AfA was comparable to that of YRI but smaller than that of CEU. For BMD > 200 kb, average LD in AfA was consistently stronger than those in YRI and CEU when f ≥ 0.4. Out of 24,341 SNPs, the numbers of SNPs with f larger than 0.4 and 0.5 are 338 SNPs (1.29%) and 121 SNPs (0.46%), respectively (see supplementary fig. S2, Supplementary Material online). Based on this analysis, the markers with f ≥ 0.4 are referred to as AIMs.
We also compared r2 at different δ levels (≥0.0, ≥0.1, ≥0.2, ≥0.3, ≥0.4, ≥0.5, ≥0.6, ≥0.7, and ≥0.8). Markers with different δ levels showed patterns similar to those seen when f was used. The strength of LD in AfA depends on the δ values of markers. LD in AfA is slightly higher than those in CEU and YRI for markers with δ ≥ 0.6, but when δ ≥ 0.7, extended LD in AfA is more pronounced. The similar results are consistent with the tight relationship of f and δ, as shown in supplementary figure S3 (Supplementary Material online). The δ ≥ 0.6 roughly corresponds to f ≥ 0.4, and δ ≥ 0.7 roughly corresponds to f ≥ 0.5. In the following analysis, we will only use the f values as a measure of allele frequency differences.
Magnitude and Extension of LD in AfA Using AIMs
We selected 344 SNPs with large f (mean f = 0.49) between CEU and YRI (see Materials and Methods) as AIMs and compared the magnitude and extension of LD in all 3 populations using these AIMs. The r2 was calculated for each of the marker pairs (total 58,996 pairs) using the haplotypes inferred by the program PHASE (see Materials and Methods). The results are shown in supplementary figure S4 (Supplementary Material online). When only 344 AIMs are used, in general, LD in AfA is much stronger than that in CEU and YRI, as shown at the top row of supplementary figure S4 (Supplementary Material online). The difference in LD between AfA and its 2 parental populations becomes far more pronounced when BMD is capped at 5 Mb, as shown at the bottom row of supplementary figure S4 (Supplementary Material online).
To investigate the extension of LD, we compared the distributions of r2 of 58,996 marker pairs in 3 populations. The LD in AfA extends much further than those in CEU and YRI, especially when 0.1 < r2 < 0.8 (fig. 5a). For example, in AfA, LD extends to 3,000 kb at r2 ≥ 0.5, to 4,000 kb at r2 ≥ 1/3 (Ardlie's useful LD [Ardlie et al. 2002]), and to 20,000 kb at r2 ≥ 0.2. In contrast, LD of r2 ≥ 0.2 extends to no more than 200 kb in both CEU and YRI. Moreover, LD of r2 ≥ 0.1 (corresponding to Kruglyak's useful LD [Kruglyak 1999]) can often be observed (0.73% of all marker pairs) at BMD > 20,000 kb in AfA, much more than those observed at the same distance in CEU (0.17%) and YRI (0.16%). In all cases, LD in YRI is the weakest among all 3 populations. Therefore, when AIMs were used, elevated LD in AfA was mostly observed in the range of 0.1 ≤ r2 < 0.8, but at BMD > 200 kb, LD in AfA increased at all levels of r2 ≥ 0.1. However, the proportion of marker pairs with LD as high as r2 ≥ 0.8 in AfA (<70% on average) is smaller than that in CEU, although the high LD (r2 ≥ 0.8) does not extend beyond 200 kb in any of the 3 populations.
Admixture LD and Genetic Structure of AfA
Because admixture LD is limited to recently admixed populations, it must stem from the special genetic structure of these populations. To investigate the relationship between admixture LD and genetic structure of AfA, we inferred ancestral origins of chromosomal segments in AfA (see Materials and Methods for details and supplementary data file, Supplementary Material online). The estimated haplotypes from the 48 AfA individuals were examined together with the estimated haplotype data from the 60 CEU and 60 YRI subjects under the condition of 1, 2, 3, and 4 major populations. STRUCTURE analysis (Falush et al. 2003 […] ≥0.1, ≥0.2, ≥1/3, ≥0.5, and ≥0.8) (fig. 5b). Interestingly, the MA group showed more elevated LD than the AfA group. For example, for r2 ≥ 0.8, the proportion of marker pairs with BMD ≤ 200 kb (the distance where all 3 groups can be effectively compared) in MA is 1.75-fold (1.14- to 3.25-fold) of that in AfA. For r2 ≥ 0.5, the proportion of marker pairs with BMD ≤ 300 kb in MA is 3.07-fold (1.06- to 14-fold) of that in AfA. For r2 ≥ 1/3, the proportion of marker pairs with BMD ≤ 4,000 kb in MA is 2.45-fold (0.81- to 20-fold) of that in AfA.
The sample of 25 individuals with mixed ancestry revealed larger and more extended LD than those of the original AfA group and the PAA group. In contrast, the LD in PAA group was similar to that of YRI and it showed the least extended LD among 3 groups compared (i.e., PAA, AfA, and MA). It is obvious that the observed extended LD in the 48 AfA individuals was mainly contributed by the 25 individuals with mixed ancestry, whereas the 23 PAA individuals contributed little to the extended LD in AfA. To this end, we concluded that individuals with different proportions of African ancestry contributed differently to overall LD of AfA. To further demonstrate such effect and examine its impact on the LD of AfA, we conducted 2 additional experiments. In the first experiment, we began by randomly sampling one individual from the MA group with replacement and introducing the sampled individual to the original AfA sample. This process was repeated 100 times. Then for each of several BMD, we estimated overall average r2 and plotted it in figure 7. We repeated the same experiment, each time increasing the number of sampled MA individuals by one until 25 samples were reached. In the second experiment, we followed the procedures described above, sampling from the PAA group instead of the MA group until 23 samples were reached (fig. 7). The result of these 2 experiments showed that 2 groups of individuals (MA vs. PAA) contribute differently to the extended LD in AfA. In particular, LD decreases when individuals from the PAA group were added to the original AfA group (fig. 7, the left plot). In contrast, LD increases when individuals from the MA group were added to original AfA group (fig. 7, the right plot).
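The resampling experiments can be sketched roughly as follows. This is a simplified illustration, not the authors' procedure: individuals are represented as pairs of 0/1 haplotypes, r2 is averaged over all marker pairs without binning by BMD, and the names `mean_r2` and `augmented_mean_r2` as well as the toy data are hypothetical.

```python
import random
from itertools import combinations

def r2_pair(haps, i, j):
    """r2 between sites i and j from a list of 0/1 haplotypes (None if undefined)."""
    n = len(haps)
    pa = sum(h[i] for h in haps) / n
    pb = sum(h[j] for h in haps) / n
    pab = sum(h[i] * h[j] for h in haps) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return None
    return (pab - pa * pb) ** 2 / denom

def mean_r2(haps):
    vals = [r2_pair(haps, i, j) for i, j in combinations(range(len(haps[0])), 2)]
    vals = [v for v in vals if v is not None]
    return sum(vals) / len(vals) if vals else float("nan")

def augmented_mean_r2(base, donors, k, n_rep=100, seed=0):
    """Average r2 after adding k individuals drawn with replacement from
    `donors` to the `base` sample, averaged over n_rep replicates."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_rep):
        extra = [rng.choice(donors) for _ in range(k)]
        haps = [h for individual in base + extra for h in individual]
        results.append(mean_r2(haps))
    return sum(results) / len(results)

# Toy data: the base sample is in complete LD at 2 sites
toy_base = [([0, 0], [1, 1]), ([0, 0], [1, 1])]
same_kind = [([0, 0], [1, 1])]       # donors that preserve the LD
recombinant = [([0, 1], [1, 0])]     # donors that break it down
kept = augmented_mean_r2(toy_base, same_kind, k=1, n_rep=5)
reduced = augmented_mean_r2(toy_base, recombinant, k=2, n_rep=5)
```

Repeating the call for k = 1, 2, ... up to the donor group size gives a curve of average r2 against the number of added individuals, analogous in spirit to figure 7.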
| Discussion |
|---|
In this report, we presented an analysis of LD of 24,341 markers densely distributed across the entire chromosome 21 in a sample of AfAs, European Americans, and Africans. Using all markers, we showed that LD in AfAs is indeed similar to that in Africans and that both are weaker than that in European Americans. The strength of LD in AfA (measured by r2) depends on allele frequency differences between Europeans and Africans, as measured by f or δ. Therefore, an admixed population such as AfA manifests extended LD only when AIMs are used but not when all SNPs are included.
Our results showed that restricting marker screening sets to markers with high African/European f or δ levels is important in performing ALD mapping. For SNPs, the allele frequency differences with f ≥ 0.4 should be used as a criterion for selecting AIMs, which reveals long-range (BMD > 200 kb) LD elevation and extension in AfA. But for shorter range (BMD ≤ 200 kb), AIMs do not lead to an elevation of high level LD (r2 ≥ 0.8). In fact, the inferred segment size at individual levels is far larger than 200 kb (supplementary fig. S5 and S7 [Supplementary Material online]), and regions of interest showing association signal often span multiple centimorgans as observed by Patterson et al. (2004). Therefore, admixed populations such as AfA offer limited utility in narrowing the size of candidate regions below 200 kb in MALD.
However, because the LD patterns below 200 kb are similar between AfA and YRI, AfA might be used as a substitute African population for LD mapping in general. When African samples are difficult to obtain for practical reasons, AfA may be more advantageous for narrowing the size of candidate regions below 200 kb for traits or diseases that are shared between all ethnic groups, because African genomes are more fragmented than those of the other groups. This is probably also true for traits and diseases that are more prevalent in Africans.
However, LD with r2 ≥ 0.1 (corresponding to Kruglyak's useful LD [Kruglyak 1999]) can often be observed (0.73% of all marker pairs) at BMD > 20,000 kb in AfA, which may include the background LD that was generated by remnant artificial association and/or stochastic fluctuation, but not true physical linkage, in AfA populations. It would be very inefficient to use the information provided by such a level of LD, and thus it, too, may not be useful for genome-wide association or for fine-scale LD mapping. In contrast, LD with r2 ≥ 0.2 correlates better with distance and decays more quickly, disappearing when BMD > 10,000 kb. It is therefore more efficient to locate regions of interest using LD with r2 ≥ 0.2, at least in AfA. Together with the above observations, efficient extended LD in AfAs can be defined as LD at a level of 0.2 ≤ r2 ≤ 0.8 and BMD ≥ 200 kb.
Individuals with ancestry solely from one population show no crossovers between segments of different ancestry and thus contribute no additional power for admixture mapping studies, which use the method of identifying the ancestral origin of chromosomal segments (Patterson et al. 2004). Our results indicated that these individuals also contribute little to the extended LD in AfA populations. More interestingly, analysis without these individuals leads to higher and more extended LD in the population. In practice, individuals of single ancestry can be identified easily by typing a small number of AIMs. In fact, our further analysis of the data from Hinds et al. (2005) showed that an AfA individual who is of PAA at one chromosome is more likely to be so across the entire genome. As a possibly novel strategy for MALD, such individuals could be removed from either further genotyping or further association analysis in order to achieve higher statistical power and possibly reduced cost. This strategy, however, still requires careful investigation before implementation in future studies.
The methods of examining allelic association between markers and disease status are not very efficient for MALD (McKeigue 2000; Seldin et al. 2004). Instead of simply examining allelic association per se, other proposed methods (McKeigue 2000; Seldin et al. 2004) have the potential to maximize power by attempting to determine the ancestral identity of each chromosomal region. Unfortunately, both the association-based approach and the segment-based approach depend on the use of AIMs, which are too rare to provide good coverage of the genome. In this study, for instance, 1.29% of all markers had f ≥ 0.4 and only 0.46% had f ≥ 0.5. In the latest HapMap data (http://www.hapmap.org, HapMap Public Release #20, 2006-01-24), for all 43,739 SNPs on chromosome 21, there are 1.03% AIMs with f ≥ 0.4 and 0.38% with f ≥ 0.5. In addition, we found that AIMs are not evenly distributed in the genome in our data set; some regions lack AIMs, whereas other regions have an excess of AIMs. This phenomenon was confirmed by screening AIMs (f ≥ 0.4) from the database of the International HapMap project (phase II data, HapMap Public Release #20, 2006-01-24) (supplementary fig. S10, Supplementary Material online), in which many regions are AIM deserts. The AIMs in this study had an average intermarker distance of 0.4 cM or 232 kb, but 5.8% of chromosomal segments still had ambiguous ancestries after combining the results from 6 marker panels. Notably, those ambiguous segments were distributed in the regions with very low AIM density or even no AIMs and therefore become the blind spots for mapping.
| Conclusion |
|---|
Overall LD of the AfA population is similar to that in Africans and, for common alleles, even weaker than that in an African population such as YRI. In our study, for SNPs with a minor allele frequency ≥ 0.15, LD at the level of r2 ≥ 1/3 in AfA samples is generally weaker than that in the African samples (YRI) used in the International HapMap project.
The strength of LD in AfA depends on each locus' European/African allele frequency difference, measured by f or δ. Therefore, an admixed population such as AfA will manifest extended LD only when AIMs are used. For SNPs, we recommend that the allele frequency differences with f ≥ 0.4 be used as a criterion for selecting AIMs, which reveal long-range (BMD > 200 kb) LD elevation and extension in AfA. But for shorter range (BMD ≤ 200 kb), AIMs do not lead to an elevation of high level LD (r2 ≥ 0.8). Therefore, efficiently extended LD in AfAs can be defined as LD at the level of 0.2 ≤ r2 ≤ 0.8 and extending more than 200 kb.
AfAs genetically comprise a highly structured population; those individuals of single African ancestry contribute little to the extended LD in AfA populations; and analysis without these individuals leads to higher and more extended LD in the population. As a possibly novel strategy for MALD, such individuals could be removed from either further genotyping or further association analysis in order to achieve higher statistical power with possible reduction in cost.
| Supplementary Material |
|---|
The following additional data are available with the online version of this paper. The additional data file contains further information about the applied methods and the results, including a procedure for inferring ancestral origins of chromosomal segments in AfA; estimation of individual and population admixture proportions in AfA; estimation of admixture time of AfA; supplementary figures S1–S10; and tables S1–S7. These materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
This work was supported by grants from the Chinese High-Tech Program (863; 2002BA711A10), the Basic Research Program of China (973; 2004CB518605, 2002CB512900), the National Natural Science Foundation of China (NSFC30571060), and the Shanghai Science and Technology Committee (04DZ14003). The authors thank Mr Jun Zhang for discussions at the early stage of the project.
| Footnotes |
|---|
1 Present address: Institute of Genetics, School of Life Sciences at Fudan University, Shanghai, China.
Naoko Takezaki, Associate Editor
| References |
|---|
Ardlie KG, Kruglyak L, Seielstad M. Patterns of linkage disequilibrium in the human genome. Nat Rev Genet (2002) 3:299–309.[CrossRef][Web of Science][Medline]
Bodmer WF. Human genetics: the molecular challenge. Cold Spring Harb Symp Quant Biol (1986) 51(Pt 1):1–13.
Chakraborty R, Weiss KM. Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci. Proc Natl Acad Sci USA (1988) 85:9119–9123.
Collins-Schramm HE, Chima B, Operario DJ, Criswell LA, Seldin MF. Markers informative for ancestry demonstrate consistent megabase-length linkage disequilibrium in the African American population. Hum Genet (2003) 113:211–219.[CrossRef][Web of Science][Medline]
De La Vega FM, Isaac H, Collins A. (29 co-authors). The linkage disequilibrium maps of three human chromosomes across four populations reflect their demographic history and a common underlying recombination pattern. Genome Res (2005) 15:454–462.
Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics (2003) 164:1567–1587.
Gabriel SB, Schaffner SF, Nguyen H. (18 co-authors). The structure of haplotype blocks in the human genome. Science (2002) 296:2225–2229.
Hill WG, Weir BS. Maximum-likelihood estimation of gene location by linkage disequilibrium. Am J Hum Genet (1994) 54:705–714.[Web of Science][Medline]
Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR. Whole-genome patterns of common DNA variation in three human populations. Science (2005) 307:1072–1079.
Huang W, He Y, Wang H. (21 co-authors). Linkage disequilibrium sharing and haplotype-tagged SNP portability between populations. Proc Natl Acad Sci USA (2006) 103:1418–1421.
Huttley GA, Smith MW, Carrington M, O'Brien SJ. A scan for linkage disequilibrium across the human genome. Genetics (1999) 152:1711–1722.
Johnson GC, Esposito L, Barratt BJ. (21 co-authors). Haplotype tagging for the identification of common disease genes. Nat Genet (2001) 29:233–237.
Jorde LB. Linkage disequilibrium as a gene-mapping tool. Am J Hum Genet (1995) 56:11–14.
Ke X, Hunt S, Tapper W. (12 co-authors). The impact of SNP density on fine-scale patterns of linkage disequilibrium. Hum Mol Genet (2004) 13:577–588.
Kong X, Murphy K, Raj T, He C, White PS, Matise TC. A combined linkage-physical map of the human genome. Am J Hum Genet (2004) 75:1143–1148.
Kruglyak L. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet (1999) 22:139–144.
Laan M, Pääbo S. Demographic history and linkage disequilibrium in human populations. Nat Genet (1997) 17:435–438.
Lautenberger JA, Stephens JC, O'Brien SJ, Smith MW. Significant admixture linkage disequilibrium across 30 cM around the FY locus in African Americans. Am J Hum Genet (2000) 66:969–978.
Lewontin RC. The interaction of selection and linkage. II. Optimum models. Genetics (1964) 50:757–782.
McKeigue PM. Mapping genes that underlie ethnic differences in disease risk: methods for detecting linkage in admixed populations, by conditioning on parental admixture. Am J Hum Genet (1998) 63:241–251.
McKeigue PM. Efficiency of estimation of haplotype frequencies: use of marker phenotypes of unrelated individuals versus counting of phase-known gametes. Am J Hum Genet (2000) 67:1626–1627.
McKeigue PM. Prospects for admixture mapping of complex traits. Am J Hum Genet (2005) 76:1–7.
McKeigue PM, Carpenter JR, Parra EJ, Shriver MD. Estimation of admixture and detection of linkage in admixed populations by a Bayesian approach: application to African-American populations. Ann Hum Genet (2000) 64:171–186.
Parra EJ, Marcini A, Akey J. (11 co-authors). Estimating African American admixture proportions by use of population-specific alleles. Am J Hum Genet (1998) 63:1839–1851.
Patterson N, Hattangadi N, Lane B. (12 co-authors). Methods for high-density admixture mapping of disease genes. Am J Hum Genet (2004) 74:979–1000.
Pfaff CL, Parra EJ, Bonilla C, Hiester K, McKeigue PM, Kamboh MI, Hutchinson RG, Ferrell RE, Boerwinkle E, Shriver MD. Population structure in admixed populations: effect of admixture dynamics on the pattern of linkage disequilibrium. Am J Hum Genet (2001) 68:198–207.
Pritchard JK, Przeworski M. Linkage disequilibrium in humans: models and data. Am J Hum Genet (2001) 69:1–14.
Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics (2000) 155:945–959.
Reich DE, Cargill M, Bolk S. (11 co-authors). Linkage disequilibrium in the human genome. Nature (2001) 411:199–204.
Rybicki BA, Iyengar SK, Harris T, Liptak R, Elston RC, Sheffer R, Chen KM, Major M, Maliarik MJ, Iannuzzi MC. The distribution of long range admixture linkage disequilibrium in an African-American population. Hum Hered (2002) 53:187–196.
Salari K, Choudhry S, Tang H. (23 co-authors). Genetic admixture and asthma-related phenotypes in Mexican American and Puerto Rican asthmatics. Genet Epidemiol (2005) 29:76–86.
Schneider S, Roessli D, Excoffier L. Arlequin: a software for population genetics data analysis. Version 2.000 (2000) Geneva (Switzerland): Genetics and Biometry Lab, Department of Anthropology, University of Geneva.
Seldin MF, Morii T, Collins-Schramm HE, Chima B, Kittles R, Criswell LA, Li H. Putative ancestral origins of chromosomal segments in individual African Americans: implications for admixture mapping. Genome Res (2004) 14:1076–1084.
Shriver MD, Smith MW, Jin L, Marcini A, Akey JM, Deka R, Ferrell RE. Ethnic-affiliation estimation by use of population-specific DNA markers. Am J Hum Genet (1997) 60:957–964.
Smith MW, O'Brien SJ. Mapping by admixture linkage disequilibrium: advances, limitations and guidelines. Nat Rev Genet (2005) 6:623–632.
Stephens JC, Briscoe D, O'Brien SJ. Mapping by admixture linkage disequilibrium in human populations: limits and guidelines. Am J Hum Genet (1994) 55:809–824.
Stephens M, Donnelly P. A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet (2003) 73:1162–1169.
Stephens M, Smith NJ, Donnelly P. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet (2001) 68:978–989.
Takeuchi F, Yanai K, Morii T, Ishinaga Y, Taniguchi-Yanai K, Nagano S, Kato N. Linkage disequilibrium grouping of single nucleotide polymorphisms (SNPs) reflecting haplotype phylogeny for efficient selection of tag SNPs. Genetics (2005) 170:291–304.
The International HapMap Consortium. The International HapMap Project. Nature (2003) 426:789–796.
Wahlund S. Zusammensetzung von Populationen und Korrelationserscheinungen vom Standpunkt der Vererbungslehre aus betrachtet. Hereditas (1928) 11:65–106.
Weiss KM, Clark AG. Linkage disequilibrium and the mapping of complex human traits. Trends Genet (2002) 18:19–24.
Zhu X, Luke A, Cooper RS. (11 co-authors). Admixture mapping for hypertension loci with genome-scan markers. Nat Genet (2005) 37:177–181.