MBE Advance Access originally published online on July 3, 2006
Molecular Biology and Evolution 2006 23(10):1891-1901; doi:10.1093/molbev/msl051

© 2006 The Authors. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Research Article

Automated Phylogenetic Detection of Recombination Using a Genetic Algorithm

Sergei L. Kosakovsky Pond*, David Posada†, Michael B. Gravenor‡, Christopher H. Woelk* and Simon D. W. Frost*

* Department of Pathology, University of California San Diego; † University of Vigo, Vigo, Spain; and ‡ School of Medicine, University of Swansea, Swansea, Wales, United Kingdom

E-mail: spond@ucsd.edu.


    Abstract
The evolution of homologous sequences affected by recombination or gene conversion cannot be adequately explained by a single phylogenetic tree. Many tree-based methods for sequence analysis, for example, those used for detecting sites evolving nonneutrally, have been shown to fail if such phylogenetic incongruity is ignored. However, it may be possible to propose several phylogenies that can correctly model the evolution of nonrecombinant fragments. We propose a model-based framework that uses a genetic algorithm to search a multiple-sequence alignment for putative recombination break points, quantifies the level of support for their locations, and identifies sequences or clades involved in putative recombination events. The software implementation can be run quickly and efficiently in a distributed computing environment, and various components of the methods can be chosen for computational expediency or statistical rigor. We evaluate the performance of the new method on simulated alignments and on an array of published benchmark data sets. Finally, we demonstrate that prescreening alignments with our method allows one to analyze recombinant sequences for positive selection.

Key Words: recombination • phylogenetic incongruence • model selection • genetic algorithms • multimodel inference


    Introduction
Point mutation and recombination are 2 major evolutionary mechanisms driving diversity and adaptation, but their relative contribution varies greatly between genes and organisms (Worobey and Holmes 1999; Awadalla 2003). In many retroviruses, for example, HIV, the rate of recombination may rival or exceed that of point mutation (Zhuang et al. 2002). Conversely, although it has long been thought that animal mitochondrial genomes exhibit low recombination rates, a recent study (Tsaousis et al. 2005) presents evidence to the contrary.

Over the last 2 decades, there has been an explosion of stochastic models for studying evolution due to point mutations (Felsenstein 1981; Muse and Gaut 1994; Felsenstein and Churchill 1996; Savill et al. 2001). These advances led to rapid development of popular phylogeny-based inference methods, such as ancestral dating based on molecular clocks (Korber et al. 2000) and methods for detecting nonneutral evolution at the level of individual codons (Nielsen and Yang 1998; Suzuki and Gojobori 1999; Kosakovsky Pond and Frost 2005b). Recombination can mislead the phylogenetic estimation procedure (Posada and Crandall 2002) and distort subsequent inferences based on inferred phylogenies (Schierup and Hein 2000a, 2000b). Furthermore, the likelihood methods for quantifying selection pressure on codon alignments developed by Nielsen and Yang (1998) may suffer from high rates of false positives when the sequences being analyzed have undergone recombination (Anisimova et al. 2003; Shriner et al. 2003). This is intuitively clear because the evolution of homologous recombinant sequences must be modeled by several phylogenies, one for each nonrecombinant fragment in the alignment. Consequently, an essential step in any phylogeny-based analysis is to screen for and quantify evidence of recombination.

There exist numerous algorithms and software tools geared toward detection and analysis of recombination. Posada and colleagues have compared the performance of different methods on simulated (Posada and Crandall 2001) and biological data (Posada 2002) and found that they can yield vastly different results, necessitating the use of a consensus method approach to obtain reliable inference. However, it may be excessively laborious and not necessarily illuminating to apply multiple methods to an alignment and attempt to integrate the results. For instance, some methods may claim that an alignment is recombination free, whereas others find many recombination events. Moreover, when the goal is not merely to test for recombination but also to identify break points and recombinant sequences and to establish statistical support for the inferences, many methods are incapable of this level of detail. Lastly, when the ultimate objective is to apply a phylogeny-based method to a data set with evidence of recombination, it is imperative to determine which segments of the alignment are nonrecombinant and infer an appropriate phylogeny for each segment. This procedure may be preferable to discarding sequences that may have undergone recombination or assuming a single phylogenetic history for the entire alignment.

With this in mind, we propose a pragmatic approach, Genetic Algorithm Recombination Detection (GARD), to rapidly screen multiple-sequence alignments for recombination. The method is designed from the outset to search for evidence of segment-specific phylogenies. Given the maximum number of break points, B (this number can also be inferred), the method will search the space of all possible locations for B or fewer break points in the alignment, inferring phylogenies for each putative nonrecombinant fragment, and assess goodness of fit by an information-based criterion, such as the small-sample Akaike Information Criterion (AICc) (Sugiura 1978), derived from a maximum likelihood model fit to each segment. For B = 1, it is practical to quickly screen all possible locations of the break point. This simple approach is shown to perform at least as well as any of the 14 methods examined by Posada and Crandall (2001) on the same simulated data. When B > 1, it is often infeasible to perform a "brute-force" search for long sequences. We propose a genetic algorithm (GA) heuristic to quickly explore such a large state space. Drawing upon the standards of multimodel inference, we combine the information from all fitted models and assign a level of support to the placement of break points and support for different phylogenies among inferred nonrecombinant segments. We reanalyze 2 collections of previously published biological sequence alignments (Posada 2002; Chare et al. 2003) and compare our findings with those presented originally. Based on several simulation scenarios, we show that GARD has good power and accuracy to detect recombination and identify nonrecombinant sequence fragments. Lastly, we demonstrate how screening for nonrecombinant sequence fragments helps reduce false-positive error rates in the fixed effects likelihood (FEL) phylogenetic method (Kosakovsky Pond and Frost 2005b) used for selection analysis.


    Materials and Methods
Rapid Screening for Recombination Using a Single Break Point
Consider an alignment of S sequences with N characters each. Sequences could consist of nucleotides, amino acids, codons, or characters from other alphabets; however, in this article, we apply the GARD method to nucleotide data. If none of the sequences are recombinant, a single phylogeny should fit the data well; otherwise, different regions of sequences may yield different phylogenies if analyzed separately. We assume that the process of point mutation can be adequately modeled by an appropriate time-reversible model of nucleotide substitution, up to the general reversible model (Tavaré 1986), with site-to-site rate variation accounted for with the β–Γ distribution (Kosakovsky Pond and Frost 2005c). Because we can only resolve the location of break points up to the nearest variable site, we adopt the convention that break points must coincide with variable sites, whose number is denoted by V ∈ [2, N]. Actual break points may reside somewhere between variable sites, but phylogenetic search procedures cannot be used to identify exactly where because invariable sites contain no phylogenetic signal. For computational expediency, we employ the Neighbor-Joining (NJ) method (Saitou and Nei 1987) with the TN93 distance metric (Tamura and Nei 1993) to reconstruct the tree topology on each putative nonrecombinant fragment. The NJ method has been shown to perform reasonably well on reconstructing trees from simulated alignments, including large alignments (Tamura et al. 2004), and if additional accuracy is desired, a more computationally demanding method can be invoked. Branch lengths and other model parameters are fitted using the maximum likelihood framework (Felsenstein 1981).
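
The convention that break points coincide with variable sites can be illustrated with a minimal Python sketch. The function name `variable_sites` and the toy alignment are hypothetical, and the sketch treats every character literally (gaps and ambiguity codes would need separate handling, which the text does not specify).

```python
def variable_sites(alignment):
    """Return 0-based column indices where at least two distinct characters occur."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    return [col for col in range(length)
            if len({seq[col] for seq in alignment}) > 1]


# Toy alignment of S = 3 sequences with N = 6 characters each (illustrative only).
toy = ["ACGTAC", "ACGTTC", "ATGTAC"]
sites = variable_sites(toy)
print(sites)       # candidate break point locations
print(len(sites))  # V, the number of variable sites
```

Here columns 1 and 4 are the only variable ones, so V = 2 and only those columns are candidate break point locations.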

Our algorithm consists of 4 steps:

  1. Infer a NJ tree for the entire alignment and obtain the AICc score (A0) for a given nucleotide model, using maximum likelihood to estimate rate parameters and branch lengths. The AICc score of a model with p parameters fitted to a sample of size N is defined as $\mathrm{AIC}_c = -2\log L + \frac{2pN}{N - p - 1}$, where $L$ is the maximized likelihood. AICc is a second-order correction to the standard AIC, and its use has been advocated when the number of samples is not much larger (40x or fewer) than the number of parameters (Burnham and Anderson 2003, p. 66). Note that the use of AICc sensibly requires that there be more observations (alignment columns) than the number of estimated model parameters. The formal requirement for this setting is, consequently, N > 2(2S – 3) + bp, where 2(2S – 3) counts the number of branches in 2 trees fitted by the GA and bp refers to the number of rate and frequency parameters in the evolutionary model.
    We hold the estimates of base frequencies and substitution bias parameters at the values obtained from this step. Indeed, it is reasonable to expect that the stationary base distribution and the parameters of the point substitution process on each nonrecombinant fragment are not strongly affected by recombination.
  2. We consider all V – 1 partitions of N sites into 2 continuous blocks, where each block contains at least one variable site and each break point coincides with a variable site.
  3. If the break point is placed at site i, we infer a NJ tree individually for each block and compute the AICc score (Ai) of the model that fits branch lengths to each partition independently, holding other parameters fitted in step 1 constant. The single–break point model will have 2S – 3 more estimable parameters than the single-partition model. This step is repeated for all V – 1 possible locations of the break point.
  4. If Ai < A0 for at least one i, then we deduce that some of the sequences in the alignment are recombinant. We equate the relative support for having the break point at site i to its Akaike (1983) weight: $w_i = \exp(-A_i/2) \big/ \sum_r \exp(-A_r/2)$, where $A_i$ is the score for the model placing the break point at the i-th variable site, and r indexes possible locations of the break point.
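
The scoring used in the steps above can be sketched in a few lines of Python. This is only an illustration of the standard small-sample AIC and Akaike weight formulas, not the HyPhy implementation; the function names and the example scores are hypothetical. Subtracting the best score before exponentiating is a numerical-stability choice that leaves the weights unchanged.

```python
import math


def aicc(log_likelihood, p, n):
    """Small-sample AIC: -2 log L + 2 p n / (n - p - 1)."""
    if n <= p + 1:
        raise ValueError("AICc needs n > p + 1")
    return -2.0 * log_likelihood + 2.0 * p * n / (n - p - 1)


def akaike_weights(scores):
    """Normalised relative support for each candidate model (lower score = better)."""
    best = min(scores)
    raw = [math.exp(-(s - best) / 2.0) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical AICc scores for three candidate break point locations.
scores = [4850.2, 4852.9, 4861.0]
weights = akaike_weights(scores)
print([round(w, 3) for w in weights])
```

The location with the lowest AICc receives the largest weight, and the weights sum to 1 over all candidate locations.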

Searching for Multiple Break Points Using a GA
When multiple break points are introduced, a brute-force approach rapidly becomes impractical. Even with the simplifying assumption that break points are restricted to V variable sites, there are $\binom{V}{B}$ possible combinations of B ≥ 1 break points, when $B \le V$. For example, if one were to examine the entire HIV-1 genome (~10 kb) for recombination and assume that a quarter of the sites were variable, there would be approximately $10^9$ models with 3 break points to consider.
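
As a quick order-of-magnitude check of the HIV-1 example, the following sketch counts candidate models under the assumption that the count is the binomial coefficient C(V, B), with V = 2,500 variable sites (a quarter of 10,000). The function name `n_models` is illustrative.

```python
import math


def n_models(n_variable_sites, n_break_points):
    """Number of ways to place B distinct break points on V variable sites (assumed C(V, B))."""
    return math.comb(n_variable_sites, n_break_points)


V = 10_000 // 4          # a quarter of a ~10 kb genome
print(n_models(V, 3))    # 2601042500, i.e. on the order of 10^9
```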

Consequently, we utilize an aggressive population-based hill climber, the CHC GA (Eshelman 1991; Kosakovsky Pond and Frost 2005a), to search the space of candidate models. A candidate model for B break points is represented by the ordered vector $\mathbf{b} = (v_1, v_2, \ldots, v_B)$, where 1 ≤ v1 ≤ v2 ≤ ... ≤ vB ≤ V represent the locations of break points in the coordinates of variable sites, ordered left to right. When 2 coordinates are equal, the model collapses to B – 1 break points. The GA operates on the binary representation of this vector, with single bits serving as units of evolution.

The parameter space for this optimization problem has 2 components: a discrete allocation of possible positions in the sequence to B break points and a vector of real valued parameters corresponding to branch lengths. The CHC algorithm is employed to search through the discrete component of the parameter space, and conventional numerical optimization techniques are used to find maximum likelihood estimates of all other model parameters, given vector b. The fitness of every model is measured by its AICc score. Individuals are chosen for mating with probabilities proportional to their fitness.

The CHC always retains the most fit individual from the previous generation and performs 2 basic operations on individuals currently in the population:

  • Mating with free recombination: When 2 individuals b1 and b2 are picked to mate, their offspring, bO, is equally likely to inherit bit bi from either parent.
  • Hypermutation: If the diversity of the sample (measured by the range of AICc scores normalized by the score of the best individual) falls below a fixed threshold—0.1% in our implementation—then all individuals in the population, excluding the most fit one, have 15% of randomly selected bits toggled.

Before the new individuals generated by these operations are placed into the population, it may be necessary to re-sort the break points in ascending order to avoid equivalent representations of the same model. The algorithm terminates if the best AICc score remains unchanged over 100 consecutive generations. To increase the proportion of all possible models examined by the GA, a master list of all fitted models is maintained, and if a previously examined model is generated, the algorithm will randomly mutate such an individual (one position at a time), until a new model has been proposed, provided there are any remaining. A typical GA run considers $10^3$–$10^4$ models, hence it is practical to maintain such a list in memory.
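
The two CHC operations and the diversity trigger described above can be sketched as follows. This is a simplified illustration on plain lists of 0/1 bits, not the authors' code: the function names are hypothetical, the mapping from bits to break point coordinates is omitted, and the 15% toggle fraction and 0.1% threshold are the values quoted in the text.

```python
import random


def mate(parent1, parent2, rng):
    """Free recombination: each bit is inherited from either parent with probability 1/2."""
    return [rng.choice(pair) for pair in zip(parent1, parent2)]


def hypermutate(bits, rng, fraction=0.15):
    """Toggle a randomly selected `fraction` of the bits of one individual."""
    n_flip = round(fraction * len(bits))
    flipped = set(rng.sample(range(len(bits)), n_flip))
    return [b ^ 1 if i in flipped else b for i, b in enumerate(bits)]


def diversity(scores):
    """Range of AICc scores normalised by the score of the best (lowest) individual."""
    best = min(scores)
    return (max(scores) - best) / abs(best)


rng = random.Random(1)
child = mate([0] * 8, [1] * 8, rng)
population_scores = [4850.0, 4851.0, 4852.5]
if diversity(population_scores) < 0.001:   # 0.1% threshold from the text
    child = hypermutate(child, rng)
print(child)
```

In a full run, hypermutation would be applied to every individual except the most fit one, and decoded break point vectors would be sorted before fitting.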

For computational expediency, we make the same simplifications as in the single–break point case: NJ trees are reconstructed for each fragment and parameters of the substitution model are estimated first from the entire alignment and held constant for the entire GA run. For each data set, we start with B = 0 break points and increase B by 1 for subsequent GA runs, until the AICc score of the best model stops decreasing with increasing B. We note that such incremental changing of B may underestimate the correct number of break points. A more careful search procedure might investigate a fixed range of B (e.g., B = 1, ..., 20) but incur greater computational costs. Ideally, B would also be a parameter to be determined automatically during the search, but it is challenging to implement a GA that can properly search a parameter space whose dimension may change at run time.
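
The incremental choice of B, and the way it can stop too early, can be sketched as below. The function `choose_break_point_count`, its `max_b` cap, and the toy score table are assumptions for illustration; in practice each score would come from a full GA run.

```python
def choose_break_point_count(best_score_for, max_b=20):
    """Increase B from 0 until the best AICc score stops decreasing.

    `best_score_for(b)` is assumed to return the best AICc found with b break points.
    """
    best_b, best_score = 0, best_score_for(0)
    for b in range(1, max_b + 1):
        score = best_score_for(b)
        if score >= best_score:
            break  # no improvement: stop, even if a larger B would do better
        best_b, best_score = b, score
    return best_b, best_score


# Hypothetical best scores per B; B = 4 is better overall but is never reached.
toy_scores = {0: 5000.0, 1: 4900.0, 2: 4850.0, 3: 4860.0, 4: 4700.0}
print(choose_break_point_count(toy_scores.__getitem__, max_b=4))  # (2, 4850.0)
```

The toy table shows the underestimation mentioned in the text: the search stops at B = 2 because B = 3 is worse, although B = 4 would have scored better.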

Model-Averaged Break Point Locations
Having fitted M models with B break points each using the GA and computed their corresponding Akaike weights, wi, where i = 1, ..., M, we can compute the model-averaged support probability $P_j(n)$ that the j-th break point rests on nucleotide position n in the alignment: $P_j(n) = \sum_{i \in \mathcal{M}_j(n)} w_i$. Here $\mathcal{M}_j(n)$ denotes the set of models which place their j-th break point (ordered by increasing nucleotide position of the break point) on site n. It is easy to see that $\sum_n P_j(n) = 1$ for all 1 ≤ j ≤ B.
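
Model averaging of break point locations amounts to summing Akaike weights over the models that share a location. A minimal sketch follows, assuming every model has exactly B break points and that the weights already sum to 1; the function name and the three toy models are hypothetical.

```python
from collections import defaultdict


def break_point_support(models, weights, n_break_points):
    """support[j][n] = total Akaike weight of models whose j-th break point is at site n."""
    support = [defaultdict(float) for _ in range(n_break_points)]
    for breakpoints, w in zip(models, weights):
        for j, n in enumerate(sorted(breakpoints)):
            support[j][n] += w
    return [dict(s) for s in support]


# Three hypothetical fitted models with B = 2 break points each.
models = [(120, 480), (120, 510), (150, 480)]
weights = [0.6, 0.3, 0.1]
for j, s in enumerate(break_point_support(models, weights, 2), start=1):
    print(j, {site: round(p, 3) for site, p in s.items()})
```

For each j, the supports sum to 1 across positions, because every model contributes its weight to exactly one position per break point.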

Result Verification
GARD does not explicitly require that tree topologies be different among partitions. For example, if the alignment exhibits strong spatially localized changes in diversity or heterotachy, then a model that fits the same topology but differing branch lengths to segments of the alignment might outperform a single-partition model. Optionally, to verify whether the "topologies" were significantly different between adjacent partitions, we performed a posteriori incongruence tests between all the tree topologies derived from adjacent sequence segments. We used the Shimodaira and Hasegawa (1999) test (SH test) and required that at least 1 pair of the adjacent segments show a statistically significant (P < 0.01, when corrected for multiple tests) difference in tree topologies. However, this requirement may be too restrictive because not all recombination events give rise to discordant topologies between sequence fragments (only Type 3 recombination events do, using the notation of Wiuf et al. 2001). When only the improvement in the AICc score is used to detect recombination, other types of events may be identified, and 3-sequence alignments, for which there is a single possible tree topology, can also be handled.

Implementation
All sequence analyses and model fitting were performed using the HyPhy (Kosakovsky Pond et al. 2005) software on a P-node message passing interface cluster. P – 1 slave nodes were used to fit various models, and a single master node dispatched the jobs and assembled the results. The size of the CHC population was set to 2P – 2 individuals. We set P = 17 for the analyses in this article. A single run of the GA required from several minutes to several hours, depending on the size of the alignment and the number of break points. A Web-based interface for GARD is available at http://www.datamonkey.org/GARD/.

Sequence Alignments
We consider 2 collections of alignments previously analyzed for recombination: 24 mixed data sets from Posada (2002) and 78 viral data sets from Chare et al. (2003). These alignments span a range of size and diversity levels and include a large number of both recombinant and nonrecombinant cases. Furthermore, comparing the performance of the new method to existing ones on previously analyzed alignments provides a direct measure of agreement with the tools currently at the disposal of researchers and helps elucidate conditions under which our approach reaches a different conclusion. Both biological and simulated sequence alignments used in this study can be downloaded from http://www.hyphy.org/pubs/GARD/.

Simulations
Scenario 1 (fig. 1) consisted of 8 nonrecombinant sequences and a single recombinant with 2 break points. The length of the sequences, divergence levels, and base frequencies were derived from HIV-1 subtype B/D recombinants. Data were generated parametrically using the HKY85 (Hasegawa et al. 1985) model of sequence evolution with constant rates across sites and the transition/transversion ratio set to 3. Scenario 2 (fig. S1, Supplementary Material online) extended Scenario 1 to 3 recombination break points involving a clade (ancient recombination) and a single sequence (recent recombination). These 2 scenarios are able to measure the performance of the method over multiple replicates of the same evolutionary process with fixed break points and recombinant sequences.


FIG. 1. Performance of GARD in detecting recombination for Scenario 1. The trees and sequence fragments used to simulate the data are shown, and recombinant sequences are highlighted. The summary table classifies the results of recombination inference based on 100 data replicates. Probability plots display model-averaged probabilities of finding a break point at a given position in the alignment, averaged either over those runs that detected some recombination (95 runs, top) or over the runs that correctly identified 2 break points (69 runs, bottom).

 
Additionally, we utilized 8 coalescent-based recombination simulations designed to sample from multiple realizations of the evolutionary process with fixed metaparameters, such as the number of recombination events and mutation rates. One hundred alignments of 8 sequences, 3,000 nt each, were simulated for 2 levels of diversity: ≈5% (low) and ≈25% (high), selected to reflect the level of divergence of within-subtype and between-subtype HIV-1 sequences. Each alignment contained 0, 1, 2, 4, or 8 recombination events. The General Time Reversible (GTR) + Γ (α = 0.5) model was used to describe the process of character substitution. Base frequencies (πA = 0.35, πC = 0.19, πG = 0.22, and πT = 0.24) and substitution rates (rAC = 2, rAG = 5, rAT = 0.7, rCG = 0.8, rCT = 4, and rGT = 1) were chosen to reflect parameters similar to those found in HIV-1 sequences.
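
To make the substitution model concrete, the sketch below builds a GTR rate matrix from the base frequencies and rates quoted above, using the common parameterization q_ij = r_ij π_j for i ≠ j and rescaling so that the expected substitution rate is 1. The parameterization, the normalization, and the function name are assumptions (the text does not state them), and Γ-distributed rate variation is not included.

```python
BASES = "ACGT"
PI = {"A": 0.35, "C": 0.19, "G": 0.22, "T": 0.24}
RATES = {("A", "C"): 2.0, ("A", "G"): 5.0, ("A", "T"): 0.7,
         ("C", "G"): 0.8, ("C", "T"): 4.0, ("G", "T"): 1.0}


def gtr_matrix(pi, rates):
    """Return Q as a nested dict, with rows summing to 0 and mean rate 1."""
    def r(i, j):
        return rates[(i, j)] if (i, j) in rates else rates[(j, i)]

    # Off-diagonal entries first, then the diagonal as minus the row sum.
    q = {i: {j: r(i, j) * pi[j] for j in BASES if j != i} for i in BASES}
    for i in BASES:
        q[i][i] = -sum(q[i].values())
    scale = -sum(pi[i] * q[i][i] for i in BASES)
    return {i: {j: q[i][j] / scale for j in BASES} for i in BASES}


Q = gtr_matrix(PI, RATES)
for i in BASES:
    print(i, [round(Q[i][j], 3) for j in BASES])
```

Time reversibility shows up as detailed balance: π_i q_ij equals π_j q_ji for every pair of bases.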

Finally, we considered a simulation scenario (Neutral Scenario), in which 32 codon sequences were evolved using 2 randomly generated trees (one for 400 codons and another for 100 codons) and then concatenated. One hundred replicates using 2 fixed trees (one on each partition) were generated. This rather extreme scenario was chosen to model a fixed recombination hot spot that sustains high recombination rates. Phylogenetic signals to the left and the right of the hot spot were effectively independent of one another, but there was strong consistent phylogenetic signal within each region. Each fragment was evolved under a neutral (dN = dS = 1) MG94 × REV codon model (Kosakovsky Pond and Frost 2005c) estimated from an alignment of HIV-1 reverse transcriptase sequences. The concatenated alignment was next analyzed for site-by-site selection with FEL (Kosakovsky Pond and Frost 2005b) based on the NJ tree inferred from the entire alignment. Following a recombination screen with the single–break point model, we next applied FEL to each of the inferred nonrecombinant fragments with NJ trees derived from each segment independently. Additionally, we performed FEL analyses on biological data sets showing strong evidence for recombination, with and without splitting the alignments into nonrecombinant fragments, to explore how recombination can affect the inference of codons under selection.


    Results
Single–Break Point Screen for Recombination
Using the single–break point screening procedure (see Materials and Methods), we reanalyzed nucleotide data (10 sequences with 1,000 bp) simulated under varying levels of divergence and recombination originally presented in Posada and Crandall (2001). The results for detecting whether recombination has acted on an alignment are summarized in figure 2. Single–break point scanning outperforms all the 14 methods presented in the original study and a more recent composite likelihood permutation test (McVean et al. 2002), both in terms of false positives and power, except for the cases of low sequence divergence, when the performance is comparable to the best of the other methods (fig. 2C). This finding is quite remarkable because the assumption of a single break point is likely violated for a vast majority of simulations (fig. 2A), where recombination did occur. When the recombination rate is high (ρ = 64), there are over 180 recombination events per alignment, on average, and one could expect that all phylogenetic signal is lost. Nonetheless, our method can reliably (>95%) detect recombination even if the level of divergence is low. Because we explicitly model site-to-site rate variation, it is not surprising that the rate of false positives for nonrecombinant data simulated with variable rates is low (fig. 2B). The software implementation of our method runs very quickly; for example, 100 alignments with 10 sequences, 1,000 bp long, were screened in about 5 min on 24 cluster nodes. Hence, our approach can be recommended as a rapid recombination-screening tool.


FIG. 2. Single–break point method performance on simulated data from Posada and Crandall (2001). Panels A and B show the proportion of 100 data replicates that the test classified as recombinant. "θ" denotes the scaled mutation rate. In panel A, ρ denotes the scaled recombination rate and R(n), the expected number of recombination events per replicate. In panel B, alignments were simulated without recombination but with gamma-distributed site-to-site substitution rate variation (mean 1, variance of 1/α). These plots are directly comparable with figure 1 in Posada and Crandall (2001). Panel C shows the number of data sets (out of 100) that were classified as recombinant by: the best performing method of the 14 examined by Posada and Crandall (2001); the composite likelihood permutation test (McVean et al. 2002), using benchmark results from Carvajal-Rodriguez et al. (forthcoming) based on the Jukes–Cantor model (Jukes and Cantor 1969) of nucleotide substitution and either all sites (JCall) or only those with exactly 2 alleles (JC2); and our single–break point recombination scan.

 
Biological Data Analyses
Table 1 and table S1 (Supplementary Material online) summarize GARD results based on a collection of biological sequence data. For the data sets from Posada (2002), we find a high level of agreement between our results and those achieved with the 50% consensus of 14 recombination detection methods. When recombination is detected, there is invariably a strong level of statistical support for multiple segments, both in terms of goodness of fit and phylogenetic discordance between adjacent partitions. GARD found between 1 and 9 recombination break points, with a wide spectrum of lengths of nonrecombinant fragments. It is worth noting that, in most cases, recombination events appear to follow a complex pattern involving multiple sequences, as evidenced by the low proportion of phylogenetic splits shared among the trees derived from each fragment.


Table 1 GARD Results Based on the Alignments from Posada (2002)

 
GARD reconfirmed all 5 genes found to have undergone recombination by the phylogenetic incongruence test in Chare et al. (2003), with very similar locations of break points (Table S1, Supplementary Material online). Two of these genes were found to contain multiple break points. Interestingly, our approach found 11 additional genes with putative recombinant sequences, both based on AICc goodness of fit and the SH test for phylogenetic incongruence. Thirteen other genes were classified as recombinant by AICc alone but did not show strong evidence of phylogenetic discordance, indicating the possibility of Type 1 or 2 recombination events or perhaps the inadequacy of the model of character substitution. This suggests that another process (such as space-localized selection or substitution rate variation) could be affecting branch lengths of the phylogeny along the sequence. Some of the phylogenetic incongruence signal (e.g., Measles M gene) is not likely to be a result of recombination but rather the effect of adenosine to inosine hypermutation events from cases of subacute sclerosing panencephalitis, which result in phylogenetic patterns resembling convergent evolution (Woelk et al. 2002). GARD does not rely on manual identification of sequences "migrating" along the tree between different sequence fragments, as does the Chare et al. (2003) method; thus it is not surprising that it appears to be more sensitive. Indeed, all genes with evidence of recombination as detected by at least two of the three methods used by Chare et al. (2003) are also detected by our approach.

Simulated Data Analyses
When the model used to simulate recombinant sequences matches the underlying assumptions (i.e., there is phylogenetic incongruence between 2 or more sequence fragments, but the evolutionary process is the same for the entire sequence) of our detection methods, as is the case for simulation Scenarios 1 and 2, GARD performed well both in detecting the number of break points and their location in the sequence (fig. 1 and fig. S1, Supplementary Material online). Recombination was detected reliably (95/100 and 100/100 cases, respectively). The correct number of break points was inferred in 69/100 and 82/100 cases. GARD had a slight tendency to overcount the number of break points (26 times for Scenario 1 and 11 times for Scenario 2). When break point counts were inferred correctly, their location was found reasonably accurately, even though confidence intervals (CIs) for the locations were fairly wide (fig. 1 and fig. S1, Supplementary Material online).

To quantify the performance of GARD when the underlying evolutionary model may be different from the one assumed by the method, we evaluated 800 coalescent-based simulated data sets. As expected, the ability to detect recombination somewhere in the sequence increases both with the level of divergence and the extent of recombination (table 2), with near-perfect power for alignments with 8 recombination events. However, recombination signal is quickly saturated for small alignments (8 sequences), and the number of break points is often underestimated. Interestingly, this limitation may be due not to the GA search procedure but rather to our limited ability to infer phylogenetic trees from short fragments of small alignments. If we were to use the correct placement of recombination break points and perform the AICc or the SH tests as discussed in Materials and Methods, only a small percentage of alignments would contain evidence of recombination (see table 2). The finding is especially striking for scenarios with 8 recombination events per alignment, where not a single replicate contained enough information to statistically support discordant phylogenies. The inability to reliably resolve phylogenies is clearly a fundamental limitation of all tests based on phylogenetic discordance rather than that of GARD alone.


Table 2 Recombination Inference Results Based on Simulation Scenarios 3 (low diversity) and 4 (high diversity)

 
If one is concerned with inferring the location of recombination break points, then the situation is less clear (table 3). The best possible outcome for GARD is to place each inferred break point at a variable site that is nearest to a true recombination break point. This happens about 20% of the time (table 3). More realistically, the model-averaged CI for the location of a given inferred break point should contain at least 1 recombination break point. This is the case about 60% of the time. Overall, 60–70% of inferred break points have a true break point in the 95% CI. Another measure of inference quality is the median distance from each inferred break point to the nearest "true" break point (or more accurately, to the nearest variable site closest to the "true" break point). This quantity, predictably, decreases with the increasing number of break points and level of sequence divergence (table 3). The distribution of distances resembles an exponential form (fig. S2, Supplementary Material online). To quantify the statistical significance of median distances derived with our algorithm, we conducted a simple simulation. We computed median distances to correct break points based on 1,000 random placements of break points in all 100 replicates in every scenario. Random break points were placed on variable sites only, and the number of break points allocated to a replicate was randomly drawn from the distribution of the number of inferred break points for that scenario. P values of observing smaller median distances to correct break points by chance were computed based on 1,000 replicates. In all cases, the median distance from inferred break points to correct ones was significantly less than that expected by chance.
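
The random-placement check can be sketched for a single alignment as follows. This is a simplification of the procedure described above: it keeps the number of break points fixed at the inferred count rather than drawing it from a per-scenario distribution, it does not pool replicates, and it counts ties as "at least as small". All names and the toy inputs are hypothetical.

```python
import random
from statistics import median


def median_distance(inferred, true_points):
    """Median distance from each inferred break point to the nearest true one."""
    return median(min(abs(b - t) for t in true_points) for b in inferred)


def random_placement_p_value(inferred, true_points, variable_sites,
                             n_random=1000, seed=0):
    """Fraction of random placements whose median distance is at least as small."""
    rng = random.Random(seed)
    observed = median_distance(inferred, true_points)
    hits = sum(
        median_distance(rng.sample(variable_sites, len(inferred)), true_points) <= observed
        for _ in range(n_random)
    )
    return observed, hits / n_random


# Toy setting: every 10th site of a 3,000 nt alignment is variable.
sites = list(range(0, 3000, 10))
obs, p = random_placement_p_value([1000, 2000], [1005, 1995], sites)
print(obs, p)
```

A small P value indicates that the inferred break points sit closer to the true ones than random placements on variable sites typically do.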


Table 3 Quality of Break Point Location Inference Based on Simulation Scenarios 3 (low diversity) and 4 (high diversity)

 
Effect of Recombination on Site-by-Site Analyses of Selection
False-positive rates of the FEL method were adversely affected (fig. 3) by extensive recombination in the Neutral Scenario (see Materials and Methods). Generally, FEL is expected to have well-controlled rates of false positives, relative to the P value of the test (Kosakovsky Pond and Frost 2005b), but in this case, inference on the last 100 codons of each of the alignments was subject to Type I error far in excess of the nominal P value (fig. 3A), whereas the error rate for the first 400 codons was effectively the same as the P value. Intuitively, a topology inferred from all 500 codons is "almost" correct for the first 400 codons and "never" correct for the last 100 codons. A simple corrective procedure, in which we split each of the 100 simulated alignments into 2 fragments, identified by the single–break point scan on that alignment, and then analyzed each segment separately with FEL, restored good statistical properties of FEL (fig. 3B).
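The splitting step of this corrective procedure can be sketched as below. This is only an illustration of partitioning a codon alignment at inferred break points so that each fragment can be analyzed with its own tree. It assumes break points are expressed in codon units, so that the reading frame is preserved, and the function name and toy sequences are hypothetical. The subsequent FEL analysis of each fragment is not shown.

```python
def split_codon_alignment(alignment: dict[str, str],
                          codon_breaks: list[int]) -> list[dict[str, str]]:
    """Split an in-frame alignment into fragments at codon break points.

    alignment    maps sequence names to nucleotide strings of equal length.
    codon_breaks lists the number of codons preceding each break point.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_nucleotides = lengths.pop()
    if n_nucleotides % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    n_codons = n_nucleotides // 3
    if any(b <= 0 or b >= n_codons for b in codon_breaks):
        raise ValueError("break points must fall strictly inside the alignment")

    # Fragment boundaries in codon units, converted to nucleotides on slicing.
    bounds = [0] + sorted(set(codon_breaks)) + [n_codons]
    return [
        {name: seq[3 * start:3 * end] for name, seq in alignment.items()}
        for start, end in zip(bounds, bounds[1:])
    ]


# Toy example: 5 codons with a break after codon 4, mirroring the
# 400 + 100 codon layout of the simulated alignments on a small scale.
toy = {"seq1": "ATGAAACCCGGGTTT", "seq2": "ATGAAGCCAGGGTTC"}
for fragment in split_codon_alignment(toy, [4]):
    print(fragment)
```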


FIG. 3. False-positive error rates for the FEL test for selected sites (both positively and negatively selected) under the Neutral Scenario. Panel A shows the error rates for the uncorrected (single partition) FEL and panel B, for the corrected (2 partitions) FEL. Solid lines indicate expected error rates, based on the P value. Tabulated error rates are presented for the first 400 codons (evolved under one tree), the last 100 codons (evolved under a different tree), and the joint error rate for all 500 codons, averaged over 100 replicates.

 
We also analyzed 14 data sets (see Table S1, Supplementary Material online) that showed both AICc and SH support for recombination, using FEL, with and without splitting the alignment into nonrecombinant fragments. The list of sites subject to positive selection can vary substantially between corrected and uncorrected FEL analyses (table 4). This observation reinforces previous findings (Anisimova et al. 2003; Shriner et al. 2003) indicating that recombination can significantly alter the results of selection analyses.


Table 4 Effect of Correcting for Recombination When Using FEL to Detect Positively Selected Sites

 

    Discussion
 
Recombination is a major evolutionary force in many organisms and can have a profound impact on evolutionary rates. Not only is recombination of interest in its own right, but analyses of selection pressure may also be confounded by its presence or absence. Maximum likelihood methods of codon substitution, used to estimate selection pressures on sites in terms of the ratio of nonsynonymous to synonymous substitution rates, may generate many false-positive sites when recombination is not taken into account. Results based on Poisson random field models (Sawyer and Hartl 1992), used to infer selection pressures from the frequency spectrum of sites under the assumption of loose linkage between sites, may be misleading in the absence of recombination. Hence, screening for recombination should be an integral part of phylogenetic analyses.

The use of phylogenetic incongruence among fragments of a sequence alignment to detect recombination is not a new approach (Koop et al. 1989; Fitch and Goodman 1991; Salminen et al. 1996; Grassly and Holmes 1997; McGuire and Wright 1998). Many of the existing methods (Holmes et al. 1999; Archibald and Roger 2002) rely on a sliding window approach. However, the length of the sliding window and the way it is moved along the sequence can strongly influence recombination inference. Several methods based on Markov Chain Monte Carlo (Husmeier and McGuire 2002; Suchard et al. 2002; Minin et al. 2005) are free of the sliding window limitations, but due to computational expense, they can only be used to examine small or medium alignments. In contrast, GARD is very intuitive, simple to implement and extend, and runs quickly on a computer cluster. Most importantly, it works very well in multiple scenarios, yielding good power and low rates of false positives.

It is remarkable that even our single–break point method outperforms almost all existing methods when detecting the presence of recombination, even when the true number of break points is extremely high. Given the speed of this approach, it can be recommended if one is only interested in screening for the presence or absence of recombination. The one method (Maynard Smith and Smith 1998) that performs better under some parameter regimes is not robust to rate heterogeneity. Given that it is difficult to tease apart recombination and rate heterogeneity, robustness of results is an important consideration in choosing a method to detect recombination. Our multiple break point model generates a rich set of inferences, on the number and location of break points, sequences involved in the recombination events, and the confidence in these inferences. We have also demonstrated that in certain situations, a simple screen for recombination can be used to correct analyses that detect sites evolving adaptively and to mitigate high rates of false positives incurred by uncorrected analyses of recombinant sequences for selection.

Our method has a number of limitations. Sometimes discordant phylogenetic signal may arise not through recombination but through a region evolving under a different evolutionary model. This may be the case for measles virus, which is thought to be recombination free. Despite our best attempts to automate the process of recombination screening as much as possible, we stress the importance of analyzing the resultant trees for the different segments and making an informed judgment about whether the results make sense or not. Optimistically, we note that GARD can easily accommodate substitution models of arbitrary complexity, and as methodological developments occur in this area, the performance of our approach may also improve. Like all methods, ours cannot detect recombination in regions where there is no genetic diversity.

In conclusion, we have developed a straightforward method for detecting discordant phylogenetic signal in alignments of DNA or protein sequences, which provides estimates of the number and location of break points and segment-specific phylogenetic trees. GARD does not require a nonrecombinant reference alignment (cf. bootscanning, see Salminen et al. 1995; Lole et al. 1999), and recombination between ancestral sequences is also accommodated. GARD outperforms other methods in terms of levels of Type I and Type II error (fig. 1) and can employ arbitrarily complex models of substitution. Furthermore, it can be run in parallel on a cluster of computers, and so is well positioned to screen for recombination in large data sets. We hope that this will encourage researchers to make recombination screening a routine part of their evolutionary analyses.


    Supplementary Material
 
Supplementary Table S1 and Figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
 
This research was supported in part by the National Institutes of Health (AI43638, AI47745, and AI57167), the University of California Universitywide AIDS Research Program (grant number IS02-SD-701), and by a University of California, San Diego Center for AIDS Research/NIAID Developmental Award to S.D.W.F. and S.L.K.P. (AI36214). D.P. was supported by grant R01-GM66276 from the US National Institutes of Health, grant BFU2004-02700 of the Spanish Ministry of Education and Science, and the "Ramón y Cajal" programme of the Spanish government. C.H.W. was further assisted by the Veterans Affairs Research Center for HIV and Hepatitis C Virus Infection.

Funding to pay the Open Access publication charges for this article was provided by the National Institutes of Health grant no. AI57167.


    Footnotes
 
Arndt von Haeseler, Associate Editor


    References
 

    Akaike H. 1983. Information measures and model selection. Int Stat Inst 44:139–49.

    Anisimova M, Nielsen R, Yang Z. 2003. Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics 164:1229–36.

    Archibald JM, Roger AJ. 2002. Gene conversion and the evolution of euryarchael chaperonins: a maximum likelihood-based method for detecting conflicting phylogenetic signals. J Mol Evol 55:232–45.

    Awadalla P. 2003. The evolutionary genomics of pathogen recombination. Nat Rev Genet 4:50–60.

    Burnham K, Anderson D. 2003. Model selection and multimodel inference. 2nd ed. New York: Springer.

    Carvajal-Rodriguez A, Crandall KA, Posada D. 2006. Recombination estimation under complex evolutionary models with the coalescent composite likelihood method. Mol Biol Evol. Forthcoming.

    Chare ER, Gould EA, Holmes EC. 2003. Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses. J Gen Virol 84:2691–703.

    Eshelman LJ. 1991. The CHC adaptive search algorithm: how to do safe search when engaging in nontraditional genetic recombination. In: Rawlins GJE, editor. Foundations of genetic algorithms. San Mateo, CA: Morgan Kaufmann Publishers. p 265–83.

    Felsenstein J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–76.

    Felsenstein J, Churchill GA. 1996. A hidden Markov model approach to variation among sites in rate of evolution. Mol Biol Evol 13:93–104.

    Fitch D, Goodman M. 1991. Phylogenetic scanning: a computer assisted algorithm for mapping gene conversion and the recombination events. Comput Appl Biosci 7:207–15.

    Grassly N, Holmes E. 1997. A likelihood method for the detection of selection and recombination using nucleotide sequences. Mol Biol Evol 14:239–47.

    Hasegawa M, Kishino H, Yano T. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160–74.

    Holmes EC, Worobey M, Rambaut A. 1999. Phylogenetic evidence for recombination in dengue virus. Mol Biol Evol 16:405–9.

    Husmeier D, McGuire G. 2002. Detecting recombination with MCMC. Bioinformatics 18:S345–53.

    Jukes TH, Cantor CR. 1969. Evolution of protein molecules. In: Munro HM, editor. Mammalian protein metabolism. New York: Academic Press. p 21–132.

    Koop B, Siemieniak D, Slightom J, Goodman M, Dunbar J, Wright P, Simons E. 1989. Tarsius delta- and beta-globin genes: conversions, evolution, and systematic implications. J Biol Chem 264:68–79.

    Korber B, Muldoon M, Theiler J, Gao F, Gupta R, Lapedes A, Hahn BH, Wolinsky S, Bhattacharya T. 2000. Timing the ancestor of the HIV-1 pandemic strains. Science 288:1789–96.

    Kosakovsky Pond SL, Frost SD. 2005a. A genetic algorithm approach to detecting lineage-specific variation in selection pressure. Mol Biol Evol 22:478–85.

    Kosakovsky Pond SL, Frost SD. 2005b. Not so different after all: a comparison of methods for detecting amino-acid sites under selection. Mol Biol Evol 22:1208–22.

    Kosakovsky Pond SL, Frost SD. 2005c. A simple hierarchical approach to modeling distributions of substitution rates. Mol Biol Evol 22:223–34.

    Kosakovsky Pond SL, Frost SDW, Muse SV. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21:676–9.

    Lole KS, Bollinger RC, Paranjape RS, Gadkari D, Kulkarni SS, Novak NG, Ingersoll R, Sheppard HW, Ray SC. 1999. Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J Virol 73:152–60.

    Maynard Smith J, Smith N. 1998. Detecting recombination from gene trees. Mol Biol Evol 15:590–9.

    McGuire G, Wright F. 1998. TOPAL: recombination detection in DNA and protein sequences. Bioinformatics 14:219–20.

    McVean G, Awadalla P, Fearnhead P. 2002. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160:1231–41.

    Minin VN, Dorman KS, Fang F, Suchard MA. 2005. Dual multiple change-point model leads to more accurate recombination detection. Bioinformatics 21:3034–42.

    Muse SV, Gaut BS. 1994. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol Biol Evol 11:715–24.

    Nielsen R, Yang ZH. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929–36.

    Posada D. 2002. Evaluation of methods for detecting recombination from DNA sequences: empirical data. Mol Biol Evol 19:708–17.

    Posada D, Crandall KA. 2001. Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc Natl Acad Sci 98:13757–62.

    Posada D, Crandall KA. 2002. The effect of recombination on the accuracy of phylogeny estimation. J Mol Evol 54:396–402.

    Saitou N, Nei M. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–25.

    Salminen M, Carr J, Burke D, McCutchan F. 1995. Identification of breakpoints in intergenotypic recombinants of HIV type 1 by bootscanning. AIDS Res Hum Retrovir 11:1423–5.

    Salminen M, Carr J, Burke D, McCutchan F. 1996. Identification of breakpoints in intergenotypic recombinants of HIV-1 by bootscanning. AIDS Res Hum Retrovir 11:1423–5.

    Savill NJ, Hoyle DC, Higgs PG. 2001. RNA sequence evolution with secondary structure constraints: comparison of substitution rate models using maximum-likelihood methods. Genetics 157:399–411.

    Sawyer SA, Hartl DL. 1992. Population genetics of polymorphism and divergence. Genetics 132:1161–76.

    Schierup M, Hein J. 2000a. Consequences of recombination on traditional phylogenetic analysis. Genetics 156:879–91.

    Schierup M, Hein J. 2000b. Recombination and the molecular clock. Mol Biol Evol 17:1578–9.

    Shimodaira H, Hasegawa M. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol 16:1114–6.

    Shriner D, Nickle DC, Jensen MA, Mullins J. 2003. Potential impact of recombination on sitewise approaches for detecting positive natural selection. Genet Res 81:115–21.

    Suchard MA, Weiss RE, Dorman KS, Sinsheimer JS. 2002. Oh brother, where art thou? a Bayes factor test for recombination with uncertain heritage. Syst Biol 51:715–28.

    Sugiura N. 1978. Further analysis of the data by Akaike's information criterion and the finite corrections. Commun Stat Theory Meth A7:13–26.

    Suzuki Y, Gojobori T. 1999. A method for detecting positive selection at single amino acid sites. Mol Biol Evol 16:1315–28.

    Tamura K, Nei M. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 10:512–26.

    Tamura K, Nei M, Kumar S. 2004. Prospects for inferring very large phylogenies by using the neighbor-joining method. Proc Natl Acad Sci 101:11030–5.

    Tavaré S. 1986. Some probabilistic and statistical problems in the analysis of DNA sequences. Lect Math Life Sci 17:57–86.

    Tsaousis AD, Martin DP, Ladoukakis ED, Posada D, Zouros E. 2005. Widespread recombination in published animal mtDNA sequences. Mol Biol Evol 22:925–33.

    Wiuf C, Christensen T, Hein J. 2001. A simulation study of the reliability of recombination detection methods. Mol Biol Evol 18:1929–39.

    Woelk C, Pybus O, Li J, Brown D, Holmes E. 2002. Increased positive selection pressure in persistent (SSPE) versus acute measles virus infections. J Gen Virol 83:1419–30.

    Worobey M, Holmes EC. 1999. Evolutionary aspects of recombination in RNA viruses. J Gen Virol 80:2535–43.

    Zhuang J, Jetzt AE, Sun G, Yu H, Klarmann G, Ron Y, Preston BD, Dougherty JP. 2002. Human immunodeficiency virus type 1 recombination: rate, fidelity, and putative hot spots. J Virol 76:11273–82.

Accepted for publication June 27, 2006.





