Skip Navigation


MBE Advance Access originally published online on October 19, 2006
Molecular Biology and Evolution 2007 24(1):288-293; doi:10.1093/molbev/msl153
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Supplementary Material
Right arrow All Versions of this Article:
24/1/288    most recent
msl153v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Casanellas, M
Right arrow Articles by Fernández-Sánchez, J
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Casanellas, M
Right arrow Articles by Fernández-Sánchez, J
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Performance of a New Invariants Method on Homogeneous and Nonhomogeneous Quartet Trees

M Casanellas and J Fernández-Sánchez

Department of Matemàtica Aplicada I, Universitat Politècnica de Catalunya, Barcelona, Spain

E-mail: marta.casanellas{at}upc.edu.


    Abstract
 TOP
 Abstract
 Introduction
 Results
 Methods
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
An attempt to use phylogenetic invariants for tree reconstruction was made at the end of the 80s and the beginning of the 90s by several researchers (the initial idea due to Lake [1987] and Cavender and Felsenstein [1987]). However, the efficiency of methods based on invariants is still in doubt (Huelsenbeck 1995; Jin and Nei 1990). Probably because these methods only used few generators of the set of phylogenetic invariants. The method studied in this paper was first introduced in Casanellas et al. (2005) and it is the first method based on invariants that uses the "whole" set of generators for DNA data. The simulation studies performed in this paper prove that it is a very competitive and highly efficient phylogenetic reconstruction method, especially for nonhomogeneous models on phylogenetic trees.

Key Words: invariants • phylogenetic inference • molecular evolution • Markov process • phylogenetic tree • nonhomogenous


    Introduction
 TOP
 Abstract
 Introduction
 Results
 Methods
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
Since the introduction of phylogenetic invariants by Cavender and Felsenstein (1987)Go, Lake (1987)Go, and Evans and Speed (1993)Go, several attempts to give a generating set of polynomial phylogenetic invariants have been made (see, e.g., Steel et al. 1993Go; Ferretti and Sankoff 1995Go), but it has not been until recently that algebraic geometers have managed to find them all (Allman and Rhodes 2004aGo; Casanellas and Sullivant 2005Go; Sturmfels and Sullivant 2005Go). Methods based on invariants have already proved to be useful in comparative genomics (Sankoff and Blanchette 1999Go). However, a perception seems to have developed that invariants are inefficient, in the technical sense of requiring long sequences to correctly infer a phylogenetic tree, compare ( Jin and Nei 1990)Go and (Huelsenbeck 1995)Go. Although Huelsenbeck (1995)Go showed the inefficiency for the use of Lake's invariants alone, no method using all invariants had been proposed at that point, and the question for invariants-based methods in general was never investigated. Note that Lake's method of invariants only used 2 phylogenetic invariants of degree one among the 795 generators of the set of polynomial invariants of a quartet tree for the Kimura 2-parameter model (Garcia and Porter 2005Go). But as Felsenstein explained, invariants are worth more attention for "what they might lead to in the future" (Felsenstein 2003Go). This future may be soon, because the studies of this paper show a method based on invariants, which is indeed promising. Recently, other methods based on a large set of invariants have also been considered (Eriksson 2005Go; Kim et al. 2006Go).

Phylogenetic invariants are relationships satisfied by the expected pattern frequencies occurring in sequences evolving along a given tree topology T under an evolutionary model. More precisely, if t is the set of d model parameters on T and p{alpha}(t) is the probability of observing the pattern {alpha} at the leaves of T, by letting t vary on an open subset of Rd, the probability vector p(t) = (pAA...A(t), ..., pTT...T(t)) defines a subset ST of dimension ≤d of R4n. A "phylogenetic invariant" is a real-valued continuous function f(x) on R4n such that f(p) = 0 for any p isin ST, but not for all the points on the subset ST' determined by another tree topology T'. Essentially, the equations f(p) = 0 are satisfied for pattern frequencies arising from any model parameters on a fixed tree, so they might be used for recovering the tree topology.

In practice, the vector of observed pattern frequencies Formula obtained from an alignment of sequences for n taxa with enough data should approximate p(t) for some set of parameters t on a tree topology T. In other words, Formula should be a point close to the subset ST, so if f is an invariant for the topology T, one should have f(Formula) very close to 0. As the tree topology is identifiable via invariant-based methods (Allman and Rhodes 2006Go), using phylogenetic invariants for tree reconstruction is a consistent method (see Hagedorn and Landweber 2000Go; Felsenstein 2003Go). A practical introduction to the theory of invariants can be found in the book by J. Felsenstein (2003Go), Chapter 22, whereas the book by Pachter and Sturmfels (2005)Go provides a beautiful insight into the applications of algebraic statistics (and in particular polynomial phylogenetic invariants) to computational biology.

There are 2 major motivations for using phylogenetic invariants in tree reconstruction. One of them is the prohibitive computational expense of a full maximum likelihood estimation of a tree, its edge lengths, a base distribution, and a rate matrix. The other is that the evolutionary models underlying the theory of invariants allow for nonhomogeneous mutation. Indeed, it is known that, for some biological data sets, different rate matrices should be allowed in different lineages. Thus, it is essential to have at our disposal phylogenetic methods for reconstructing trees admitting nonhomogeneous models (Galtier and Gouy 1998Go; Yang and Yoder 1999Go).

In this paper, a phylogenetic reconstruction method that uses polynomial phylogenetic invariants (introduced in Casanellas et al. 2005Go) is studied and tested for quartet trees evolving under the Kimura (1981)Go 3-parameter model of nucleotide substitution. Actually, we consider an "algebraic" Kimura model: the parameters of the model are the entries of the substitution matrices on the edges (and not a single rate matrix together with edge lengths). Hence, the model is nonhomogenous—because it allows different rate matrices among the edges—but it is stationary (and the distribution of the bases is uniform), and we always assume that all sites are independent and identically distributed (i.i.d. hypotheses). We performed simulation studies to test its efficiency. One of our approaches to evaluate the performance and efficiency of the method is taken from Huelsenbeck (1995)Go so that a large portion of the tree space is examined to get a general idea of how the algorithm performs. We present the results obtained for sequences of length 100 up to 10,000 base pairs.

We also checked the performance of the method on simulated data from nonhomogeneous models by carrying out a comparison to Neighbor-Joining algorithm (Saitou and Nei 1987Go), maximum likelihood algorithm (Felsenstein 1981Go), and an algorithm for a nonhomogeneous model from PAML (Yang 1997Go) for sequences generated under a Kimura (1980)Go 2-parameter model and different rate matrices along different tree branches.


    Results
 TOP
 Abstract
 Introduction
 Results
 Methods
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
Homogeneous Data
The performance of the invariants method studied here on homogeneous Kimura models can be seen in the figure 1. Using the approach of J.P. Huelsenbeck (1995)Go for quartet trees, we considered 2 branch-length parameters a and b and simulated data for each pair of lengths. Parameter a assigns the branch length to the internal branch and 2 opposite peripheral branches, and parameter b assigns the branch length to the 2 remaining branches. Parameters a and b were varied from 0.01 to 0.75 in increments of 0.02, and 1,000 alignments were simulated for each couple (a, b) (see Huelsenbeck 1995Go, Figures 2 and 3 or fig. I, Supplementary Material online, for a clear picture of this space of parameters). The simulated trees evolve under the Kimura (1981)Go 3-parameter model with a fixed rate matrix of the following form along the tree (· = – {alpha} – ß – {gamma}):

Formula
Figure 1 shows the efficiency of the method considered in this paper for a rate matrix with parameters {gamma} = 0.1, {alpha} = 3.0, and ß = 0.5 (hence, a 5:1 transition:transversion bias) and for nucleotide sequences of lengths 100, 500, 1,000, and 10,000. See also figures III and IV, Supplementary Material online, for studies performed with other rate matrices. This figure is to be compared with those shown in Figure A2 of (Huelsenbeck 1995)Go, (corresponding to phylogenetic inference for sequences generated under a Kimura (1980)Go 2-parameter model of substitution with 5:1 transition:transversion bias). Though this is of course a biased comparison because our method admits nonhomogeneous data and Kimura 3-parameter model, it is worth noticing that even on data from homogeneous simulations our method outperforms many of the methods considered there. In particular, it is clearly better than Lake's invariant method (which is not surprising because Lake's method only used linear invariants). At least for sequences of 500 base pairs or more, our method performs better than Neighbor-Joining—referred as minimum evolution (Kimura1) in Huelsenbeck (1995)Go, Figure A2. Notice also that the shape of the 95% isocline for lengths around 500 base pairs or more is quite different from the corresponding shapes in the methods tested by Huelsenbeck (see his Figure 7): for large values of a (near 0.7), the performance of our method of invariants does not drop drastically (as it does for most methods considered there). Therefore, for these values, our invariants method outperforms all the methods studied in Huelsenbeck (1995)Go. As it can be seen in figures III and IV, Supplementary Material online, it seems that the method performs better for small transition:transver sion ratios.


Figure 1
View larger version (61K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 1.— The graphics above represent the probability of reconstructing the correct tree in the parameter space (see fig. I, Supplementary Material online). Black areas correspond to couples (a, b) for which the tree was correctly estimated, whereas white regions correspond to couples (a, b) for which the tree is never estimated; gray tones indicate areas of intermediate probability. The 95% isocline is drawn in white, whereas the 33% is drawn in black. The 4 graphics above show the results obtained under the homogeneous Kimura 3-parameter model when using a rate matrix with parameters {gamma} = 0.1, {alpha} = 3.0, and ß = 0.5 (hence a 5:1 transition:transversion bias).

 

Figure 2
View larger version (11K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 2.— Performance of Neighbor-Joining and the invariants method on nonhomogeneous data for sequence length varying from 100 to 3,000 in intervals of 100. For each length, 100 sets of data were generated. The nonhomogeneous tree used for simulations is described in figure 3.

 

Figure 3
View larger version (5K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 3.— A nonhomogeneous tree used for simulations. The numbers labeling each edge correspond to branch lengths (i.e., expected percent change between the 2 taxa on the edge). (1) In comparing our method with Neighbor-Joining, the parameters {alpha}i, ßi, and {gamma}i in the Kimura 3-parameter rate matrices Qi were chosen as: {gamma}1 = 1, {alpha}1 = 4, ß1 = 1, {gamma}2 = 5, {alpha}2 = 14, ß2 = 3,{gamma}3 = 4, {alpha}3 = 15, ß3 = 3, {gamma}4 = 2, {alpha}4 = 6, ß4 = 2, {gamma}5 = 2, {alpha}5 = 3, and ß5 = 1. (2) In comparing our method with maximum likelihood algorithm based on the Kimura 2-parameter model, we chose {gamma} = ß = 1 in all rate matrices, whereas parameter {alpha} was set as follows: in Q1, {alpha} = 4, in Q2, Q4, and Q5, {alpha} = 3 + {varepsilon}2, in Q3, {alpha} = 3 + {varepsilon}. When {varepsilon} = 1, the tree is homogeneous and we studied the efficiency of 3 methods as {varepsilon} increases up to 9.

 
For length 1,000, the efficiency of the invariants method considered here is similar to that obtained for lengths ≥10,000 in many of the methods tested in Huelsenbeck (1995)Go. From this it can be inferred that, in order to reconstruct the correct tree, much less data is needed in the invariants method presented here than in many other methods (contrary to what was thought until now in Hagedorn and Landweber 2000Go).

Nonhomogeneous Data
We tested the invariants method studied in this paper on data simulated according to a nonhomogeneous Kimura model by comparing it with other methods.

Comparison with Neighbor-Joining
First of all, we compared the performance of the invariants method presented here with Neighbor-Joining (the algorithm of Saitou and Nei 1987Go) using Kimura (1981)Go 3-parameter distance. As it can be seen in figure 2, considering certain nonhomogeneous sets of simulated data, we found that the invariants method is more efficient than Neighbor-Joining. Indeed, we simulated data on an unrooted quartet tree evolving under the Kimura 3-parameter model where different rate matrices at each edge were chosen (see fig. 3), and we studied the efficiency of the method when varying the length of nucleotide sequences. In this case, the mean of correctly reconstructed trees for the invariants method is 90.2%, whereas the mean for Neighbor-Joining is 84%. It is worth pointing out that the use of Kimura distance is still justified in the nonhomogeneous setting, so that it is fair to compare both methods. For this particular tree, the maximum likelihood algorithm for the Kimura 3-parameter model reconstructed the tree correctly almost all times, so we do not include the results here.

Comparison with Maximum Likelihood
We compared the invariants method with 2 versions of maximum likelihood: the usual maximum likelihood for Kimura (1980)Go 2-parameter and nonhomogeneous maximum likelihood method developed in the package PAML (Yang 1997Go) for Kimura 2-parameter model (namely, the option nhomo). This last method allows different transition/transversion ratio in different tree branches. To perform this comparison, we simulated data according to the tree in figure 3 evolving under the Kimura 2-parameter model. When {varepsilon} = 1, the tree is homogeneous and we studied the efficiency of the 3 methods as {varepsilon} increases up to 9. Figure 4 summarizes the results relative to the comparison between our method, the nonhomogeneous method in PAML for Kimura 2-parameter model (option nhomo=2 in the baseml control file) and the maximum likelihood homogeneous in PAML for Kimura 2-parameter (option nhomo=0). It shows the percentage of correctly reconstructed trees for each value of parameter {varepsilon}. As expected, the homogeneous maximum likelihood algorithm becomes less efficient as {varepsilon} increases. The use of a method assuming homogeneity on data produced in accord with a nonhomogeneous model is, of course, not recommended as such model misspecification is known to lead to unreliable results. Already for {varepsilon} = 5, the invariants method overtakes the homogeneous maximum likelihood algorithm. As it is deduced from figure 4, our method performs worse than the nonhomogeneous algorithm in PAML. However, it is worth noticing that in this test we generated data according to Kimura 2-parameter model and the invariants method presented here was developed under Kimura 3-parameter model. Similar results are obtained when the data evolve under Kimura 3-parameter model (see figure II, Supplementary Material online).


Figure 4
View larger version (12K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 4.— The effect of rate heterogeneity among lineages on method performance. The simulated nucleotide sequences have length 1,000 and evolve under the nonhomogeneous Kimura 2-parameter model, as described in figure 3 (homogeneous for {varepsilon} = 1). The trees were reconstructed with our method (invariants), with PAML under a nonhomogeneous method (PAML nonhomogeneous) and the usual PAML maximum likelihood homogeneous method (PAML homogeneous). For each {varepsilon}, 100 sets of data were generated.

 

    Methods
 TOP
 Abstract
 Introduction
 Results
 Methods
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
The phylogenetic reconstruction method used in this paper is based on phylogenetic invariants and was first introduced by the first author and L.D. Garcia and S. Sullivant in Casanellas et al. (2005)Go.

The taxa are given by an alignment of n DNA sequences of length N. A Markov process along an unrooted binary tree of n species is assumed, and we consider that all sites are independently and identically distributed (i.i.d. hypotheses). The parameters of the model we are considering are the entries of the substitution matrices of a Kimura (1981)Go 3-parameter model (we should rather speak of an "algebraic" Kimura 3-parameter model, according to the book by Pachter and Sturmfels 2005Go, Chapters 1 and 4). Note that in the usual Kimura 3-parameter model, a rate matrix Q is fixed and common to the whole tree and the substitution matrix is the exponential eQti (for some parameter ti representing time). However, in our method, we do not make use of rate matrices Q—we only use the substitution matrices—so, as we will see later, the rate matrices might vary among the edges. It is worth pointing out that, as we consider a Kimura 3-parameter model along all branches, the uniform distribution of base composition holds for the whole tree (stationarity hypothesis).

Phylogenetic Invariants
Sturmfels and Sullivant (2005)Go gave an explicit description of the generators of the set of polynomial phylogenetic invariants I(T) for an arbitrary tree evolving under a "group-based model12." For the Kimura 3-parameter model on an unrooted 4-taxa tree, the ideal of phylogenetic invariants has 8,002 minimal generators (see the Web page of Garcia and Porter 2005Go, Casanellas et al. 2005Go and the discussion in Sturmfels and Sullivant 2005Go, Section 7 to see why a smaller subset of invariants does not suffice). According to the results in Sturmfels and Sullivant (2005)Go, we produced this generating set for an unrooted tree with 4 leaves under the Kimura 3-parameter model. This requires doing a Fourier transform (or Hadamard conjugation) on the vector of probabilities (pA...A,...,pT...T), and the phylogenetic invariants are then described as binomials in the Fourier coordinates.

Algorithm
Our tree reconstruction algorithm performs the following tasks. Given 4 aligned DNA sequences s1, s2, s3, and s4, it first computes the observed relative frequencies of each pattern for the topology ((s1, s2), s3, s4) on an unrooted quartet tree. Then it transforms these relative frequencies to Fourier coordinates. From this, we compute the Fourier coordinates in the other 2 possible topologies for unrooted trees with 4 species. We then evaluate all phylogenetic invariants for the Kimura 3-parameter model in the Fourier coordinates of each tree topology. We call sfT the absolute value of this evaluation for the polynomial f and tree topology T. From the values {sfT}f, we produce a score for each tree topology T, namely Formula The algorithm then chooses the topology that corresponds to the minimum score. The code was written in PERL and is available upon request. For an alignment of 4 sequences of 1,000 nt, it takes 0.35 s on a single 3.0-Ghz processor.

Software Used
The simulations of this study were obtained using the program Seq-Gen v1.3.2 (Rambaut and Grassly 1997Go). We used the algorithms implemented in the package APE (Paradis et al. 2006Go) v1.8-3 of R (R Development Core Team 2005Go) v2.1.1 to compute Kimura (1981)Go 3-parameter distance and perform Neighbor-Joining algorithm (Saitou and Nei 1987Go). The package PAML (Yang 1997Go) was used for phylogenetic inference involving maximum likelihood methods.


    Discussion
 TOP
 Abstract
 Introduction
 Results
 Methods
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
The simulation studies performed in this paper present a very competitive phylogenetic reconstruction method based on invariants. If one compares our results on the tree space and those of Huelsenbeck (1995)Go, one sees that the method presented here is highly efficient. Note that this might be a biased comparison as in both papers homogeneous models are used for sequence generation and, although he uses homogeneous methods for inference, our invariant method allows nonhomogeneous models—remember that one generally obtains the best results using the most restricted model that fits the data. There are some limitations of the tree space study performed here, though. For example, this tree space does not consider trees where the inner edge is extremely small or extremely large with respect to the peripheral branches. As Huelsenbeck points out, the usefulness of considering this parameter space can be questioned, but he also gives strong arguments that convinced us to work in this parameter space. Moreover, considering the same parameter space as Huelsenbeck allows one to compare our results with the other methods studied by him.

We would like to comment on the algorithm presented here and, more generally, on all methods based on invariants. First of all, we need to emphasize that the computation of the phylogenetic invariants of a given model just needs to be done once, so the computation need not contribute to the running time of an algorithm using them. Secondly, increasing the size of the sequences does not drop the computational efficiency of the algorithm. Indeed, the sequence's length only accounts for computing the relative frequencies of the observed patterns (which is something that most algorithms based on evolutionary models must do), but it does not participate in any other part of the algorithm.

A small comment on the election of the 1-norm: we performed simulation studies not presented here to prove that the algorithm performs clearly better with 1-norm than with maximum norm and slightly better than with the euclidean norm (the "1-norm" of a vector x = (x1, ..., xn) is Formula the "euclidean norm" is Formula and the "maximum norm" is ||x||{infty} = maxi |xi|). Another consideration that might be important for the computational efficiency of the method is that, in Fourier coordinates, the polynomials considered here are "binomials," and hence they are easy to evaluate at a given point (so there is no need to worry computationally about the evaluation of the polynomials). Moreover, as it is proved in Sturmfels and Sullivant (2005)Go, these binomials have degree 4 at the most so, again, the computational cost is low. Choosing another generating set of invariants or a different weightening of the polynomials will lead to other results, so this is an issue that should certainly be studied in the future.

We implemented and tested the algorithm presented here only on 4-taxon trees. This seems a limitation of the method, but as the reader may have noticed, the method is universal and could be used to infer the topology of trees with arbitrary number of taxa. However, the computational demands of deducing large phylogenies led us not to develop this algorithm for larger trees. Instead, we suggest that invariants might be a good starting point for quartet methods of phylogenetic inference. In this direction, it is also worth thinking about new tree reconstruction algorithms for arbitrary taxa based on invariants (this is something the authors will surely work on in the future).

In this paper, we focused on the Kimura 3-parameter evolutionary model. However, a full generating set of invariants is known for any group-based model (Sturmfels and Sullivant 2005Go), a large set of invariants is known for the general Markov model (Allman and Rhodes 2003Go, 2004aGo) and for a strand symmetric model (Casanellas and Sullivant 2005Go), and some invariants are already known for certain rate-class models (Allman and Rhodes 2006Go). Therefore, the method presented here can be extrapolated to these evolutionary models and in the future further models can be considered. In particular, invariants might also perform well for models allowing site-to-site variation.

As we pointed out in the introduction, invariants-based methods focus on recovering the tree topology and not estimating the parameters. Nevertheless, as Allman and Rhodes (2003Go, 2004bGo) say, it is fair to think that phylogenetic invariants may also be useful for parameter recovery.

As we have already mentioned, one of the advantages of the method presented here versus other methods of phylogenetic reconstruction based on evolutionary models is that the model considered here is nonhomogeneous in the sense that the rate matrix is allowed to vary along the different branches of the tree. However, as we are assuming a Kimura model, the base distribution is the same at all nodes of the tree and so the model is stationary (note that invariants methods based on other evolutionary models would not require stationarity of base distribution). For an unrooted tree with n taxa, our algebraic Kimura model has 3(2n – 3) free parameters, and it is a special case of the "general Markov model12" which involves 12(2n – 3) + 3 parameters. If one considered a Kimura 3-parameter model (not the algebraic model considered here) allowing different rate matrices along the tree one would have 3(2n – 3) + 2n – 4 (the extra parameters correspond to time parameters). Other nonhomogeneous models have been considered in the literature, see for example Galtier and Gouy (1998)Go and Yang and Roberts (1995)Go. The emphasis in these 2 works is put on the nonstationarity hypothesis, and the maximum likelihood approach taken in most nonhomogeneous methods makes them computationally unfeasible.


    Supplementary Material
 TOP
 Abstract
 Introduction
 Results
 Methods
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
Supplementary figures I–IV are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
 TOP
 Abstract
 Introduction
 Results
 Methods
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
The authors would like to thank B. Sturmfels, L. Pachter, and N. Eriksson for useful comments on a draft version of the paper. We would also like to thank the reviewers for useful comments that improved the paper. Both authors were partially supported by Ministerio de Educación y Ciencia (Ramón y Cajal and Juan de la Cierva programs) and BFM2003-06001. J.F.S. was also supported by MTM2005-01518.


    Footnotes
 
Peter Lockhart, Associate Editor


    References
 TOP
 Abstract
 Introduction
 Results
 Methods
 Discussion
 Supplementary Material
 Acknowledgements
 References
 

    Allman E and Rhodes J. (2003) Phylogenetic invariants for the general Markov model of sequence mutation. Math Biosci 186:2113–144.[CrossRef][ISI][Medline]

    Allman E and Rhodes J. (2004a) Phylogenetic ideals and varieties for the general Markov model. Available from: http://arxiv.org/abs/math.AG/0410604. Advances in Applied Mathematics.

    Allman E and Rhodes J. (2004b) Quartets and parameter recovery for the general Markov model of sequence mutation. Appl Math Res Exp 2004:4107–131.[CrossRef]

    Allman E and Rhodes J. (2006) The identifiability of tree topology for phylogenetic models, including covarion and mixture models. J Comput Biol 13:1101–1113.[CrossRef][ISI][Medline]

    Casanellas M, Garcia L, Sullivant S. (2005) Catalog of small trees. In Pachter L and Sturmfels B (Eds.). Algebraic statistics for computational biology. Chapter 15(Cambridge University Press, Cambridge (UK)).

    Casanellas M and Sullivant S. (2005) The strand symmetric model. Algebraic statistics for computational biology. Chapter 16 (Cambridge University PressIn Pachter L and Sturmfels B (Eds.). , Cambridge (UK)).

    Cavender J and Felsenstein J. (1987) Invariants of phylogenies in a simple case with discrete states. J Classif 4:57–71.[CrossRef]

    Eriksson N. (2005) Tree construction using singular value decomposition. In Pachter L and Sturmfels B (Eds.). Algebraic statistics for computational biology. Chapter 19(Cambridge University Press, Cambridge (UK)) pp. 347–358.

    Evans S and Speed T. (1993) Invariants of some probability models used in phylogenetic inference. Ann Stat 21:355–377.

    Felsenstein J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376.[CrossRef][ISI][Medline]

    Felsenstein J. (2003) Inferring phylogenies(Sinauer Associates Inc, Sunderland (MA)).

    Ferretti V and Sankoff D. (1995) Phylogenetic invariants for more general evolutionary models. J Theor Biol 14:7–162.

    Galtier N and Gouy M. (1998) Inferring pattern and process: maximum likelihood implementation of a non-homogeneous model of DNA sequence evolution for phylogenetic analysis. Mol Biol Evol 154:4871–879.

    Garcia LD and Porter J. (2005) Small phylogenetic trees webpage. Available from: http://bio.math.berkeley.edu/ascb/chapter15/.

    Hagedorn T and Landweber L. (2000) Phylogenetic invariants and geometry. J Theor Biol 205:365–376.[CrossRef][ISI][Medline]

    Huelsenbeck J. (1995) Performance of phylogenetic methods in simulation. Syst Biol 44:17–48.[CrossRef][ISI]

    Jin L and Nei M. (1990) Limitations of the evolutionary parsimony method of phylogenetic analysis. Mol Biol Evol 7:82–102.[Abstract]

    Kim YR, Kwon O, Paeng S, Park C. (2006) Phylogenetic tree constructing algorithms fit for grid computing with SVD. Available from http://arix.org/abs/q-bio.QM/0611015.

    Kimura M. (1980) A simple method for estimating evolutionary rates of base substitution through comparative studies of nucleotide sequences. J Mol Evol 16:111–120.[CrossRef][ISI][Medline]

    Kimura M. (1981) Estimation of evolutionary sequences between homologous nucleotide sequences. Proc Natl Acad Sci USA 78:454–458.[Abstract/Free Full Text]

    Lake J. (1987) A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol 4:167–191.[Abstract]

    In Pachter L and Sturmfels B (Eds.). Algebraic statistics for computational biology (2005) (Cambridge University Press, Cambridge (UK)).

    Paradis E, Strimmer K, Claude J, Jobb G, Opgen-Rhein R, Dutheil J, Noel Y, Bolker B, Lemon J. (2006) Ape: analyses of phylogenetics and evolution. R package version 1.8-3.

    R Development Core Team. (2005) R: a language and environment for statistical computing. (R Foundation for Statistical Computing, Vienna (Austria)).

    Rambaut A and Grassly N. (1997) Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput Appl Biosci 13:235–238.[Abstract/Free Full Text]

    Saitou N and Nei M. (1987) The neighbor joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:4406–425.[Abstract]

    Sankoff D and Blanchette M. (1999) Phylogenetic invariants for genome rearrangements. J Comput Biol 6:431–445.[CrossRef][ISI][Medline]

    Steel M, Székely L, Erdos P, Waddell P. (1993) A complete family of phylogenetic invariants for any number of taxa under Kimura's 3ST model. N Z J Bot 31:289–296.

    Sturmfels B and Sullivant S. (2005) Toric ideals of phylogenetic invariants. J Comput Biol 12:204–228.[CrossRef][ISI][Medline]

    Yang Z. (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 15:555–556.

    Yang Z and Roberts D. (1995) On the use of nucleic acid sequences to infer early branchings in the tree of life. Mol Biol Evol 12:451–458.[Abstract]

    Yang Z and Yoder AD. (1999) Estimation of the transition/transversion rate bias and species sampling. J Mol Evol 48:274–283.[CrossRef][ISI][Medline]

Accepted for publication October 17, 2006.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Supplementary Material
Right arrow All Versions of this Article:
24/1/288    most recent
msl153v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Casanellas, M
Right arrow Articles by Fernández-Sánchez, J
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Casanellas, M
Right arrow Articles by Fernández-Sánchez, J
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?