Skip Navigation


MBE Advance Access originally published online on November 30, 2005
Molecular Biology and Evolution 2006 23(3):626-632; doi:10.1093/molbev/msj069
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
23/3/626    most recent
msj069v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (4)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Chor, B.
Right arrow Articles by Snir, S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Chor, B.
Right arrow Articles by Snir, S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Article

Maximum Likelihood Jukes-Cantor Triplets: Analytic Solutions

Benny Chor*, Michael D. Hendy{dagger} and Sagi Snir{ddagger}

* School of Computer Science, Tel-Aviv University, Israel; {dagger} Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand; and {ddagger} Mathematics Department, University of California, Berkeley

E-mail: m.hendy{at}massey.ac.nz.


    Abstract
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Appendix
 Acknowledgements
 References
 
Maximum likelihood (ML) is a popular method for inferring a phylogenetic tree of the evolutionary relationship of a set of taxa, from observed homologous aligned genetic sequences of the taxa. Generally, the computation of the ML tree is based on numerical methods, which in a few cases, are known to converge to a local maximum on a tree, which is suboptimal. The extent of this problem is unknown, one approach is to attempt to derive algebraic equations for the likelihood equation and find the maximum points analytically. This approach has so far only been successful in the very simplest cases, of three or four taxa under the Neyman model of evolution of two-state characters. In this paper we extend this approach, for the first time, to four-state characters, the Jukes-Cantor model under a molecular clock, on a tree T on three taxa, a rooted triple. We employ spectral methods (Hadamard conjugation) to express the likelihood function parameterized by the path-length spectrum. Taking partial derivatives, we derive a set of polynomial equations whose simultaneous solution contains all critical points of the likelihood function. Using tools of algebraic geometry (the resultant of two polynomials) in the computer algebra packages (Maple), we are able to find all turning points analytically. We then employ this method on real sequence data and obtain realistic results on the primate-rodents divergence time.

Key Words: maximum likelihood • phylogenetic trees • Jukes-Cantor • Hadamard conjugation • analytical solutions • symbolic algebra


    Introduction
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Appendix
 Acknowledgements
 References
 
Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees (Felsenstein 1981Go), but finding the global optimum is a hard computational task, which led to using mostly numeric methods. So far, analytical solutions have been derived only for the simplest models (Yang 2000Go; Chor, Hendy, and Penny 2001Go; Chor, Khetan, and Snir 2003Go)—three and four taxa under a molecular clock, with just two-state characters (Neyman 1971Go). In this work we present, for the first time, the analytic solutions for a general family of trees with four-state characters, such as DNA or RNA. The model of substitution we use is the Jukes-Cantor model (Jukes and Cantor 1969Go) under a molecular clock, the simplest four-state model where all substitutions have the same rate. The trees we deal with are on just three taxa, namely, rooted triplets (see fig. 1b).


Figure 1
View larger version (7K):
[in this window]
[in a new window]
 
FIG. 1.— (a) A tree T on three leaves, 1, 2, and 3, with edges labeled. (b) A rooted tree on three leaves, with root r, with edge weights q1, q2, and q3. If q1 = q2 ≤ q3, then this tree satisfies the molecular clock as the path lengths from r to each leaf are the same.

 
The change from two to four character states incurs a major increase in the complexity of the underlying algebraic system. Like previous works, our starting point is to present the general ML problem on phylogenetic trees as a constrained optimization problem. This gives rise to a complex system of polynomial equations, and the goal is to solve them. The complexity of this system prevents any manual solution, so we relied heavily on Maple, a symbolic mathematical system. However, even with Maple, a simple attempt to solve the system (e.g., just typing solve) fails, and additional tools are required. Spectral analysis (Hendy and Penny 1993Go; Hendy, Penny, and Steel 1994Go; Hendy and Snir 2005Go) uses Hadamard conjugation to express the expected frequencies of site patterns among sequences as a function of an evolutionary tree T and a model of sequence evolution along the edges of T. As in previous works, we use the Hadamard conjugation as a basic building block in our method of solution. However, it turns out that the specific way we represent the system, which is determined by the choice of variables, plays a crucial role in the ability to solve it. In previous works (Chor, Hendy, and Penny 2001Go; Chor, Khetan, and Snir 2003Go), the variables in the polynomials were based on the expected sequence spectrum (Hendy and Penny 1993Go). This representation leads to a system with two polynomials of degrees 11 and 10. This system is too complex to resolve with the available analytic and computational tools. By employing a different set of variables, based on the path-set spectrum (Hendy and Snir 2005Go), we were able to arrive at a simpler system of two polynomials whose degrees are 10 and 8. To finesse the construction, we use algebraic geometry combined with Maple. Specifically, we now compute the resultant of the two polynomials, which yields a single, degree 11 polynomial in one variable. The roots of this polynomial yield the desired ML solution. We remark that similar results to those shown here were obtained by Hosten, Khetan, and Sturmfels (2004)Go, using somewhat different techniques. A set of computational algebraic tools for analyzing similar models on small trees based on the methods reported in Catanese et al. (2004)Go; Hosten, Khetan, and Sturmfels (2004)Go; and Sturmfels and Sullivant (2005)Go is detailed in the web page http://math.tamu.edu/~lgp/small-trees/.

It is evident that the currently available heuristic methods can fail to infer the correct tree, even for small number of taxa. This is true not only in the presence of multiple ML points but also in cases where the neighborhood of the (single) ML point is relatively flat. Therefore, a practical consequence of this work is the use of rooted triplets in supertree methods (constructing a big tree from small subtrees). For unrooted trees, the subtrees must have at least four leaves (e.g., "the tree from quartets" problem). For rooted trees, it is sufficient to amalgamate a set of rooted triplets (Aho et al. 1981Go). Our work enables such approaches to rely on rooted ML triplets based on four characters states rather than two.

Additionally, analytic solutions are able to reveal properties of the ML points that are not obtainable numerically. For small trees, our result can serve as a method for checking the accuracy of the heuristic methods used in practice.

The remainder of this work is organized as following. In the next section we describe the materials and methods we used in this work. Specifically, in Definitions, Notations, and the Hadamard Conjugation we provide definitions and notations used throughout the rest of this work. In Definitions, Notations, and the Hadamard Conjugation we develop the identities and structures induced by the Jukes-Cantor model, while in Obtaining the ML Solution we develop ML on phylogenetic trees and subsequently solves the system for the special case of three species under Jukes-Cantor and molecular clock. In Results and Discussion we give results and discuss future research directions: Results on Genomic Sequences reports on experimental results of applying our method on real genomic sequences, while in Directions for Future Research we conclude with three open problems.


    Materials and Methods
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Appendix
 Acknowledgements
 References
 
Here we detail on the methods and tools we employed in order to obtain our results. These are developed specifically for the tree T on three taxa referred to as 1, 2, and 3. We label the leaves of T as 1, 2, and 3 and the edges (branches) as e1, e2, and e3, as illustrated in figure 1a. Taxon 3 is chosen to be the reference taxon. Our analysis is focused on the site substitutions required to transform the reference sequence to those of 1 and 2 under Kimura's 3-substitution model (K3ST) (Kimura 1981Go). We will subsequently impose the constraints of the Jukes-Cantor model (Jukes and Cantor 1969Go), and under a molecular clock, on the rooted tree of figure 1b.

Definitions, Notations, and the Hadamard Conjugation
Given aligned nucleotide sequences for the three taxa 1, 2, and 3, there are 43 nucleotide combinations possible at each site. We refer to each of these as a "character pattern." A character pattern Formula can be described by the state {chi}3 at taxon 3, together with the substitutions Formula

Kimura (1981)Go in his K3ST model, describes three classes of substitution: the transitions; and two types of transversions. Following Hendy and Snir (2005)Go, we use {alpha} to denote a transition, ß and {gamma} to denote the two transversion types, together with {epsilon} to indicate no substitution, as described in figure 2. For example, the character pattern Formula is obtained with the assignment of C to taxon 3, and the substitution pattern Formula for taxa 1 and 2.


Figure 2
View larger version (11K):
[in this window]
[in a new window]
 
FIG. 2.— K3ST model. Substitution types {alpha} and {gamma} cross the line aa', and ß and {gamma} cross the line bb'.

 
In Hendy, Penny, and Steel (1994)Go it is shown that under the K3ST model, the probability of a substitution pattern is independent of the character state at taxon 3. We index each site pattern by a pair (X, Y) of subsets of {1, 2}, where X is the set of taxa with substitutions which cross the line aa' in (fig. 2), and Y is the set of taxa with substitutions which cross the line bb'. We denote the probability of site pattern (X, Y) as sX,Y, which when not ambiguous, we index by the lists of elements of X and of Y, for brevity. Thus, for example, the substitution pattern Formula is indexed by the pair of subsets ({1, 2}, {2}), and the probability of obtaining this substitution pattern is written as s12,2.

The 16 site pattern probabilities are arranged as a 4 x 4 matrix

Formula
called the "sequence spectrum." These probabilities can be parameterized in several ways, in Hendy and Penny (1993)Go and Hendy, Penny, and Steel (1994)Go for the K3ST model, they are given as functions of the expected numbers of substitutions of each type on each edge.

In the Jukes-Cantor model, the expected numbers of the three substitution types on an edge ei are the same, that is,

Formula
We will write this common value as qi. Thus, q1, q2, and q3 are the expected numbers of substitutions of each type, on edges e1, e2, and e3, respectively. Following the results of Hendy, Penny, and Steel (1994)Go; Steel, Hendy, and Penny (1998)Go; and Hendy and Snir (2005)Go specialized to the Jukes-Cantor model on T for three taxa, we express these expected numbers in the 4 x 4 matrix

Formula
called the "edge-length spectrum." The major result of Hendy, Penny, and Steel (1994)Go then becomes the following.

Theorem 1

Formula 1(1)
where

Formula 2(2)
Formula 2 and the log function ln is applied to each component of the matrix HQH.

Equation (2) in Theorem 1 is equivalent to earlier expressions (Steel et al. 1992Go; Székely et al. 1993Go) of Hadamard conjugations for the K3ST model, which used Hadamard matrices of 2n rows and columns applied to vectors of 2n entries, with qi = q{alpha}(ei) = qß(ei) = q{gamma}(ei). The current expression has recently been developed in Hendy and Snir (2005)Go. A full proof of this result follows directly from applying Theorem 7 of Hendy and Snir (2005)Go on the tree T. This approach has been followed in order to clarify the role of path sets, which explain the intermediate terms in the conjugations, enabling us to derive and interpret the values in equation (8) directly.

We now define several auxiliary matrices that will be useful in the sequel:

Formula 2

Formula 2
The following identities relating H and these six matrices hold:

Formula 2

Formula 3(3)

Formula 4(4)

Formula 5(5)

Formula 6(6)

Formula 7(7)
The edge-length spectrum of an arbitrary 3-tree can be expressed as the 4 x 4 matrix,

Formula 7
Now, from equations (3–7 GoGoGoGo), we see

Formula 7
so applying the exponential function to each element of the matrix HQH, we obtain the so called path-set spectrum, R:

Formula 8(8)
where

Formula 9(9)
The xi values can replace the qi values as the defining parameters and are called the "path-set variables." The entries of R relate to the probabilities of differences between the end points of paths in T (Hendy and Snir 2005Go).

From Theorem 1, we find the expected sequence spectrum is

Formula 10(10)

Formula 11(11)
where

Formula 12(12)
Thus, we see that the expected frequency of each site pattern takes one of the five ai values, each of which is a function of the three parameters x1, x2, and x3.

In particular, when we impose the molecular clock condition q1 = q2, and hence x1 = x2, equation (12) reduces to

Formula 13(13)
and we observe a1 = a2.

Obtaining the ML Solution
When we have an aligned sequence of length c for each of the three taxa 1, 2, and 3, we can count the relative frequencies (normalized to sum to 1) of the substitution patterns among the sites. For (X, Y) | X, Y {subseteq} {1, 2}, we set fX,Y to be the relative frequency of observing the site pattern (X, Y), and let F be the 4 x 4 matrix with entries f(X, Y) indexed as in the sequence spectrum S. If the sequences were generated under the Jukes-Cantor model on the tree T with edge weights q1, q2, q2, then the expected number of substitution pattern (X, Y) is fX,Y = csX,Y, where c is the number of sites.

However, in c independent samples, the observed frequencies will include sampling error, so we cannot directly conclude S = F. ML provides a method of estimating the parameters q1, q2, and q3 from the observed frequencies F.

Given a set of edge lengths q1, q2, and q3, the "likelihood" of observing F under the Jukes-Cantor model on T is

Formula 14(14)
where the entries in S are obtained from equation (11). This gives identities among the pattern probabilities sX,Y, so grouping the common factors in equation (12) gives

Formula 15(15)
where

Formula 15

Formula 15

Formula 15

Formula 15

Formula 15
The expected sequence spectrum S can be expressed as a function of the three variables x1, x2, and x3, so the values which maximize the likelihood L are obtained when each of the three partial derivatives, Formula 15 In contrast to previous works (Chor et al. 2000Go; Chor, Hendy, and Penny 2001Go; Chor, Khetan, and Snir 2003Go; Chor and Snir 2004Go), that operated in the space of the expected sequence variables, SD,E, here we are operating in the space of the path-set variables Formula 15 This eliminates the need to introduce the constraints of the ML points being on a "tree surface." By the chain rule, we get:

Formula 16(16)
which must be 0 at each ML point. From equation (12), we can determine the matrix of partial derivatives,

Formula 17(17)
hence, at each turning point (where Formula 17) we have from equation (14)

Formula 18(18)
The three relations of equation (18) are rational functions in the three variables xi. If we multiplied by the common denominators, we obtain three polynomials, each of degree 14 in the variables. Solving these simultaneously might be feasible, however, we can simplify the system by imposing the additional constraints

Formula 18
which constrain the edge lengths to satisfy the molecular clock, as illustrated in figure 1b. These constraints imply x1 = x2 ≥ x3. In our analysis below, we will explicitly impose the equality x1 = x2 to find the turning points. The inequality will need to be tested on any potential solution and, if it were not satisfied, a maximum could be sought on the boundary of the valid tree domain, where x1 = x2 = x3.

The constraint x1 = x2 implies a1 = a2, so by replacing x1 and a1, equation (18) reduces to

Formula 19(19)
From equation (12), each ai is a polynomial in x2, x3, so multiplying equation (18) by a0a2a3a4 gives two polynomial equations in the two variables x2, x3 and the observed frequencies fj. We refer to these polynomial equations as E1 and E2.

We now show that the system of two resulting polynomials {E1, E2} has only finitely many solutions, all of which we can find. The major tool used here is the "resultant" of two polynomials. Let Formula 19 and Formula 19 be two polynomials in one variable, x. The resultant of f and g, denoted Res(f, g, x), is a polynomial in the coefficients ai and bj of f and g, which is 0 whenever f and g have a common zero. The coefficients can themselves be unknowns or functions of other variables, in which case, the resultant replaces the two polynomials f and g by a single polynomial in one fewer variable.

Computing the resultant is a classical technique for eliminating one variable from two equations. There is an elegant formula for computing it due to Sylvester and another due to Bezout, which have been implemented in the computer algebra package Maple.

We can compute the resultant ER = Res(E1, E2, x3) of E1 and E2 with respect to x3. This eliminates x3 from the equations and yields a single polynomial ER, in just x2 and the parameters. The polynomial ER has the form:

Formula 20(20)
where P0 is the degree 11 polynomial with 288 monomials displayed in the appendix and k is some large constant.

Theorem 2 The turning points of L (eq. 14) corresponding to realistic trees (namely, trees with positive edge lengths) are exactly the roots of P0.

Proof. The only factors in ER (eq. 20) that admit positive real roots are P0 and Formula 20 However, x2 = 1 implies q2 = 0, which is not realistic. {square}

Corollary 3 The Jukes-Cantor triplet has a finite number of ML points.

Proof. P0 has at most 11 different solutions and, for each such solution, we can back substitute to obtain all the values of x3{square}

Maxima on the Boundaries of the Parameter Space
The realistic parameter space R is bounded by the constraints

Formula 20
which implies

Formula 20
The analysis above finds maxima at turning points in the interior of R, it is also possible that the maximum value of L can be on the boundary, which is not a turning point. To find these we must consider the turning points of L when the parameters are constrained to the boundary.

We identify three boundary subspaces

Formula 20
We must independently examine maxima on each boundary subspace. In each case, the likelihood function on the boundary can be described by a single parameter xi, then solving Formula 20 gives a single polynomial constraint.

In case I, with x3 = 0, x = x1 = x2, we find

Formula 20
which has a single positive root

Formula 20
and a zero at x = 0.

In case II, with x = x1 = x2 = x3, we find

Formula 20
which factorizes into a polynomial of degree 5, and x(1 – x)2 which has zeros at x = 0, 1.

In case III, with x = x3, x1 = x2 = 1, L = 0 (unless f1 = f2 = f4 = 0, whence L is undefined).


    Results and Discussion
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Appendix
 Acknowledgements
 References
 
Here we report experimental results obtained by employing our method on genomic sequences. We conclude by discussion and pointing out future research directions.

Results on Genomic Sequences
In order to evaluate our method, we tested it on three sets of real genomic sequences. In each case, we found a single solution to P0 = 0 in the realistic parameter space, for each of the three rooted triples. By evaluating the likelihood at each of these turning points, we found each was a maximum.

We looked at the natural killer cell receptor D gene on human, mouse, and rat (accession numbers AF260135, AF030313 and AF009511, respectively). We aligned the sequences using ClustalW (Thompson, Higgins, and Gibson 1994Go). Next, we computed the observed sequence spectrum F (eq. 21, as explained in Obtaining the Maximum Likelihood Solution).

Formula 21(21)
We calculated the ML value for each of the three rooted trees under the model for the three species. The (rat, mouse, and human) tree was maximal, with edge lengths q1 = q2 = 0.0197 to rat and mouse and q3 = 0.1061 to human, giving the log likelihood ln L = –870.2. This result suggests the human-rodent split at 70 MYA and the mouse-rat split at 20 MYA, a result consistent with commonly accepted dates.

We also calculated the ML value for each of the three rooted trees for the beta actin gene, for the three species guinea pig, goose, and Caenorhabditis elegans (accession numbers AF508792, M26111, and NM_076440, respectively), finding the ((guinea pig, goose), C. elegans) tree maximal, with q1 = q2 = 0.021819 and q3 = 0.050188 giving ln L = –1241.5. Finally, we calculated the ML value for each of the three rooted trees for the histone gene of Drosophila melangoster, Hydra vulgaris, and human (accession numbers AY383571, AY383572, and NM_002107, respectively), finding the ((D. melangoster, H. vulgaris), Human) tree maximal, with q1 = q2 = 0.001555 and q3 = 0.012740 with ln L = –86.835133.

Each of the results above agree closely with the numerical values obtained using the popular phylogenetic reconstruction packages PHYLIP (Felsenstein 1995Go) and PAUP* (Swofford 1998Go), which use iterative methods to estimate the maxima. In each case, the likelihood function had a unique maximum in the parameter range.

Directions for Future Research
The progress made here brings up a number of open problems:

Our ML solutions are derived from the roots of a univariate, degree 11 polynomial. This implies that the number of ML solutions is finite. It would be interesting to explore the question of "uniqueness" of the solution. If this is the case, it will most likely follow from the existence of a single solution corresponding to a realistic tree, as in Chor, Khetan, and Snir (2003)Go.
The Jukes-Cantor substitution model is a special case of the family of Kimura substitution models. It would be interesting to further extend the result in this paper for the other models (two and three parameters) of the Kimura family.
It would be interesting to extend these results to rooted trees with "four leaves" under Jukes-Cantor model and a molecular clock. Here we have two different topologies—the fork and the comb (Chor, Khetan, and Snir 2003Go). It is expected that such extension will face substantial technical difficulties.


    Appendix
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Appendix
 Acknowledgements
 References
 
The polynomial P0 is presented where the coefficients are functions of the observed pattern frequencies, fi with f12 = f1 + f2. Using Maple we find:


Formula


Formula


    Acknowledgements
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Appendix
 Acknowledgements
 References
 
Thanks to Joseph Felsenstein for his fruitful discussions, to Bernd Sturmfels for enlightening comments on this manuscript and informing us about his manuscript (Hosten, Khetan, and Sturmfels 2004), and to the two other reviewers for their constructive comments. This research was supported by the Israel Science Foundation grant 418/00.


    Footnotes
 
William Martin, Associate Editor


    References
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Appendix
 Acknowledgements
 References
 

    Aho, A. V., Y. Sagiv, T. G. Szymanski, and J. D. Ullman. 1981. Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM J. Comput. 10:405–421.[CrossRef]

    Catanese, F., S. Hosten, A. Khetan, and B. Sturmfels. 2004. The maximum likelihood degree. (http://front.math.ucdavis.edu/math.AG/0406533).

    Chor, B., M. Hendy, B. Holland, and D. Penny. 2000. Multiple maxima of likelihood in phylogenetic trees: an analytic approach. Mol. Biol. Evol. 17:1529–1541. (Earlier version appeared in RECOMB 2000).[Abstract/Free Full Text]

    Chor, B., M. Hendy, and D. Penny. 2001. Analytic solutions for three taxon mlmc trees with variable rates across sites. Pp. 204–213 in B. Moret and O. Gascuel, eds. Lecture notes in Computer Science 2149 in Workshop on Algorithms in BioInformatics 2001. Springer-Verlag, Heidelberg, Germany.

    Chor, B., A. Khetan, and S. Snir. 2003. Maximum likelihood on four taxa phylogenetic trees: analytic solutions. Pp. 76–83 in W. Miller, M. Vingron, S. Istrail, P. Pevzner, and M. Waterman, eds. Proceedings of the Seventh Annual International Conference on Computational Molecular Biology (RECOMB). AMC Press, Berlin.

    Chor, B., and S. Snir. 2004. Molecular clock fork phylogenies: closed form analytic maximum likelihood solutions. Syst. Biol. 53(6):963–967.[CrossRef][ISI][Medline]

    Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368–376.[CrossRef][ISI][Medline]

    ———. 1995. PHYLIP (phylogeny inference package). Distributed by the author, Department of Genetics, University of Washington, Seattle.

    Hendy, M. D., and D. Penny. 1993. Spectral analysis of phylogenetic data. J. Classif. 10:5–24.

    Hendy, M. D., D. Penny, and M. A. Steel. 1994. Discrete Fourier analysis for evolutionary trees. Proc. Natl. Acad. Sci. USA 91:3339–3343.[Abstract/Free Full Text]

    Hendy, M. D., and S. Snir. 2005. Hadamard conjugation for the Kimura 3st model: combinatorial proof using pathsets. (http://arxiv.org/abs/q-bio.PE/0505055).

    Hosten, S., A. Khetan, and B. Sturmfels. 2004. Solving the likelihood equations. (http://front.math.ucdavis.edu/math.ST/0408270).

    Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.

    Kimura, M. 1981. Estimation of evolutionary sequences between homologous nucleotide sequences. Proc. Natl. Acad. Sci. USA 78:454–458.[Abstract/Free Full Text]

    Neyman, J. 1971. Molecular studies of evolution: a source of novel statistical problems. Pp. 1–27 in S. Gupta and Y. Jackel, eds. Statistical decision theory and related topics. Academic Press, New York.

    Steel, M., M. D. Hendy, and D. Penny. 1998. Reconstructing phylogenies from nucleotide pattern probabilities: a survey and some new results. Discrete Appl. Math. 88:367–396.

    Steel, M. A., M. D. Hendy, L. A. Székely, and P. L. Erdös. 1992. Spectral analysis and a closest tree method for genetic sequences. Appl. Math. Lett. 5:63–67.

    Sturmfels, B., and S. Sullivant. 2005. Toric ideals of phylogenetic invariants. J. Comput. Biol. 12:204–228.[CrossRef][ISI][Medline]

    Swofford, D. L. 1998. PAUP*beta. Sinauer Associates, Sunderland, Mass.

    Székely, L., P. L. Erdös, M. A. Steel, and D. Penny. 1993. A Fourier inversion formula for evolutionary trees. Appl. Math. Lett. 6:13–17.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalty and weight matrix choice. Nucleic Acids Res. 22:4673–4780.[Abstract/Free Full Text]

    Yang, Z. 2000. Complexity of the simplest phylogenetic estimation problem. Proc. R. Soc. Lond. B 267:109–116.[Medline]

Accepted for publication November 21, 2005.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
23/3/626    most recent
msj069v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (4)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Chor, B.
Right arrow Articles by Snir, S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Chor, B.
Right arrow Articles by Snir, S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?