Skip Navigation


MBE Advance Access originally published online on February 13, 2007
Molecular Biology and Evolution 2007 24(4):1075-1079; doi:10.1093/molbev/msm028
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
24/4/1075    most recent
msm028v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Steel, M.
Right arrow Articles by Matsen, F. A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Steel, M.
Right arrow Articles by Matsen, F. A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

The Bayesian "Star Paradox" Persists for Long Finite Sequences

Mike Steel1 and Frederick A. Matsen

Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand

E-mail: m.steel{at}math.canterbury.ac.nz.


    Abstract
 TOP
 Abstract
 Introduction
 Analysis of the Star...
 Concluding Remarks
 Appendix: Proof of Lemma...
 Acknowledgements
 References
 
The "star paradox" in phylogenetics is the tendency for a particular resolved tree to be sometimes strongly supported even when the data is generated by an unresolved ("star") tree. There have been contrary claims as to whether this phenomenon persists when very long sequences are considered. This note settles one aspect of this debate by proving mathematically that the chance that a resolved tree could be strongly supported stays above some strictly positive number, even as the length of the sequences becomes very large.

Key Words: phylogenetic trees • Bayesian statistics • star trees


    Introduction
 TOP
 Abstract
 Introduction
 Analysis of the Star...
 Concluding Remarks
 Appendix: Proof of Lemma...
 Acknowledgements
 References
 
Two recent papers (Yang and Rannala 2005Go; Lewis et al. 2005Go) highlighted a phenomenon that occurs when sequences evolve on a tree that contains a polytomy—in particular a 3-taxon unresolved rooted tree. As longer sequences are analyzed using a Bayesian approach, the posterior probability of the trees that give the different resolutions of the polytomy do not converge on relatively equal probabilities—rather a given resolution can sometimes have a posterior probability close to one. This has been called the "star paradox" because the data evolved on an unresolved tree, and thus a high posterior for a particular resolved tree must be artifactual. In response Kolaczkowski and Thornton (2006)Go investigated this phenomenon further, providing some interesting simulation results and offering an argument that seems to suggest that for very long sequences the tendency to sometimes infer strongly supported resolutions suggested by the earlier papers would disappear with sufficiently long sequences. As part of their case, the authors use the expected site frequency patterns to simulate the case of infinite length sequences, concluding that "with infinite length data, posterior probabilities give equal support for all resolved trees, and the rate of false inferences falls to zero." Of course these findings concern sequences that are effectively infinite, and, as is well known in statistics, the limit of a function of random variables (in this case site pattern frequencies for the first n sites) does not necessarily equate with the function of the limit of the random variables. Accordingly Kolaczkowski and Thornton offer this appropriate cautionary qualification of their findings:

"Analysis of ideal data sets does not indicate what will happen when very large data sets with some stochastic error are analyzed, but it does show that when infinite data are generated on a star tree, posterior probabilities are predictable, equally supporting each possible resolved tree."

Yang and Rannala (2005)Go had attempted to simulate the large sample posterior distribution but ran into numerical problems and commented that it was "unclear" what the limiting distribution on posterior probabilities was as n became large.

In particular, all of the aforementioned papers have left open an interesting statistical question, which this short note formally answers—namely, does the Bayesian posterior probability of the 3 resolutions of a star tree on 3 taxa converge to 1/3 as the sequence length tends to infinity? That is, does the distribution on posterior probabilities for "very long sequences" converge on the distribution for infinite length sequences? We show that for most reasonable priors it does not. Thus the star paradox does not disappear as the sequences get longer.

As noted by Yang and Rannala (2005)Go and Lewis et al. (2005)Go, one can demonstrate such phenomena more easily for related simpler processes such as coin tossing (particularly if one imposes a particular prior). Here, we avoid this simplification as such results do not rigorously establish corresponding phenomena in the phylogenetic setting, which in contrast to coin tossing involves considering a parameter space of dimension greater than 1 (moreover, as we will see there is a complication that arises in the phylogenetic problem that is entirely absent from the coin-tossing problem). We also frame our main result so that it applies to a fairly general class of priors. Whether or not the star paradox is a practical concern for biologists is likely to depend heavily on the data, the priors, and the methods used to establish Bayesian posterior probabilities. The purpose of this paper is merely to indicate that from a mathematical perspective there is no reason to think that the star paradox will automatically vanish given long enough sequences. Some further comments and earlier references on the phenomenon have been described in the recent review paper by Alfaro and Holder (2006Go, p. 35–36).


    Analysis of the Star Tree Paradox for 3 Taxa
 TOP
 Abstract
 Introduction
 Analysis of the Star...
 Concluding Remarks
 Appendix: Proof of Lemma...
 Acknowledgements
 References
 
On tree T1 (in fig. 1), let pi = pi(t0, t1), i = 0, 1, 2, 3, denote the probabilities of the 4 site patterns (xxx, xxy, yxx, xyx, respectively) under the simple 2-state symmetric Markov process (the argument extends to more general models, but it suffices to demonstrate the phenomena for this simple model). From equation (2) of Yang and Rannala (2005)Go we have

Formula

Formula
and

Formula
It follows by elementary algebra that for i = 2, 3,

Formula (1)
and thus p1(t0, t1) ≥ pi(t0, t1) with strict inequality unless t0 = 0 (or in the limit as t1 tends to infinity).


Figure 1
View larger version (5K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 1.— The 3 resolved rooted phylogenetic trees on 3 taxa T1, T2, T3, and the unresolved ‘star’ tree on which the sequences are generated T0.

 
To allow maximal generality, we make only minimal assumptions about the prior distribution on trees and branch lengths. Namely, we assume that the 3 resolved trees on 3 leaves (trees T1, T2, T3 in fig. 1) have equal prior probability Formula and that the prior distribution on branch lengths t0, t1 is the same for each tree with a smooth joint probability density function that is bounded and everywhere nonzero. This condition applies, for example, to the exponential prior discussed by Yang and Rannala (2005)Go. Any prior that satisfies these properties we call "tame."

Let n = (n0, n1, n2, n3) be the counts of the different types of site patterns (corresponding to the same patterns as for the Pi's). Thus n = {sum}Formula ni is the total number of sites (i.e., the length of the sequences). Given a prior distribution on (t0, t1) for the branch lengths of Ti (for i = 1, 2, 3) let Formula be the posterior probability of tree Ti given the site pattern counts n. Now suppose the n sites are generated on a star tree T0 with positive branch lengths. We are interested in whether the posterior probability Formula could be close to 1 or whether the chance of generating data with this property goes to zero as the sequence length gets very large. We show that in fact the latter possibility is ruled out by our main result, namely, the following:

Theorem 2.1
Consider sequences of length n generated by a star tree T0 on 3 taxa with strictly positive edge length tFormula and let n be the resulting data (in terms of site pattern counts). Consider any prior on the 3 resolved trees (T1, T2, T3) and their branch lengths that is tame (as defined above). For any {epsilon} > 0, and each resolved tree Ti (i = 1, 2, 3), the probability that n has the property that

Formula
does not converge to 0 as n tends to infinity.

Proof of Theorem 2.1
Because of the considerable details involved in the proof, we present a brief, intuitive outline of the argument.

Firstly, we will show that the probability that a star tree generates a "moderate" excess of site patterns favoring any 1 of the 3 resolved trees stays bounded above zero as n (the sequence length) goes to infinity. Here, by moderate we mean that the excess is in the order of the square root of n (the standard deviation for site pattern counts). Conditioning on this event (and some related moderate events, collectively called Fc below), we would like to show that a given tree T has unusually high posterior probability for this data; this can be simplified to showing that the ratio of the posterior probability of the given resolved tree to either of the other 2 resolved trees is large. These ratios can in turn be conveniently expressed as a ratio of expectations of 2 closely related random variables (X, Yeq. 4). A crucial observation (based on symmetry considerations) is that X and Y are random variables determined just by T and the prior on its branch lengths (t0, t1)—that is, we have reduced a problem involving 3 resolved trees and their branch length priors to an analysis of 2 quantities involving a single resolved tree and its branch length prior. To analyze the ratio of expectations (in the hope of showing it is large), it helps to focus on the distribution of the random variables (P0, P1) induced by the prior on (t0, t1) (later it is helpful to consider a further derived pair of random variables (P0, Z)). We do not need to calculate these distributions explicitly; indeed, working over the general class of priors is curiously helpful here, as it avoids the temptation to get bogged down in the detailed analysis of a particular distribution.

Considering the ratio of conditional expectations according to a distribution on (P0, P1) leads to a second helpful observation: if we condition on P0 taking a particular value, say p0, then p0 cancels out of the ratio (eq. 13), reducing what began as a 2-dimensional problem to a family of 1-dimensional computations. At this point a complication arises that requires some care to resolve (and which does not arise in the much simpler [fair] coin-tossing problem). Namely, the ratio of conditional probabilities is not always large (e.g., if P0 is conditioned to equal a value that approaches Formula which corresponds to long branches [site saturation], then all 3 resolved trees have approximately the same posterior probability for any data). Nevertheless, we are assuming that the data is generated by a star tree with finite nonzero branch lengths and so the probability q0 of the unvaried pattern (xxx) is a fixed number strictly between Formula and 1. If P0 were to differ from q0 too much then (for long sequences that satisfy the moderate events mentioned above) the posterior probability of any resolved tree conditional on such an extreme P0 value would be small, compared with a conditioned value of P0 close to q0—this is, informally, what Claim (i) in the proof below says. Moreover, if P0 is conditioned to equal a value close to q0, then the ratio of conditional expectations can be shown to be large. This is essentially Claim (ii) in the proof below. These 2 claims can be combined by Lemma 2.2 to handle the complication described, and thereby, establish what we required, namely, that the ratio of (unconditional) expectations is large.

We now proceed to the formal details—the proofs of Claims (i) and (ii) and Lemma 2.2 are deferred to the Appendix.

Consider the star tree T0 with given branch lengths tFormula (as in fig. 1). Let (q0, q1, q2, q3) denote the probability of the 2 types of site patterns generated by T0 with these branch lengths. Note that q1 = q2 = q3 and so q0 = 1 – 3q1). Suppose we generate n sites on this tree, and let n0, n1, n2, n3 be the counts of the different types of site patterns (corresponding to the pi's). Formula and for i = 1, 2, 3, let

Formula
For a constant c > 1, let Fc denote the event:

Formula
Notice that Fc implies {Delta}1 isin [2c, 4c] because {Delta}1 + {Delta}2 + {Delta}3 = 0. By standard stochastic arguments (based on the asymptotic approximation of the multinomial distribution by the multinormal distribution) event Fc has probability at least some value {delta}' = {delta}'(c) > 0 for all n sufficiently large (relative to c).

Given the data n = (n0, n1, n2, n3) write Formula for the probability of n assuming the data was generated on tree Ti with branch lengths t0, t1. The assumption of equality of priors across T1, T2, and T3 implies that

Formula (2)
and

Formula (3)

Now, as (t0, t1) are random variables with some prior density, when we view p0, p1, p2, p3 as random variables by virtue of their dependence on (t0, t1), we will write them as P0, P1, P2, P3 (note that Yang and Rannala [2005]Go use Pi differently). With this notation, the posterior probability of T1 conditional on n can be written as

Formula
where p(n) is the posterior probability of n (assuming that data is generated on one of the resolved trees chosen with equal probability) and Formula denotes expectation with respect to the prior for t0, t1 on T1. Moreover, because P2 = P3, we can write this as Formula By equation (2) and equation (3), we have

Formula
where again expectation is taken with respect to the prior for t0, t1 on T1. Consequently,

Formula (4)
where

Formula
As will be shown later, it suffices to demonstrate that the ratio in equation (4) can be large with nonvanishing probability in order to obtain the conclusion of the theorem. In order to do so, we use the following lemma, whose proof is provided in the Appendix.

Lemma 2.2
Let X, Y be nonnegative continuous random variables, dependent on a third random variable {Lambda} that takes values in an interval I = [a, b]. Suppose that for some interval I0 strictly inside I, and I1 = II0 the following inequality holds:

Formula (5)
and that for some constant B > 0, and all {lambda} isin I0,

Formula (6)
Then, Formula

To apply this lemma, select a value s > 0 so that Formula Then let Formula

Claim:
Let c > 1. Then conditional on the data n = (n0, n1, n2, n3) satisfying Fc, and for n sufficiently large, the following two inequalities hold:
(i) Formula
(ii) For all Formula
The proofs of these claims are given in the Appendix.

To apply these claims, select Formula (this is finite by the assumption that the prior on (t0, t1) is smooth and everywhere nonzero). Then, Formula

Combining Lemma 2.2 with Claims (i) and (ii) for this value of c we deduce that conditional on n satisfying Fc and for n sufficiently large,

Formula (7)
As stated before, the probability that n satisfies Fc is at least {delta}' = {delta}'(c) > 0 for n sufficiently large, and so, by equation (4) and equation (7), we have Formula Similarly, Formula Now, because Formula it follows that, for n sufficiently large, and conditional an event of probability at least {delta}' > 0, that Formula as claimed. This completes the proof. {square}


    Concluding Remarks
 TOP
 Abstract
 Introduction
 Analysis of the Star...
 Concluding Remarks
 Appendix: Proof of Lemma...
 Acknowledgements
 References
 
One feature of the argument we have provided is that it does not require stipulating in advance a particular prior on the branch lengths—that is, the result is somewhat generic as it imposes relatively few conditions. Moreover, it seems possible to weaken these even further. For example, the requirement that the prior on (t0, t1) be everywhere nonzero could be weakened to simply being nonzero in a neighborhood of (0, tFormula) (thereby allowing, e.g., a uniform distribution on bounded range).

An interesting open question in the spirit of this paper is to explicitly calculate the limit of the posterior density f(P1, P2, P3) described in (Yang and Rannala 2005Go). It may also be of interest to study posterior support for resolved trees when one weakens the molecular clock assumption on the star tree that generates the data. For example, one could imagine combinations of (non-clocklike) branch lengths that may lead to more frequent and/or stronger support for particular resolved trees.


    Appendix: Proof of Lemma 2.2 and Claims (i) and (ii)
 TOP
 Abstract
 Introduction
 Analysis of the Star...
 Concluding Remarks
 Appendix: Proof of Lemma...
 Acknowledgements
 References
 

Proof of Lemma 2.2:
For W = X, Y we have

Formula (8)
In particular, for W = X we have: Formula Note that equation (6) implies that Formula so

Formula (9)
Taking W = Y in equation (8) and applying equation (5) gives us

Formula
which combined with equation (9) gives the result. {square}

Proof of Claim (i), Formula

We will first bound Formula above. Let µ (n) = (qFormulaqFormulaqFormulaqFormula)n. Now, conditional on n satisfying Fc we have

Formula (10)
where p = (p0, p1, p2, p3) and q = (q0, q1, q2, q3), and dKL denotes Kullback–Leibler distance. Now, Formula (the first inequality is a standard one in probability theory). In particular, if p0 isin I1, then |q0 – p0|>s>0. Moreover,

Formula
The right hand side can then be bounded above by rearranging equation (10) and using the lower bound on the Kullback–Leibler distance, giving

Formula (11)
In the reverse direction, we have

Formula
where

Formula
and

Formula

Now,

Formula (12)
for (p0, p1, p2) determined by some Formula Now, the first term of the product in equation (12) converges to a constant as n grows (because p0q0, p1 q1, and p2q1 are each of order n–1), whereas the condition Fc ensures that the second term decays no faster than Formula for a constant C1. Thus, Formula for a positive constant C2. The term B(n) is asymptotically proportional to n–2. Summarizing, for a constant C3 > 0 (dependent just on tFormula)

Formula
which combined with equation (11) establishes Claim (i) for n sufficiently large. {square}

In order to prove Claim (ii), we need some preliminary results.

Lemma 3.1
Let {eta} < 1. Then for each x > 0 there exists a value K = K(x) < {infty} that depends continuously on x so that the following holds. For any continuous random variable Z on [0, 1] with a smooth density function f that satisfies f(1) != 0 and |f'(z)| < B for all z isin ({eta}, 1], we have

Formula
for all Formula

Proof.
Let Formula Then,

Formula
Now,

Formula
where ~ denotes asymptotic equivalence (i.e., f (k) ~ g(k) if limk->{infty}f (k) / g (k) = 1). Using integration by parts,

Formula
Now, provided k > (1 – {eta})–2, we have tk > {eta} and so the absolute value of the second term on the right is at most Formula Consequently, Formula is bounded above by B times a term of order k–2. A similar argument, again using integration by parts, shows that Formula is bounded above by B times a term of order k–2, and the lemma now follows by some routine analysis. {square}

Lemma 3.2
Let y = (1 + 2x)(1 – x)2. Then, for x isin [0, 1) and m ≥ 3 we have

Formula

Proof.

Formula
and m2(1 – y) = m2(3x2 – 2x3) ≤ 3m2x2, and for m ≥ 3 this upper bound is less or equal to the lower bound in the previous expression. {square}

Proof of Claim (ii), for all p0 isin I0, Formula :

Write Formula as shorthand for Formula Note that, for any r, s > 0, Formula Consequently, if we let Formula then, by definition of the {Delta}i's,

Formula (13)

Now, conditional on n satisfying Fc (and because P1 ≥ P2) the following 2 inequalities hold (recalling that {Delta}1 + {Delta}2 + {Delta}3 = 0),

Formula
Applying this to equation (13) gives:

Formula (14)

Let U = (P1 – P2) / (1 – P0), which takes values between 0 and 1 because P1 ≥ P2. Because P1 + 2P2 = 1 – P0, we can write Formula

Thus, Formula and Formula Substituting these into equation (14), letting Z = (1 + 2U)(1 – U)2 and noting that Formula gives

Formula
Thus, by Lemma 3.2, (taking Formula ) we obtain, for m ≥ 3,

Formula (15)
The mapping (t0, t1) ↦ (P0, Z) is a smooth invertible mapping between (0, {infty})2 and its image within (Formula, 1) x (0, 1). Notice that Z becomes concentrated about 1 whenever P0 approaches Formula. However, for p0 in the interval I0 the conditional density f(Z|P0 = p0) is smooth, bounded away from 0, and its first derivative is also uniformly bounded above over this interval. Consequently, we may apply Lemma 3.1 (noting that the condition that n satisfies Fc ensures that Formula to show that for n sufficiently large the following inequality holds for all p0 isin I0,

Formula
Applying this to equation (15) gives Formula as claimed. This completes the proof of Claim (ii).


    Acknowledgements
 TOP
 Abstract
 Introduction
 Analysis of the Star...
 Concluding Remarks
 Appendix: Proof of Lemma...
 Acknowledgements
 References
 
M.S. thanks Ziheng Yang for suggesting the problem of computing the limiting distribution of posterior probabilities for 3-taxon trees. We also thank Paul O. Lewis and the anonymous referee for their helpful comments. This work is funded by the Allan Wilson Centre for Molecular Ecology and Evolution.


    Footnotes
 
1 Present address: Biomathematics Research Centre, Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand Back

Herve Philippe, Associate Editor


    References
 TOP
 Abstract
 Introduction
 Analysis of the Star...
 Concluding Remarks
 Appendix: Proof of Lemma...
 Acknowledgements
 References
 

    Alfaro ME and Holder MT. (2006) The posterior and prior in Bayesian phylogenetics. Annu Rev Evol Syst 37:19–42.

    Kolaczkowski B and Thornton JW. (2006) Is there a star tree paradox? Mol Biol Evol 23:1819–1823.[Abstract/Free Full Text]

    Lewis PO, Holder MT, Holsinger KE. (2005) Polytomies and Bayesian phylogenetic inference. Syst Biol 54:2241–253.[CrossRef][ISI][Medline]

    Yang Z and Rannala B. (2005) Branch-length prior influences Bayesian posterior probability of phylogeny. Syst Biol 54:3455–470.[CrossRef][ISI][Medline]

Accepted for publication January 31, 2007.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Mol Biol EvolHome page
L. Shavit, D. Penny, M. D. Hendy, and B. R. Holland
The Problem of Rooting Rapid Radiations
Mol. Biol. Evol., November 1, 2007; 24(11): 2400 - 2411.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
Z. Yang
Fair-Balance Paradox, Star-tree Paradox, and Bayesian Phylogenetics
Mol. Biol. Evol., August 1, 2007; 24(8): 1639 - 1655.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
24/4/1075    most recent
msm028v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Steel, M.
Right arrow Articles by Matsen, F. A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Steel, M.
Right arrow Articles by Matsen, F. A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?