MBE Advance Access originally published online on May 7, 2007
Molecular Biology and Evolution 2007 24(8):1639-1655; doi:10.1093/molbev/msm081
© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Fair-Balance Paradox, Star-tree Paradox, and Bayesian Phylogenetics

Ziheng Yang

Department of Biology, Galton Laboratory, University College London, London, United Kingdom

E-mail: z.yang@ucl.ac.uk.


    Abstract
 TOP
 Abstract
 Introduction
 Biological Synopsis
 Discussion
 Mathematical Analysis
 Addendum
 Appendix. Derivatives for...
 Acknowledgements
 References
 
The star-tree paradox refers to the conjecture that the posterior probabilities for the three unrooted trees for four species (or the three rooted trees for three species if the molecular clock is assumed) do not approach 1/3 when the data are generated using the star tree and when the amount of data approaches infinity. It reflects the more general phenomenon of high and presumably spurious posterior probabilities for trees or clades produced by the Bayesian method of phylogenetic reconstruction, and it is perceived to be a manifestation of the deeper problem of the extreme sensitivity of Bayesian model selection to the prior on parameters. Analysis of the star-tree paradox has been hampered by the intractability of the integrals involved. In this article, I use Laplacian expansion to approximate the posterior probabilities for the three rooted trees for three species using binary characters evolving at a constant rate. The approximation enables calculation of posterior tree probabilities for arbitrarily large data sets. Both theoretical analysis of the analogous fair-coin and fair-balance problems and computer simulation for the tree problem confirmed the existence of the star-tree paradox. When the data size n → ∞, the posterior tree probabilities do not converge to 1/3 each, but they vary among data sets according to a statistical distribution. This distribution is characterized. Two strategies for resolving the star-tree paradox are explored: (1) a nonzero prior probability for the degenerate star tree and (2) an increasingly informative prior forcing the internal branch length toward zero. Both appear to be effective in resolving the paradox, but the latter is simpler to implement. The posterior tree probabilities are found to be very sensitive to the prior.

Key Words: Lindley's paradox • fair-balance paradox • star-tree paradox • prior • clade probabilities


    Introduction
Thanks to the implementation of efficient Markov chain Monte Carlo (MCMC) algorithms in the computer program MrBayes (Huelsenbeck and Ronquist 2001), the Bayesian method of phylogeny reconstruction (Rannala and Yang 1996; Yang and Rannala 1997; Mau and Newton 1997; Li, Pearl, and Doss 2000) has gained popularity and is now widely used in analysis of molecular data sets. One concern raised about the method is that it often produces extremely high posterior probabilities for trees or clades (Suzuki, Glazko, and Nei 2002; Cummings et al. 2003; Douady et al. 2003; Erixon et al. 2003; Simmons, Pickett, and Miya 2004). For example, Rannala and Yang (1996) calculated the posterior probability for a tree of five ape species using 11 mitochondrial tRNA genes to be 0.9999. Even though the tree is sensible, the posterior probability is very high given that the human-chimpanzee-gorilla relationship was hard to resolve and that the data set, with 759 bp, is small. Similarly, use of MrBayes in real data analysis has produced high posterior probabilities, often at or near 100%. Sometimes different data sets (such as different genes or different taxon samples) produced contradictory phylogenies, each with strong posterior support (e.g., Bourlat et al. 2006). Simulation studies and empirical data analyses have repeatedly found that the posterior tree probabilities tend to be much higher than bootstrap support values (e.g., Cummings et al. 2003; Douady et al. 2003; Erixon et al. 2003; Simmons, Pickett, and Miya 2004). This discrepancy in itself may not suggest anything inappropriate about posterior probabilities, because the interpretation of bootstrap support values is uncertain (e.g., Berry and Gascuel 1996; Yang and Rannala 2005). Nevertheless, there is widespread concern that posterior probabilities for trees or clades calculated from many data sets may be too high.

In a simulation study, Suzuki, Glazko, and Nei (2002) generated data sets under the star tree for four species and analyzed them using MrBayes, which considers binary trees only. They found that the posterior probability for the inferred binary tree was often too high. The study used a wrong and simplistic model in the analysis, so that the problem was due in part to model violation. However, extreme posterior probabilities were observed in similar simulations without model violation (Cummings et al. 2003; Lewis, Holder, and Holsinger 2005; Yang and Rannala 2005). The failure of the posterior probabilities for the three binary trees to converge to 1/3 in large data sets simulated under the star tree is somewhat counterintuitive and is called the star-tree paradox (Lewis, Holder, and Holsinger 2005). The concern is not so much that the posterior tree probabilities differ from 1/3 as that they are sometimes either very small or very large when in fact no information is available to resolve the tree one way or another.

The posterior probability for a tree is the probability that the tree is true given the data, the prior, and the likelihood (substitution) model. There are thus three possible reasons for high tree probabilities: (1) errors, including numerical problems in the MCMC algorithm, which cause the posterior probabilities to be calculated incorrectly; (2) misspecification of the substitution model; and (3) misspecification and sensitivity of the prior. The first two reasons may be responsible for high posterior probabilities in some studies. In particular, use of simplistic and unrealistic models is known to inflate posterior probabilities for trees (e.g., Buckley 2002; Lemmon and Moriarty 2004; Huelsenbeck and Rannala 2004). However, high posterior probabilities have also been observed when the first two reasons clearly do not apply (Yang and Rannala 2005). This article deals with the third reason and studies the effect of prior specification on Bayesian phylogenetic inference.

The nature of the problem may be better understood by considering the analogous fair-coin problem (Lewis, Holder, and Holsinger 2005; Yang and Rannala 2005). Suppose a coin is fair, with probability of heads θ0 = 1/2. We flip the coin n times and observe y heads. We then calculate the posterior probabilities (P− and P+) for two models in which the coin is either negatively or positively biased: H−: θ < 1/2 and H+: θ > 1/2. (It is inconsequential whether the true value θ = 1/2 is included in none, one, or both of the two models since a point value has zero probability in a continuous distribution.) We assign equal prior probabilities for H− and H+ and uniform priors for θ in each model. When n is large, we may expect P− and P+ to approach 1/2, but they do not. Instead P− varies considerably among data sets (all generated under θ0 = 1/2) even when n → ∞. This is referred to as the fair-coin paradox (Lewis, Holder, and Holsinger 2005). Indeed, the limiting distribution of P− when n → ∞ is the uniform U(0, 1) (Yang and Rannala 2005, equation 5). Figure 1 shows the histograms of P− when n = 10³ and 10⁶. Intuitively, even though the proportion of heads y/n becomes closer and closer to 1/2 when n increases, the number of heads y fluctuates around n/2 more and more wildly among data sets. Note that the variance of y/n is 1/(4n), and the variance of y is n/4. The posterior probability P− depends on the number as well as the proportion of heads.


FIG. 1. The histogram of P−, the posterior probability that the coin has negative bias (with the probability of heads θ < 1/2) in a coin-tossing experiment. A fair coin is tossed n = 10³ (○) or n = 10⁶ (•) times. The number of heads y in n tosses is used to calculate P−, assuming a uniform prior θ ~ U(0, 1), and the proportion of replicate data sets in which P− falls into bins of 2% width is calculated to form the histogram. The number of simulated replicates is 10⁵. The fluctuation for n = 10³ is mainly due to the discrete nature of the data; for example, in no data sets is P− in the 0.50–0.52 bin because P− = 0.5 if y = 500 and P− = 0.525 if y = 499. When n = 10⁶, the fluctuation disappears and P− has nearly a U(0, 1) distribution, by which the proportion in each bin is 0.02.
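The fair-coin calculation can be reproduced with a short script. The following is a minimal sketch, not the program used for the figure: it assumes the uniform prior θ ~ U(0, 1), under which the posterior is beta(y + 1, n − y + 1), and it evaluates the posterior probability of negative bias exactly through the standard identity between the regularized incomplete beta function and a binomial tail. The function names `tail_table`, `p_minus`, and `simulate_p_minus` are hypothetical helpers introduced here for illustration.

```python
import math
import random


def tail_table(n):
    """tail[k] = number of outcomes of n + 1 fair tosses with at least k heads.

    For integer a and b, I_x(a, b) = P(Bin(a + b - 1, x) >= a), so under a
    beta(y + 1, n - y + 1) posterior the probability that theta < 1/2 equals
    P(Bin(n + 1, 1/2) >= y + 1) = tail[y + 1] / 2**(n + 1).
    """
    m = n + 1
    tail = [0] * (m + 2)
    for k in range(m, -1, -1):
        tail[k] = tail[k + 1] + math.comb(m, k)  # exact integer arithmetic
    return tail


def p_minus(y, n, tail=None):
    """Posterior probability of negative bias given y heads in n tosses."""
    tail = tail_table(n) if tail is None else tail
    return tail[y + 1] / 2 ** (n + 1)


def simulate_p_minus(n, replicates, seed=1):
    """Toss a fair coin n times per replicate and return the resulting P- values."""
    rng = random.Random(seed)
    tail = tail_table(n)
    values = []
    for _ in range(replicates):
        y = bin(rng.getrandbits(n)).count("1")  # number of heads in n fair tosses
        values.append(p_minus(y, n, tail))
    return values


if __name__ == "__main__":
    n = 1000
    print(p_minus(500, n), round(p_minus(499, n), 3))
    sims = simulate_p_minus(n, 20000)
    print(sum(p > 0.95 for p in sims) / len(sims))  # share of extreme replicates
```

Exact integer arithmetic is practical for n around 10³; for much larger n a normal approximation to the posterior would be the more natural choice.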

 
One has to consider how a sensible Bayesian analysis should behave in this problem. In a significance test, the P value has a uniform distribution U(0, 1) if the null hypothesis is true and the test is exact. The true null hypothesis is falsely rejected 5% of the time if the test is conducted at the 5% significance level. This is the case even with infinitely large data sets, if a fixed significance level is used. However, Bayesian statistics is a more "optimistic" and "aggressive" methodology (Efron 1998). In Bayesian model selection, the posterior probability for the true model, or the model closest to the truth among the compared models, should converge to one when the amount of data approaches infinity. As H− and H+ are equally distant from the truth θ0 = 1/2, one may sensibly expect P− and P+ to converge to 1/2 when n → ∞. Of course, P− should converge to 1 if θ0 < 1/2 (or to 0 if θ0 > 1/2). For the tree problem, the same argument suggests that if the true tree is the star tree, one would like the posterior probabilities for the three binary trees to converge to 1/3 each when the number of sites n → ∞. Here I take this position, as did Lewis, Holder, and Holsinger (2005) and Yang and Rannala (2005). It has been unclear how posterior tree probabilities behave in very large data sets or when n → ∞, because problems of phylogeny reconstruction are intractable analytically. Numerical calculation of integrals becomes unreliable in large data sets while MCMC algorithms are too slow and too imprecise.

In this article I develop approximate methods to calculate the posterior probabilities (P1, P2, P3) for the three rooted trees for three species, using data of binary characters evolving at a constant rate. This is the simplest tree-reconstruction problem (Yang 2000), chosen here to make the analysis possible. The approximation allows Bayesian calculation in arbitrarily large data sets, without the need for MCMC algorithms. I conduct large-scale simulations, which confirm the existence of the star-tree paradox; when the data size n increases, the posterior tree probabilities do not converge to 1/3 each, but continue to vary among data sets according to a statistical distribution. This distribution is characterized. I then explore the sensitivity of Bayesian analysis to the prior and evaluate two strategies suggested to resolve the star-tree paradox. The first assigns a nonzero prior probability for the degenerate star tree (Lewis, Holder, and Holsinger 2005), and the second uses a prior to force the internal branch lengths to approach zero when n → ∞ (Yang and Rannala 2005). The behavior of posterior tree probabilities in large data sets is predicted by drawing an analogy with the fair-coin problem, and the predictions are confirmed numerically by computer simulation.

A synopsis is provided in the next section, which summarizes the major results of this study. The biologist reader may read this section, as well as the Discussion, and skip the Mathematical Analysis section.


    Biological Synopsis
The Fair-coin and Fair-balance Problems
The fair-coin problem, as described above, has the same behavior as the fair-balance problem discussed by Yang and Rannala (2005), and in this study their results are treated interchangeably. Here the results are summarized for the fair-coin problem. We assign a beta prior on the probability of heads: θ ~ beta(α, α), with mean 1/2 and variance 1/(8α + 4). This is the U(0, 1) prior when α = 1 but can be highly concentrated around 1/2 if α is large. As long as α is fixed, the distribution of the posterior probability P− for the model of negative bias approaches the uniform distribution U(0, 1) when the number of coin tosses n → ∞.

Two strategies (priors) are considered to resolve the fair-coin paradox. In the first, α in the beta prior increases with n so that the prior variance of θ approaches 0, forcing θ to be more and more highly concentrated around 1/2. We require that P− approach 1/2 if the coin is fair, and 1 if the coin has a negative bias (or 0 if the coin has a positive bias). These requirements mean that the prior variance for θ should approach 0 faster than 1/n and more slowly than 1/n². In the second, a nonzero prior probability is assigned to the degenerate model of no bias H0: θ = 1/2. Then the posterior probability for H0 approaches 1 when n → ∞, and the method behaves as desired.
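The two rates for the first strategy can be motivated by a rough calculation. What follows is a heuristic sketch based on a normal approximation to the beta posterior, not the exact analysis of the article; the standardized count z is a symbol introduced here for illustration.

```latex
% Posterior under the beta(\alpha, \alpha) prior, and its normal approximation
\theta \mid y \sim \mathrm{beta}(y + \alpha,\; n - y + \alpha), \qquad
P_- = \Pr\!\left(\theta < \tfrac12 \,\middle|\, y\right)
    \approx \Phi\!\left(\frac{n - 2y}{\sqrt{n + 2\alpha}}\right).

% Fair coin: z = (2y - n)/\sqrt{n} is approximately N(0, 1)
P_- \approx \Phi\!\left(-z \sqrt{\frac{n}{n + 2\alpha}}\right)
\;\longrightarrow\;
\begin{cases}
\Phi(-z) \sim U(0, 1), & \alpha \text{ fixed},\\[2pt]
\tfrac12, & \alpha / n \to \infty .
\end{cases}

% Biased coin with \theta_0 < 1/2: n - 2y \approx n (1 - 2\theta_0)
P_- \approx \Phi\!\left(\frac{n (1 - 2\theta_0)}{\sqrt{n + 2\alpha}}\right)
\;\longrightarrow\; 1
\quad\text{if and only if}\quad \alpha / n^2 \to 0 .
```

Because the prior variance is 1/(8α + 4), the condition α/n → ∞ says that the variance shrinks faster than 1/n, and α/n² → 0 says that it shrinks more slowly than 1/n².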

The Star-tree Problem
Defining the Problem
The three binary rooted trees for three species are shown in figure 2. The data are three sequences of binary characters, which are assumed to be evolving at a constant rate (that is, under the molecular clock) (Yang 2000). The data can be summarized as counts n0, n1, n2, n3 of site patterns xxx, xxy, yxx, and xyx, where x and y are any two distinct characters, while the total number of sites is n = n0 + n1 + n2 + n3. Each binary tree has two branch length parameters t0 and t1, measured by the expected number of changes per site. Intuitively, we can see the three variable patterns xxy, yxx, and xyx "support" the three binary trees τ1, τ2, and τ3, respectively. Indeed a likelihood analysis will choose tree τ1 as the maximum-likelihood tree if n1 is greater than both n2 and n3. Let p0, p1, p2, p3 be the expected site pattern probabilities, with p0 + p1 + p2 + p3 = 1. Then tree τ1 can be represented by p0 > p1 > p2 = p3, with two free parameters, whereas the star tree is p0 > p1 = p2 = p3 (Yang 2000). In a Bayesian analysis, we assign equal probabilities 1/3 to the three binary trees, and exponential priors with means µ0 and µ1 on the two branch lengths t0 and t1 in each binary tree (fig. 2).


FIG. 2. The three rooted trees for three species: τ1 = ((12)3), τ2 = ((23)1), and τ3 = ((31)2). Branch lengths t0 and t1 are measured by the expected number of character changes per site. The star tree τ0 = (123) is also shown with its branch length t.
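For small data sets, the posterior tree probabilities defined above can be computed by brute force. The sketch below is only an illustration under stated assumptions and is not the Laplacian approximation developed in the article: it assumes a symmetric two-state substitution model (one expected change per unit time), takes t1 to be the age of the more recent node and t0 the internal branch length, and integrates over a truncated grid with the midpoint rule. The function names, the prior means `mu0` and `mu1`, the grid size, and the cutoff `t_max` are all hypothetical choices.

```python
import math


def pattern_probs(t0, t1):
    """Site-pattern probabilities (p0, p1, p2) on tree ((1,2),3); p3 equals p2.

    Assumes a symmetric two-state model with one expected change per unit time.
    """
    a = math.exp(-4.0 * t1)          # correlation between tips 1 and 2
    b = math.exp(-4.0 * (t0 + t1))   # correlation between tip 3 and tip 1 or 2
    p0 = 0.25 + 0.25 * a + 0.5 * b   # constant pattern xxx
    p1 = 0.25 + 0.25 * a - 0.5 * b   # variable pattern supporting the tree
    p2 = 0.25 - 0.25 * a             # each of the other two variable patterns
    return p0, p1, p2


def posterior_tree_probs(counts, mu0=0.1, mu1=0.2, grid=120, t_max=2.0):
    """Posterior probabilities (P1, P2, P3) from counts (n0, n1, n2, n3).

    Equal prior 1/3 per tree and exponential priors on t0 and t1; constants
    shared by the three trees cancel in the normalization.
    """
    n0, n1, n2, n3 = counts
    variable = (n1, n2, n3)
    n_var = sum(variable)
    h = t_max / grid
    mids = [(k + 0.5) * h for k in range(grid)]
    log_terms = ([], [], [])
    for t0 in mids:
        for t1 in mids:
            p0, p1, p2 = pattern_probs(t0, t1)
            base = n0 * math.log(p0) - t0 / mu0 - t1 / mu1  # shared by all trees
            lp1, lp2 = math.log(p1), math.log(p2)
            for i in range(3):
                # Tree i gives p1 to its own pattern and p2 to the other two.
                log_terms[i].append(
                    base + variable[i] * lp1 + (n_var - variable[i]) * lp2
                )
    shift = max(max(terms) for terms in log_terms)  # avoid underflow in exp
    marginals = [sum(math.exp(v - shift) for v in terms) for terms in log_terms]
    total = sum(marginals)
    return [m / total for m in marginals]


if __name__ == "__main__":
    print(posterior_tree_probs((240, 20, 20, 20)))  # perfectly balanced counts
    print(posterior_tree_probs((240, 40, 10, 10)))  # counts favouring tree 1
```

A fixed grid becomes unreliable as n grows because the integrand becomes sharply peaked, which is the practical reason an analytical approximation is needed for very large data sets.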

 
Star-tree Paradox
Posterior probabilities for the three binary trees (P1, P2, P3) were calculated from data sets simulated under the star tree, with n = 3 × 10³, 3 × 10⁶, or 3 × 10⁹ sites in the sequence. It is found that (P1, P2, P3) does not converge to (1/3, 1/3, 1/3) with the increase of n, confirming the star-tree paradox. Instead (P1, P2, P3) vary among data sets, according to a distribution f(P1, P2, P3), which is independent of the branch length t in the star tree and of the prior means µ0 and µ1 (see fig. 7 below). There are four modes in the distribution, such that in most data sets, either the three probabilities are all close to 1/3, or one of them is close to 1 and the other two are close to 0. Suppose we consider very high and very low posterior probabilities for binary trees as "errors" since the true tree is the star tree. In 4.2% (or 0.8%) of data sets, at least one of the three posterior probabilities is > 0.95 (or > 0.99), and in 17.3% (or 2.6%) of data sets, at least one of the three posterior probabilities is < 0.05 (or < 0.01). Those "error" rates appear too high, given that the data sets are arbitrarily large and are supposed to represent infinite data sets.


FIG. 7. Estimated joint density, f(P1, P2, P3), of posterior probabilities for the three trees over replicate data sets. The star tree with branch length t = 0.2 is used to generate 10⁵ data sets. Each is analyzed to calculate the posterior probabilities P1, P2, and P3 (equation 15), which are then collected to construct a 2-D histogram and to estimate the 2-D density using an adaptive kernel smoothing algorithm (Silverman 1986). The sequence length (and method used to calculate the integrals) is (a) n = 3 × 10³ sites (exact), (b) n = 3 × 10³ (approximate), and (c) n = 3 × 10⁹ (approximate), where exact calculation is achieved using Mathematica while approximate calculation is based on Laplacian expansion. The density f is shown using the color contours, with green through yellow to red representing low to high values. The total density mass on the triangle is 1. Note that in the ternary plot, the coordinates (P1, P2, P3) are represented by lines parallel to the sides of the triangle. The two points shown in the key have the coordinates A(0.1, 0.2, 0.7) and B(0.5, 0.3, 0.2), while the center point is (1/3, 1/3, 1/3).

 
Two Strategies to Resolve the Star-tree Paradox
Further analysis of the tree problem is through an analogy with the fair-coin problem. Note that the fair-coin and fair-balance problems are analytically tractable, but the tree problem is not. My analysis of the tree problem is thus numerical verification by computer simulation, in which only a finite number of replicate data sets can be generated and each data set can only be of finite size. To see the analogy, it is more convenient to consider the site pattern probabilities as parameters in each binary tree instead of branch lengths t0 and t1. In the fair-coin problem, the data have a binomial distribution or multinomial distribution with two cells (corresponding to heads and tails). The two models of negative and positive bias assume that one cell probability is greater than the other, yet the truth (the fair-coin model) is that they are equal. In the star-tree problem, the data have a multinomial distribution with four cells (corresponding to the four site patterns). We compare three binary-tree models, which assume that one of three cell probabilities (for the three variable site patterns) is greater than the other two and that these other two are equal. The truth (the star tree) is that all three cell probabilities are equal. In other words, the three binary trees are represented by τ1: p1 > p2 = p3, τ2: p2 > p3 = p1, and τ3: p3 > p1 = p2, while the true star tree is τ0: p1 = p2 = p3. (The probability p0 for the constant pattern may be considered an unimportant nuisance parameter, shared by all four trees.) Both the proportions of heads and tails in the fair-coin problem and the proportions of the site patterns in the tree problem converge to their expected probabilities, with variances proportional to 1/n.

We apply the same two strategies as discussed above for the fair-coin problem to resolve the star-tree paradox. The first uses a prior on parameters in the model to force the binary tree to converge to the star tree, or to force the three cell probabilities p1, p2, p3 to approach equality (p1 = p2 = p3), when n → ∞. From the analysis of the fair-coin problem, the prior should force E(p1 − p2)² to approach 0 faster than 1/n but more slowly than 1/n². This means, as seen by translating the prior on cell probabilities into a prior on branch lengths t0 and t1, that the mean µ0 in the exponential prior for the internal branch length t0 should approach 0 faster than 1/√n but more slowly than 1/n. This prediction is only partially confirmed. Simulations confirm that to resolve the star-tree paradox (that is, for (P1, P2, P3) to converge to (1/3, 1/3, 1/3) if the star tree is the true tree), µ0 should approach 0 faster than 1/√n. Numerical problems (see later) have prevented confirmation that µ0 should approach 0 more slowly than 1/n for P1 to converge to 1 if tree τ1 is the true tree.
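The translation from cell probabilities to branch lengths can be sketched as follows. This is a small-t0 approximation under an assumed symmetric two-state model (one expected change per unit time, with t1 the age of the more recent node), offered as an illustration rather than as the derivation used in the article.

```latex
% Difference between the supporting and non-supporting variable patterns
p_1 - p_2 = \tfrac12\, e^{-4 t_1} \left(1 - e^{-4 t_0}\right)
          \approx 2\, t_0\, e^{-4 t_1} \qquad (t_0 \to 0).

% With independent priors and t_0 \sim \text{exponential with mean } \mu_0:
% E(t_0^2) = 2 \mu_0^2
E(p_1 - p_2)^2 \approx 4\, E(t_0^2)\, E\!\left(e^{-8 t_1}\right)
               = 8\, \mu_0^2\, E\!\left(e^{-8 t_1}\right) \;\propto\; \mu_0^2 .

% Hence the two rate conditions on the cell probabilities become
n\, E(p_1 - p_2)^2 \to 0 \iff \sqrt{n}\, \mu_0 \to 0, \qquad
n^2 E(p_1 - p_2)^2 \to \infty \iff n\, \mu_0 \to \infty .
```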

The second strategy assigns a nonzero prior probability π0 for the degenerate star tree (p1 = p2 = p3). Simulations confirm that when n → ∞, the posterior probability for the star tree approaches 1, and this prior indeed resolves the star-tree paradox. This result is expected from previous theoretical work. Indeed Dawid (1999) has studied the asymptotics of Bayesian model selection when the data size n → ∞. If all models considered in the Bayesian analysis are wrong, the probability for the model closest to the truth, as measured by the Kullback-Leibler divergence, approaches 1. If one model is correct and all others are wrong, the probability for the true model approaches 1. If several models are true, the probability for the true model with the fewest parameters approaches 1. The case where several models of the same dimension are true is not well specified. Dawid's proof assumes that the parameters are unbounded while here the star tree is at the boundary of the parameter space of the binary trees. However, the qualitative conclusions appear applicable to the tree problem. Here the data are generated under the star tree, so that all four trees are correct, but the star tree has one fewer parameter, and its posterior probability approaches 1.


    Discussion
Does the Star-tree Paradox Exist?
Kolaczkowski and Thornton (2006), referred to hereinafter as KT06, recently argued that the star-tree paradox does not exist. The authors performed three analyses, each of which appears to be invalid or misinterpreted.

First, KT06 simulated data sets with up to n = 10⁷ sites using a star tree of four species, with all four branch lengths equal. The data were analyzed using MrBayes to calculate posterior probabilities (P1, P2, P3) for the three binary unrooted trees without assuming the molecular clock. All five branch lengths in each binary tree are assigned the uniform prior U(0, 10). The variance in the posterior probability for a binary tree, say P1, was initially small, but increased with the increase of n to a stable value of about 0.06 when n ≥ 10³ (KT06, fig. 1b). The standard deviation (SD) of ~0.24 (≈ √0.06) is about the same as that obtained in this article for rooted trees of three species (0.2498; see figure 8a below). It is likely that these two values are indeed identical and that the three-species problem of figure 2, studied here, and the four-species problem with equal branch lengths in the star tree, studied by KT06, produce the same limiting distribution f(P1, P2, P3). It is also likely that the distribution in the four-species case is similarly independent of the branch length used in the star tree and the upper bound in the uniform prior for branch lengths in the binary trees. It would be interesting to know whether this invariance holds also when the four branches in the star tree have different lengths. At any rate, the failure of P1 to converge to 1/3 confirms the star-tree paradox. KT06 appeared to have mistaken a stable variance for zero variance when they claimed that their results disproved the star-tree paradox, and they were incorrect to conclude that "With infinite data, posterior probabilities give equal support for all resolved trees, and the rate of false inferences falls to zero." KT06 emphatically criticized the speculation of Lewis, Holder, and Holsinger (2005) that "Bayesian analyses become increasingly unpredictable" with the increase of data size when the true tree is the star tree. Technically, this speculation is confirmed rather than refuted by the result of KT06 (and by the results of this study), as the variance of P1 continues to increase with n, even though the amount of increase approaches zero (KT06, fig. 1b). Clearly, the variance cannot increase without limit, the absolute maximum being 2/9 (with the SD being √(2/9) = 0.4714), achieved if the posterior probabilities (P1, P2, P3) take only three sets of values, each with probability 1/3: (1, 0, 0), (0, 1, 0), and (0, 0, 1).
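The bound of 2/9 follows from a one-line argument, sketched here under the assumption that the three trees are treated symmetrically, so that E(P1) = 1/3.

```latex
% Since 0 \le P_1 \le 1, we have P_1^2 \le P_1, and therefore
\operatorname{Var}(P_1) = E(P_1^2) - \left[E(P_1)\right]^2
  \le E(P_1) - \left[E(P_1)\right]^2
  = \tfrac13 - \tfrac19 = \tfrac29 ,
\qquad
\mathrm{SD} \le \sqrt{\tfrac29} = \tfrac{\sqrt2}{3} \approx 0.4714 .
```

Equality requires P1² = P1 with probability 1, that is, P1 takes only the values 0 and 1, which is the three-point distribution described above.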


FIG. 8. The density functions (a) for the posterior probability P1 for any binary tree such as τ1, (b) for the smallest of the three posterior probabilities Pmin, and (c) for the largest of the three probabilities Pmax. The data of figure 7 are used to estimate the density functions.

 
Second, KT06 examined the so-called type-I error rate in finite data sets of 5,000 sites, and found that when the true tree is the star tree, the posterior probability for a binary tree is > 95% (or > 99%) in less than 5% (or 1%) of data sets. The same pattern holds also for rooted trees in this study, although the posterior probability for a binary tree is < 5% (or < 1%) in more than 5% (or 1%) of data sets, as mentioned above. It is debatable whether such "error" rates are acceptable if they persist in arbitrarily large data sets. While it is appropriate to study so-called Frequentist properties of a Bayesian method, KT06 confused Bayesian posterior probabilities with Frequentist P values when they claimed that "posterior probabilities never produce strong support for incorrectly resolved phylogenies more often than they should." Bayesian statistics in general does not provide a guarantee of its performance under Frequentist criteria. KT06 also claimed that the "type-I" error rate decreased when n increased from 10³ to 10⁷ (KT06, fig. 2b). This result is inconsistent with the present study and appears to contradict their finding of an increasing and asymptotically stable variance in P1. The result may be due to numerical problems in the MCMC algorithms in the analysis of KT06.

Third, KT06 used MrBayes to analyze a data set consisting of the expected probabilities of the site patterns calculated under the star tree. This "infinite" data set gave 1/3 as the posterior probability for each binary tree. However, analysis of this average site is not meaningful, as it ignores the variation among data sets and the fact that the number of sites as well as the proportions of site patterns influences Bayesian analysis. In the fair-coin problem, the data set consisting of 1/2 heads and 1/2 tails would produce P− = P+ = 1/2, but this average coin toss tells us nothing about the behavior of the Bayesian method when n → ∞ (see fig. 1).

The position of KT06 toward the star-tree paradox is marred by errors in the analysis. The paradox concerns the performance of the Bayesian method in large or infinite data sets, so that finite data sets are not the real issue. Nevertheless the "error" rates in finite data sets are higher than KT06 suggested, because the method produced very small posterior probabilities too often (see above). KT06 expected the "error" rate to reduce to zero when the data size n → ∞, with the posterior tree probabilities approaching 1/3. This is the behavior of a sensible Bayesian analysis assumed by Lewis, Holder, and Holsinger (2005) and Yang and Rannala (2005), although KT06 failed to realize that the Bayesian method does not behave in this way.

Priors and Bayesian Phylogenetics
It is a curious fact that to resolve the fair-coin paradox, the prior probability π0 on the degenerate model of fair coin (H0: θ = 1/2) can be constant and independent of the data size, while the prior on parameter θ (the probability of heads) has to be increasingly concentrated around θ = 1/2, depending on the data size n. The difference appears to be due to the fact that any point mass has probability zero in a non-degenerate continuous distribution. Nevertheless, both may be viewed as priors on parameter θ in models of negative and positive biases (H− and H+) without considering H0 in the analysis. The degenerate-model prior is equivalent to assigning a mixture distribution on θ, with a component at 1/2 in proportion π0 and another component from a continuous distribution in proportion 1 − π0. Similarly the star-tree prior π0 is equivalent to a mixture distribution for internal branch lengths in binary trees (with the star tree excluded from the Bayesian analysis), with a component at zero in proportion π0 and a component from the continuous exponential distribution in proportion 1 − π0. Implementation of the data size-dependent prior is simpler as it requires only a change to the prior mean for internal branch lengths (Yang and Rannala 2005). The star-tree prior is more complex because bifurcating and multifurcating trees have different numbers of branch length parameters so that algorithms such as reversible jump MCMC (Green 1995) are needed to deal with models of different dimensions (Lewis, Holder, and Holsinger 2005).
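The mixture prior for the coin is easy to evaluate numerically. The sketch below assumes that the continuous component is the uniform U(0, 1), which is what equal weights on the negative-bias and positive-bias models with uniform priors in each amount to; its marginal likelihood is then 1/(n + 1) for any y. The function name `posterior_h0` and the default π0 = 0.5 are hypothetical choices made for illustration.

```python
import math


def posterior_h0(y, n, pi0=0.5):
    """Posterior probability of H0: theta = 1/2 given y heads in n tosses.

    Prior: point mass pi0 at theta = 1/2, and weight 1 - pi0 on U(0, 1).
    """
    # log marginal likelihood under H0: log[ C(n, y) / 2**n ]
    log_m0 = (math.lgamma(n + 1) - math.lgamma(y + 1)
              - math.lgamma(n - y + 1) - n * math.log(2.0))
    # Under U(0, 1): integral of C(n, y) theta^y (1 - theta)^(n - y) = 1 / (n + 1)
    log_m_alt = -math.log(n + 1.0)
    log_odds = math.log(pi0) - math.log1p(-pi0) + log_m0 - log_m_alt
    # Numerically stable logistic transform of the posterior log odds
    if log_odds >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Data sets whose head counts sit one standard deviation above n/2.
    for n in (10**2, 10**4, 10**6):
        y = n // 2 + round(math.sqrt(n) / 2)
        print(n, y, posterior_h0(y, n))
```

For data generated by a fair coin, y stays within a few multiples of √n/2 of n/2, so the Bayes factor in favor of H0 grows roughly like √n and the posterior probability of H0 moves toward 1 even though π0 is held fixed.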

Both the star-tree prior and the data size-dependent prior may be criticized. Whether truly simultaneous speciation events ever occur in nature is debatable, and if they do not, assigning a prior probability to a model known to be false runs into a conceptual difficulty. Similarly, the use of data size (although not the data themselves) for prior specification may appear non-Bayesian. The prior is supposed to reflect information concerning the parameter before the data are analyzed and should ideally be independent of the data. Nevertheless, this ideal is often hard to achieve in "objective" Bayesian statistics when little information is available about the parameter. Both Jeffreys's prior (Jeffreys 1961) and the reference prior (Bernardo 1979) depend on the likelihood function or the experimental design. One may ask why one's prior ignorance concerning a parameter should depend on how one conducts the experiment to find out about the parameter. An extreme case is Bernardo's (1980) use of the data (not just data size) to specify the prior, although the idea did not appear to be warmly received in the ensuing discussions. Data size-dependent priors were discussed by Bartlett (1957), Davison (2003, pp. 586–587), and Cox (2006, pp. 42–43, 106–107), as a possible way of resolving Lindley's paradox (see below). One may argue that if data sets of such large sizes are needed to resolve the tree, the internal branch must be very short, so that it may be sensible to assume increasingly shorter internal branches in the prior in larger data sets. Yang and Rannala (2005) also discussed the use of empirical estimates of internal branch lengths from real data sets to specify the prior, and pointed out that almost all of the possible phylogenetic trees are wrong, and that most internal branch lengths in wrong trees are estimated to be zero.

The biologist reader should be aware that there have been longstanding fundamental disagreements among statisticians concerning principles of statistical inference. In particular, model selection is a difficult area for both Bayesian and Frequentist statistics, and it is also an area where the two approaches can draw very different conclusions from the same data. A brief overview of this controversy is provided in Yang (2006, §5.1.3). As phylogeny reconstruction unfortunately falls into this class of difficult statistical inference problems (e.g., Yang et al. 1995), biologists may have to think about what constitutes a sensible behavior in a Bayesian phylogenetic analysis. Six decades ago, Egon S. Pearson (1947) wrote that "Hitherto the user has been accustomed to accept the function of probability theory laid down by the mathematicians; but it would be good if he could take a larger share in formulating himself what are the practical requirements that the theory should satisfy in application." This advice may be useful even today.

Almost all controversies surrounding Bayesian inference concern the prior, which is also the focus of this study. For the tree problem, one may take the position that the prior implemented in current computer programs is appropriate and then accept whatever properties Bayesian inference under the prior possesses. The expectation for the posterior tree probabilities to approach 1/3 when n -> {infty} is then seen as false intuition and no paradox remains. This position may be natural to some Bayesian statisticians. Another position is to judge the method by its statistical properties. In the "objective" Bayesian method, it is a common practice to specify the prior such that the resulting Bayesian inference is deemed reasonable (e.g., Jeffreys 1961). I have taken that position in this study, motivated by the observation that the posterior tree probabilities are often too extreme. Two a priori criteria are set up: (1) if the true tree is the star tree, the probabilities for the three binary trees should approach 1/3 when n -> {infty}, and (2) if a binary tree is the true tree, its posterior probability should approach 1. Two strategies of prior specification are then found to meet those criteria.

Based on the perception that the posterior probabilities for trees or clades are often too high, some authors (e.g., Suzuki, Glazko, and Nei 2002; Simmons, Pickett, and Miya 2004) argued that the Bayesian posterior probabilities for trees or clades cannot be trusted, and that alternative methods such as the bootstrap should be used to assess the reliability of estimated trees. Similarly, Douady et al. (2003) suggested the bootstrapped Bayesian analysis, in which the Bayesian method is used to analyze bootstrap pseudo-data sets. The method then involves prohibitive computation and is also a strange mix of Bayesian and Frequentist methodologies. Instead, we consider the default priors implemented in current computer programs to be inappropriate and attempt to specify better priors to produce more reasonable posterior tree probabilities. Lewis, Holder, and Holsinger (2005) also emphasized the fact that a number of realistic evolutionary models have been implemented in the MrBayes program, making the method an attractive option for analyzing ever-increasing genetic data sets.


    Mathematical Analysis
 
Bayesian Analysis of the Fair-balance Problem
The Fair-balance Problem
In the fair-coin problem, one may also assign an informative prior on the probability of heads: {theta} ~ beta({alpha}, {alpha}). The beta distribution with {alpha} > 1 has a mode at 1/2, so that the coin is more likely to be nearly even than seriously biased. For large {alpha}, beta({alpha}, {alpha}) can be approximated by a normal distribution with mean 1/2 and variance 1/(8{alpha} + 4). The likelihood, given by the binomial probability of the number of heads y ~ bi(n, {theta}), is approximated by the normal density y/n ~ N(Formula, 1/(4n)). The posterior {theta}|y ~ beta(y + {alpha}, n – y + {alpha}) can be approximated by the normal distribution with mean Formula and variance Formula. (Note that the variance is approximately proportional to 1/n if n is large and {alpha} is fixed.) Thus we redefine {theta} Formula as the parameter, use the normal distribution to approximate the prior, the likelihood, and the posterior, and restate the problem as the following fair-balance problem. Suppose the data consist of n independent observations y1, y2, ..., yn, with yi ~ N({theta}, {sigma}^2), where {theta} is unknown and {sigma}^2 is known. The yi's may be measurement errors on a balance. Let Formula be the sample mean. The two models are then H–: {theta} < 0 and H+: {theta} > 0. In the prior we assign equal probabilities (1/2) for each model, and {theta} ~ N(0, {xi}{sigma}^2), truncated to the appropriate range in each model.

The posterior of {theta} is then given by {theta}|Formula ~ Formula, from which one can get the posterior probability for model H– as

Formula (1)
where {Phi}(·) is the cumulative distribution function (c.d.f.) of the standard normal distribution (Yang and Rannala 2005, equation 6). Note that Formula is a random variable from the standard normal distribution.

Suppose the true parameter is {theta}0. As Formula varies among data sets as Formula ~ N({theta}0, {sigma}2/n), P has the density

Formula (2)
where {Phi}^–1(·) is the inverse c.d.f. of the standard normal distribution (Yang and Rannala 2005, equation 12). This is a function of P, n{xi}, and Formula.

If the balance is fair and the true parameter {theta}0 = 0, equation (2) becomes

Formula (3)
(Yang and Rannala 2005, equation 13). This is a function of P and n{xi}. If {xi} is a constant, we have n{xi} -> {infty} and f(P) -> 1 when n -> {infty}, so that P converges to the uniform distribution U(0, 1) (see fig. 1). This is called the fair-balance paradox (Yang and Rannala 2005). We would like P to approach 1/2 when n -> {infty}, but it fails to do so.
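The paradox can be checked by simulation. The sketch below is not the article's code: it assumes that the posterior model probability follows from the standard normal-normal conjugate update (posterior mean n{xi}·ybar/(1 + n{xi}), posterior variance {xi}{sigma}^2/(1 + n{xi})), which should agree with equation (1) but is derived here independently. The function names, the seed, and the choice {sigma} = 1 are illustrative assumptions.

```python
import math
import random


def std_normal_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def posterior_prob_negative(ybar, n, sigma, xi):
    """P(theta < 0 | ybar) for theta ~ N(0, xi*sigma^2), ybar | theta ~ N(theta, sigma^2/n)."""
    shrink = n * xi / (1.0 + n * xi)                  # posterior mean = shrink * ybar
    post_sd = sigma * math.sqrt(xi / (1.0 + n * xi))  # posterior standard deviation
    return std_normal_cdf(-shrink * ybar / post_sd)


def simulate_P(n, xi, theta0=0.0, sigma=1.0, reps=20000, seed=1):
    """Draw ybar ~ N(theta0, sigma^2/n) repeatedly and return the resulting P values."""
    rng = random.Random(seed)
    sd_mean = sigma / math.sqrt(n)
    return [posterior_prob_negative(rng.gauss(theta0, sd_mean), n, sigma, xi)
            for _ in range(reps)]


# Fair balance (theta0 = 0) with a fixed prior variance factor xi.
P = simulate_P(n=1000, xi=2.0)
mean_P = sum(P) / len(P)
frac_extreme = sum(p < 0.05 or p > 0.95 for p in P) / len(P)
print(f"mean P = {mean_P:.3f}; share of data sets with P outside (0.05, 0.95) = {frac_extreme:.3f}")
```

If P were concentrating at 1/2, the share of extreme values would shrink with n; under a fixed {xi} it stays near the 10% expected of a uniform variable.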

The Data Size-Dependent Prior
One of the ideas suggested in the discussions of Lindley's paradox (Lindley 1957, see below) is to let the prior be increasingly informative with the increase of the data size. Consider setting {xi} = c/n^{gamma} in the prior for {theta} as a possible way of resolving the fair-balance paradox. From equation (3), it is clear that if 0 < {gamma} < 1, P still converges to U(0, 1), even though this prior forces {theta} to be closer and closer to 0 with the increase of n, converging to a point mass at {theta} = 0 in the limit. If {gamma} = 1 so that {xi} = c/n, f(P) peaks at P = 1/2, but the distribution does not degenerate to a point mass at 1/2 (equation 3). Figure 3 shows a few densities when c = n{xi} = 0.1, 1, and 2. Note that in this case the prior {theta} ~ N(0, c{sigma}^2/n) and the likelihood Formula ~ N({theta}, {sigma}^2/n) have the same "precision" about {theta}. When {gamma} > 1, f(P) -> 0 for all values of P except P = 1/2; that is, P converges to a point mass at 1/2. Thus to avoid the fair-balance paradox, we should have {gamma} > 1 in {xi} = c/n^{gamma}; the variance in the prior of {theta} should approach 0 faster than 1/n.
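The three regimes of {gamma} can be seen numerically by tracking the standard deviation of P across data sets when the true {theta}0 = 0. The sketch below is illustrative only: it assumes P = {Phi}(–z·sqrt(n{xi}/(1 + n{xi}))) with z ~ N(0, 1), obtained from the normal-normal conjugate update rather than copied from equation (1), and it integrates over z with a plain trapezoidal rule. The function names and the value c = 2 are assumptions.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sd_of_P(n, gamma, c=2.0, grid=4001, zmax=8.0):
    """SD of P over data sets when theta0 = 0 and xi = c / n**gamma."""
    n_xi = c * n ** (1.0 - gamma)
    a = math.sqrt(n_xi / (1.0 + n_xi))
    h = 2.0 * zmax / (grid - 1)
    second_moment = 0.0
    for k in range(grid):
        z = -zmax + k * h
        w = 0.5 if k in (0, grid - 1) else 1.0    # trapezoidal end weights
        second_moment += w * Phi(-a * z) ** 2 * phi(z) * h
    # E(P) = 1/2 by symmetry, so var(P) = E(P^2) - 1/4.
    return math.sqrt(max(second_moment - 0.25, 0.0))


for gamma in (0.5, 1.0, 1.5):
    sds = [sd_of_P(n, gamma) for n in (10**3, 10**6, 10**9)]
    print(gamma, [round(s, 4) for s in sds])
```

With {gamma} = 0.5 the SD climbs toward that of U(0, 1), about 0.2887; with {gamma} = 1 it is the same for every n; with {gamma} = 1.5 it shrinks toward 0.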


FIG. 3.— The density f(P) for the posterior model probability P in the fair-balance problem when the prior is {theta}~ N(0, {xi}{sigma}2) with {xi} = c/n. The true {theta} = 0. The plots correspond to c = n{xi} = 0.1, 0.5, and 2, calculated using equation (3).

 
The case of {theta}0 != 0 (equation 2) is summarized in table 1. The statement of Yang and Rannala (2005, pp. 468–469) that P converges to 1 if {theta}0 < 0 (or to 0 if {theta}0 > 0) irrespective of {gamma} in {xi} = c/n^{gamma} is inaccurate. Indeed the behavior of f(P) depends on {gamma}. To ensure that P -> 1 if {theta}0 < 0 (and P -> 0 if {theta}0 > 0), we require {gamma} < 2. Any value of {gamma} in the interval (1, 2) will produce sensible Bayesian inference by the criteria used here, and a smaller {gamma} corresponds to a more powerful analysis, as it produces higher posterior probabilities for the true model if the coin is biased. Figure 4a shows that the posterior probability P calculated from a data set may be very sensitive to the prior or the value of {gamma}. Furthermore, while f(P) converges to a point mass at 1, 1/2, and 0 if the true {theta}0 < 0, = 0, and > 0, respectively, the rate of convergence depends on {gamma}. Curves a and b in figure 5 show the density when {theta}0 = 0 and 0.1 at n = 1,000 when {gamma} = Formula is used.
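The upper bound {gamma} < 2 can be motivated by a short heuristic calculation. It is a sketch, not the article's derivation: it assumes the closed form for P obtained from the normal-normal conjugate update and takes {gamma} > 1, so that n{xi} = c·n^(1 – {gamma}) tends to 0.

```latex
\begin{align*}
P &= \Phi\!\left(-z\sqrt{\frac{n\xi}{1+n\xi}}\right),
\qquad z=\frac{\sqrt{n}\,\bar{y}}{\sigma}=\frac{\sqrt{n}\,\theta_0}{\sigma}+\varepsilon,
\quad \varepsilon\sim N(0,1),\\
\sqrt{\frac{n\xi}{1+n\xi}} &\approx \sqrt{c}\,n^{(1-\gamma)/2}
\qquad (\xi = c/n^{\gamma},\ \gamma>1),\\
-z\sqrt{\frac{n\xi}{1+n\xi}} &\approx
-\frac{\sqrt{c}\,\theta_0}{\sigma}\,n^{(2-\gamma)/2}
-\sqrt{c}\,n^{(1-\gamma)/2}\,\varepsilon .
\end{align*}
```

The noise term vanishes for {gamma} > 1, while the first term diverges only if {gamma} < 2. At {gamma} = 2 the argument tends to the constant –sqrt(c)·{theta}0/{sigma}, so P settles at an intermediate value, and for {gamma} > 2 the prior overwhelms the data and P tends to 1/2 even though {theta}0 != 0.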



 
Table 1 Behavior of Posterior Model Probability P in the Fair-balance Problem When the Data Size n -> {infty} and the Prior is {theta} ~ N(0, {xi}{sigma}^2) with {xi} = c/n^{gamma}

 

 
FIG. 4.— The sensitivity of posterior model probability P to the prior in the fair-balance problem. (a) The prior is specified as {theta} ~ N(0, {xi}{sigma}^2) with {xi} = c/n^{gamma}, and the posterior P is calculated using equation (1). (b) A prior probability {pi}0 is assigned to the degenerate model H0: {theta} = 0, while {theta} ~ N(0, c{sigma}^2) under models H– and H+. The posterior model probabilities P0 and P– are calculated using equation (5), and then Formula = P0/2 + P– is used in the plot. In both (a) and (b), n = 1000 and c = 2 are fixed.

 

 
FIG. 5.— The effect of prior on f(P), the density of posterior model probability P in the fair-balance problem. The sample size is n = 1000. The true parameter value is {theta}0 = 0 in (a) and (c) and 0.1 in (b) and (d). In (a) and (b), the prior is specified as {theta} ~ N(0, {xi}{sigma}^2) where {xi} = 2/n^{gamma} with {gamma} = Formula, and f(P) is calculated using equation (2). In (c) and (d), the prior probability {pi}0 = Formula is assigned to the degenerate model {theta} = 0, while {theta} ~ N(0, 2{sigma}^2) is used under models H– and H+. One million data sets are simulated by generating Formula ~ N({theta}0, {sigma}^2/n), and P0 and P– are calculated by equation (5). Then Formula = P0/2 + P– is used to construct the histogram and to estimate the density f(P).

 
The Degenerate-Model Prior
Another strategy is to assign a nonzero probability to the degenerate model H0: {theta} = 0 (Lewis, Holder, and Holsinger 2005). The three models H0, H–, and H+ then have the prior probabilities {pi}0, (1 – {pi}0)/2, and (1 – {pi}0)/2. The unknown parameter in H– and H+ is assigned the prior {theta} ~ N(0, {xi}{sigma}^2), truncated to the appropriate range, where {xi} is a constant.

The likelihood is given by Formula|{theta} ~ N({theta}, {sigma}2/n). The marginal likelihoods are

Formula (4)
The posterior probabilities for the three models are

Formula (5)
where D = Formula. Note that P0, P–, and P+ are functions of Formula, {pi}0, and n{xi}.

When the sample mean Formula varies among data sets according to N({theta}0, {sigma}2/n), with density

Formula (6)
the posterior model probabilities will have a joint density f(P0, P–, P+). This appears intractable. However, the marginal density of P0 can be derived as follows. Rewrite P0 in equation (5) as

Formula (7)
with a = Formula and b = Formula. Any P0 in the interval (0, 1/(1 + a)) corresponds to two Formula:

Formula (8)
For each of them, the Jacobi determinant is

Formula (9)

Thus

Formula (10)
where B = Formula , and 0 < P0 < Formula . Note that f(P0|{theta}0) depends on P0, {pi}0, n{xi}, and Formula .

If {theta}0 = 0, equation (10) reduces to

Formula (11)
This density is specified by {pi}0 and n{xi}.

Equations (10) and (11) can be used to confirm that as long as {pi}0 > 0, f(P0) converges to a point mass at 1 when n -> {infty} if {theta}0 = 0 (so that H0 is true), and that if {theta}0 != 0 (so that H0 is false), f(P0) will converge to a point mass at 0, in which case one of P– and P+ (the one corresponding to the true model) will converge to 1. In other words, the probability for the correct model always converges to 1 when n -> {infty}. This is a special case of Dawid's (1999) general proof of the consistency of Bayesian model selection.

Here I consider the prior probability {pi}0 as a way of resolving the fair-balance paradox and treat P0 as equal support for H– and H+. Thus (P0, P–, P+) calculated from any data set are converted to Formula = (P0/2 + P–, P0/2 + P+). Then if {theta}0 = 0, we have P0 -> 1, so that Formula -> 1/2 and Formula -> 1/2. Similarly, if {theta}0 != 0, we have P0 -> 0, so that one of Formula and Formula will approach 1. It is clear that use of the prior probability {pi}0 resolves the fair-balance paradox.

Nevertheless, the Bayesian analysis may be very sensitive to the value of {pi}0, and this sensitivity appears to be the nature of the problem. For example, for a data set of size n = 1,000 with Formula = –0.05, we have Formula to be 0.943, 0.683, 0.560, and 0.532, if {pi}0 = 0, 1/10, 1/3, and 1/2, respectively (fig. 4b). Furthermore, while Formula converges to a point mass at 1, 1/2, and 0 if the true {theta}0 < 0, = 0, and > 0, respectively, the convergence may be at very different rates depending on {pi}0. Curves c and d in figure 5 show the density for {theta}0 = 0 and 0.1, with n = 1,000 when the prior {pi}0 = Formula is used. This prior produces high posterior probabilities for the true model much more often and may be considered more powerful than the data size-dependent prior with {gamma} = Formula, that is, {theta} ~ Formula (curves a and b in fig. 5).
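The sensitivity to {pi}0 can be reproduced with a few lines of code. The sketch below is a reconstruction under stated assumptions, not the article's program: the marginal likelihoods are worked out from the normal model directly (N(0, {sigma}^2/n) under H0, and N(0, {sigma}^2({xi} + 1/n)) split between H– and H+ by the posterior mass on each side of 0), {sigma} = 1 is assumed, and {xi} = c = 2 is taken from the caption of figure 4. The values {pi}0 = 1/3 and 1/2 are inferred because they reproduce the quoted 0.560 and 0.532. Function names are hypothetical.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def Phi(z):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def model_posteriors(ybar, n, pi0, xi, sigma=1.0):
    """Posterior probabilities (P0, P_minus, P_plus) of H0, H- and H+."""
    m0 = normal_pdf(ybar, sigma ** 2 / n)                  # marginal under H0
    m_alt = normal_pdf(ybar, sigma ** 2 * (xi + 1.0 / n))  # marginal under theta ~ N(0, xi sigma^2)
    # Posterior mass on theta < 0 under the untruncated normal prior.
    p_neg = Phi(-ybar * n * math.sqrt(xi) / (sigma * math.sqrt(1.0 + n * xi)))
    w0 = pi0 * m0
    w_minus = (1.0 - pi0) * m_alt * p_neg
    w_plus = (1.0 - pi0) * m_alt * (1.0 - p_neg)
    total = w0 + w_minus + w_plus
    return w0 / total, w_minus / total, w_plus / total


def support_for_negative(ybar, n, pi0, xi, sigma=1.0):
    """Split P0 equally between H- and H+ and return the support for H-."""
    p0, p_minus, _ = model_posteriors(ybar, n, pi0, xi, sigma)
    return 0.5 * p0 + p_minus


for pi0 in (0.0, 0.1, 1.0 / 3.0, 0.5):
    print(round(pi0, 3), round(support_for_negative(-0.05, 1000, pi0, 2.0), 3))
```

Under these assumptions the four printed values agree with those quoted in the text to three decimal places.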

Lindley's Paradox
If we do not distinguish between models H– and H+ and define P1 = 1 – P0 = P– + P+, the problem becomes one of comparing a sharp null hypothesis H0: {theta} = 0 with a composite alternative hypothesis H1: {theta} != 0. This is the case for Lindley's (1957; see also Jeffreys 1939) paradox. If Formula is fixed but n -> {infty}, then P0 -> 1 (eq. 7). Lindley's paradox refers to the observation that in a data set, Formula may differ sufficiently from 0 for H0 to be rejected by a significance test, while Bayesian analysis of the same data strongly supports H0 with posterior probability P0 {approx} 1. Thus the significance test and the Bayesian analysis draw opposite conclusions from the same data. Indeed, if large data sets are generated under the null model, such contradictions will occur in ~5% of data sets if the significance test is conducted at the 5% level. As discussed above, if H0 is true and n is large, P0 {approx} 1 in nearly every data set, but the significance test will still reject the true null hypothesis 5% of the time. This result appears to suggest flaws in the methodology of significance testing, as claimed by some Bayesian statisticians (e.g., Good 1982, p. 342; Press 2003, pp. 220–225; Berger 1985, pp. 144–157), rather than in Bayesian analysis, as suggested by, e.g., Bernardo (1980) and Shafer (1982). Furthermore, Davison (2003, pp. 586–587) and Cox (2006, pp. 42–43, 106–107) (see also Bartlett 1957) suggested the use of {xi} = c/n, so that {theta} ~ N(0, c{sigma}^2/n), to resolve Lindley's paradox. By the criteria used here, this prior is not acceptable as it causes f(P0) to fail to converge to the point mass at 1 when {theta}0 = 0 (see equation 11).

Nevertheless, whatever the true model or the observed data Formula, P0 can be made arbitrarily close to 1 by the use of a diffuse prior or a large {xi}, as P0 -> 1 when {xi} -> {infty} in equation (7). Bayesian analysis in this case is extremely sensitive to the prior.
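Both effects, the growth of P0 with n at a fixed significance level and its growth with {xi}, can be illustrated with the Bayes factor for the normal model. The sketch below is an independent reconstruction rather than equation (7) itself: it assumes the standardized statistic z = sqrt(n)·ybar/{sigma} is held fixed at 1.96 (two-sided P value of about 0.05) and uses B01 = sqrt(1 + n{xi})·exp(–(z^2/2)·n{xi}/(1 + n{xi})). The function name and the default {pi}0 = 1/2 are illustrative.

```python
import math


def posterior_null_prob(z, n, xi, pi0=0.5):
    """P0 for H0: theta = 0 against H1: theta ~ N(0, xi*sigma^2), given z = sqrt(n)*ybar/sigma."""
    n_xi = n * xi
    # log Bayes factor in favor of H0, from the ratio of the two normal marginal densities
    log_bf01 = 0.5 * math.log1p(n_xi) - 0.5 * z * z * n_xi / (1.0 + n_xi)
    odds = pi0 / (1.0 - pi0) * math.exp(log_bf01)
    return odds / (1.0 + odds)


# Same "significant" z, increasingly large data sets (xi = 1).
for n in (10, 10**3, 10**6):
    print(n, round(posterior_null_prob(1.96, n, xi=1.0), 4))

# Same data, increasingly diffuse prior.
for xi in (1.0, 1e4, 1e12):
    print(xi, round(posterior_null_prob(1.96, 10, xi), 4))
```

A result that a significance test would reject at the 5% level thus ends up supporting H0 more and more strongly as either n or {xi} grows.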

Bayesian Tree Estimation in the Three-Species Case
The Tree Problem
There are three (rooted) binary trees for three species (fig. 2): {tau}1 = ((12)3), {tau}2 = ((23)1), and {tau}3 = ((31)2). We consider binary characters, which evolve at a constant rate according to a stationary Markov process. The data are counts n0, n1, n2, n3 of site patterns xxx, xxy, yxx, and xyx. Let xi = ni/n, i = 0, 1, 2, 3, be the proportions of the site patterns. The data may be represented as n = {n1, n2, n3} or x = {x1, x2, x3}, where n is the total number of sites.

Under tree {tau}1, with branch lengths t0 and t1 (fig. 2), the probabilities of observing the four site patterns are

Formula (12)
(Yang 2000). As 0 ≤ t0, t1 ≤ {infty}, we have p0 ≥ p1 ≥ p2 = p3 ≥ 0 and p0 + p1 + p2 + p3 = 1. The likelihoods under the three trees are

Formula (13)
We assign prior probability 1/3 for each binary tree and exponential priors with means µ0 and µ1 for t0 and t1: f(t0) = exp{–t0/µ0}/µ0 and f(t1) = exp{–t1/µ1}/µ1. The exponential priors appear more sensible than uniform priors since most branch lengths in real trees are small while very large branch lengths are rare. The marginal likelihood under tree {tau}i is

Formula (14)
The posterior tree probability is

Formula (15)

Thus analysis of each data set requires evaluation of three two-dimensional integrals. (In contrast, the case of four species and no molecular clock requires evaluation of three five-dimensional integrals.) Yang and Rannala (2005) used Mathematica (Wolfram 2003) to calculate the integrals of equation (14) numerically. This is found to be unreliable in large data sets, with n ≥ 5,000, say. A difficulty is that the integrand is nearly a spike at its mode.
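For small data sets the three integrals can be evaluated by brute force. The sketch below is a crude midpoint-rule illustration, not the Mathematica calculation and not the approximation developed in this article. It assumes the symmetric two-state model with branch lengths in expected changes per site, under which two sequences separated by distance d differ with probability 1/2 – exp(–2d)/2; the site pattern probabilities derived from this are assumed to match equation (12). The counts, grid size, and truncation limits are arbitrary illustrative choices.

```python
import math


def pattern_probs(t0, t1):
    """(p0, p1, p2) for patterns xxx, xxy, and yxx/xyx on tree ((12)3) with a clock."""
    e1 = math.exp(-4.0 * t1)            # species 1 and 2 are 2*t1 apart
    e01 = math.exp(-4.0 * (t0 + t1))    # species 3 is 2*(t0 + t1) from 1 and 2
    return (0.25 + 0.25 * e1 + 0.5 * e01,
            0.25 + 0.25 * e1 - 0.5 * e01,
            0.25 - 0.25 * e1)


def log_marginal(n0, na, nb, nc, mu0=0.1, mu1=0.2, steps=300, t0_max=1.5, t1_max=2.0):
    """log of the prior-weighted likelihood integral; na counts the pattern supporting the tree."""
    h0, h1 = t0_max / steps, t1_max / steps
    logs = []
    for i in range(steps):
        t0 = (i + 0.5) * h0
        for j in range(steps):
            t1 = (j + 0.5) * h1
            p0, p1, p2 = pattern_probs(t0, t1)
            logs.append(n0 * math.log(p0) + na * math.log(p1)
                        + (nb + nc) * math.log(p2) - t0 / mu0 - t1 / mu1)
    m = max(logs)                        # log-sum-exp guards against underflow
    return m + math.log(sum(math.exp(v - m) for v in logs)) + math.log(h0 * h1 / (mu0 * mu1))


def posterior_tree_probs(n0, n1, n2, n3, **kw):
    """Posterior probabilities of the three binary trees under equal prior tree probabilities."""
    log_m = [log_marginal(n0, n1, n2, n3, **kw),
             log_marginal(n0, n2, n3, n1, **kw),
             log_marginal(n0, n3, n1, n2, **kw)]
    top = max(log_m)
    w = [math.exp(v - top) for v in log_m]
    return [x / sum(w) for x in w]


probs = posterior_tree_probs(118, 32, 27, 23)   # hypothetical counts, n = 200
print([round(p, 3) for p in probs])
```

A fixed grid of this kind stops being useful as n grows, because the integrand narrows to a spike that falls between grid points; this is the difficulty that motivates the large-sample approximation.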

Two ideas appear promising. The first is to use the site pattern probabilities as parameters in the binary tree instead of t0 and t1 and construct conjugate priors on them. The second is to use large-sample approximations. The latter is explored in this study.

Approximate Calculation of Posterior Probabilities for Trees
We use Laplacian expansion (Copson 1965, pp. 36–47; Bender and Orszag 1999, pp. 261–276) to approximate the integral M1 for tree {tau}1 (eq. 14). The integrals M2 and M3 for trees {tau}2 and {tau}3 are calculated by a permutation of the counts n1, n2, n3. In a typical Bayesian estimation problem under a well-specified model, the likelihood function and the posterior density can be quite accurately approximated using a normal density in large data sets (Lindley 1980; Tierney and Kadane 1986). However, phylogenetic trees are different models (e.g., Yang et al. 1995). For any given data set, the maximum likelihood estimate (MLE) of t0 is zero in at least one tree, in which case the normal approximation breaks down. Instead the tedious algorithms presented below were derived by trial and error, with intensive testing in comparison with Mathematica.

Rewrite equation (14) as

Formula (16)
where f(t0, t1) = f(t0)f(t1) is the prior for t0 and t1, and

Formula (17)
Here xi = ni/n are the observed site pattern frequencies. Define hi = {partial}h/{partial}ti, hij = {partial}^2h/{partial}ti{partial}tj, etc., to be the derivatives evaluated at the MLEs Formula and Formula (see Appendix). Let H = {hij} be the Hessian matrix. If H is positive-definite, we let

Formula
with the determinant

Formula
where {rho} = {sigma}01/({sigma}0{sigma}1). We use the first few terms in the Taylor expansions of f and h as approximations

Formula (18)

The integral of equation (16) is the volume of the solid between the f·e^{nh} surface and the t0–t1 plane in the quarter-plane t0 > 0, t1 > 0. We consider three cases, depending on whether Formula > 0 and whether Formula = 0 (Yang 2000, Tables 2 and 3). We assume that x0 > 1/4.



 
Table 2 The Log Marginal Likelihood log(Mi) and the Posterior Probabilities for the Three Trees Calculated Using Exact (above) and Approximate (below) Methods

 
Case I: x1 > (1 – x0)/3. We have Formula > 0, and Formula > 0, with Formula = Formula = 0 (Yang 2000Go, Table 3). The integral can then be approximated by the volume at the neighborhood of the MLE Formula, where the likelihood surface is nearly that of a bivariate normal density function.

Formula (19)
As discussed by Lindley (1980, equation 2), a few more terms may be used in the Taylor expansion of f and h, but this was found to lead to minimal improvement to the approximation. More importantly, the MLE Formula is often close to 0, or Formula is small (say, <3), in which case equation (19) is not very reliable. The bivariate normal integral can then be calculated using the algorithm of Drezner and Wesolowsky (1990), which was found to produce good results.

Case II: x1 = (1 – x0)/3. We have Formula = 0, Formula > 0, with Formula = Formula = 0. The integral is then half that in case I as the volume above the half plane t0 < 0 is missing.

Formula (20)

Case III: x1 < (1 – x0)/3. We have Formula = 0 and Formula > 0, with Formula < 0 and Formula = 0. This situation is complex, and is broken into two cases, depending on whether the Hessian matrix H is positive definite.

In case IIIa, H is positive definite. We then use all second-order terms in the Taylor expansion of h.

Formula (21)
Apply the variable transform y0 = t0/{sigma}0, Formula, and we have

Formula (22)
If –nh0{sigma}0 >> 1, we may apply Watson's Lemma to approximate the integral in equation (22). Write this as Formula, where q(y) = Formula, with a = Formula , b = Formula , and c = –nh0{sigma}0. From the MacLaurin expansion of q(y), we have

Formula (23)
where qk(0) is the kth derivative of q, evaluated at y = 0. The first few derivatives are as follows:

Formula (24)
where {phi}(·) is the probability density function (p.d.f.) of a standard normal variate. Thus

Formula (25)
The algorithm of Hill (1973) is used to calculate {Phi}(·).

However, if c = –nh0{sigma}0 is small (< 1), as may be the case if h0 is nearly zero, equation (25) is unreliable. Then I use the Gauss-Legendre quadrature to calculate the one-dimensional integral of equation (22) numerically, which was found to produce reliable results.

In case IIIb, x1 < (1 – x0)/3, so that Formula = 0 and Formula > 0, with h0 = Formula< 0 and h1 = Formula= 0, but H is not positive-definite. This case occurs mainly when the data are very unlikely on the tree and h0 is very negative. We then use the linear term for t0 and quadratic term for t1 in the Taylor expansion of h, as follows.


Formula (26)

Change variables from t1 to z = Formula, where {sigma}1 = Formula.

Formula (27)
Here Formula, and thus the integral from Formula to {infty} is nearly the same as from –{infty} to {infty}, while 1/(1 + a) = 1 – a + a^2 – a^3 + ... when |a| = Formula < 1. The last equality uses the result that if z is a random variable from the standard normal distribution, E(z^k) = 0 for odd k or (k – 1)(k – 3) ··· 3 · 1 for even k (e.g., Johnson et al. 1994, p. 89).

Suppose in the data set, n1 > n2 > n3. Then M1 > M2 > M3. Calculation of M1 makes use of equation (19) for case I, and calculation of M3 makes use of equations (22) or (27) for case IIIa. Calculation of M2 uses each of these two cases about half of the time. Cases II (equation 20) and IIIb (equation 27) are rarely encountered.

The above discussion assumes that the prior on branch lengths is fixed, with µ0 and µ1 being fixed constants. When µ0 depends on the data size n, some modifications to the above algorithm are necessary.

The exact calculation using Mathematica is reliable for small data sets, and unstable for large ones (say, with n > 5,000). The approximate calculation is the opposite. It is reliable for large data sets only, say with n ≥ 1,000. Figure 6 shows posterior tree probabilities calculated using the two methods, while table 2 shows the effect of sample size n on the approximation. On a 3.2 GHz Pentium IV, analyzing 10^5 data sets took a few seconds using the approximate method and ~15 days using Mathematica. Both methods are much faster than MCMC for this small problem. The approximation allows us to calculate posterior tree probabilities for arbitrarily large data sets.


 
FIG. 6.— The posterior probability P1 for tree {tau}1 calculated using exact (Mathematica) and approximate methods in 100 data sets, simulated using the star tree with t = 0.2. The sequence length is n = 3 × 10^3. The prior means are µ0 = 0.1 and µ1 = 0.2.

 
Simulation of Data
Consider simulation of data sets under tree {tau}1 with given branch lengths t0 and t1. Simulation under the star tree {tau}0 can be done using the same algorithm by fixing t0 = 0. The counts of sites follow a multinomial distribution with four cells: MN4(n; p0, p1, p2, p2), with cell probabilities given in equation (12). For large n, the data have approximately a trivariate normal distribution: n = (n1, n2, n3) ~ N3(n{theta}0, nS0), where

Formula (28)

We have |nS0| = n^3 p0 p1 p2 p2, and

Formula (29)
The normal density is

Formula (30)
where (x – {theta}0)^T = (x1 – p1, x2 – p2, x3 – p2)^T, and T is the transpose.

The Cholesky decomposition of the variance matrix is given as nS0 = LLT, with

Formula (31)
where

Formula (32)
Thus to generate a data set, we generate three independent N(0, 1) random variables z1, z2, and z3 to form z = (z1, z2, z3)T. Then n = n{theta}0 + Lz will be the desired counts of site patterns.
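The sampling step can be sketched as follows. This is an illustration under assumptions, not the article's code: the covariance is taken to be the usual multinomial covariance n·(diag(p) – p p^T) for the three variable patterns, which is presumed to be what equation (28) denotes, and the Cholesky factor is computed numerically rather than from the closed form of equations (31) and (32). The site pattern probabilities are derived from the symmetric two-state model and assumed to match equation (12). Note that the simulated counts are real-valued rather than integers.

```python
import math
import random


def pattern_probs(t0, t1):
    """(p0, p1, p2) for tree ((12)3) under a symmetric two-state model with a clock."""
    e1 = math.exp(-4.0 * t1)
    e01 = math.exp(-4.0 * (t0 + t1))
    return 0.25 + 0.25 * e1 + 0.5 * e01, 0.25 + 0.25 * e1 - 0.5 * e01, 0.25 - 0.25 * e1


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a small symmetric positive-definite matrix."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def simulate_counts(n, t0, t1, rng):
    """Draw (n1, n2, n3) from the trivariate normal approximation to the multinomial."""
    _, p1, p2 = pattern_probs(t0, t1)
    p = [p1, p2, p2]
    cov = [[n * ((p[i] if i == j else 0.0) - p[i] * p[j]) for j in range(3)]
           for i in range(3)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(3)]
    counts = [n * p[i] + sum(low[i][j] * z[j] for j in range(3)) for i in range(3)]
    return counts, low


rng = random.Random(2007)
counts, low = simulate_counts(3000, 0.0, 0.2, rng)   # star tree: t0 = 0
print([round(c, 1) for c in counts])
```

Setting t0 = 0 gives data from the star tree, as in the text. The squared product of the diagonal of the factor equals the determinant |nS0| quoted above.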

Two Strategies to Resolve the Star-tree Paradox
We now consider the two priors for resolving the star-tree paradox, following our discussions of the fair-coin and fair-balance paradoxes above. The first is to let the prior mean for the internal branch length approach zero when the data size increases, and the second is to assign a nonzero probability {pi}0 for the degenerate star tree.

Data Size-Dependent Prior
This prior forces the mean µ0 in the prior for internal branch length t0 to approach 0 or, equivalently, forces the probabilities of the three variable site patterns p1, p2, and p3 to approach equality (p1 = p2 = p3), when n -> {infty} (Yang and Rannala 2005). In the fair-coin problem, 1 – {theta} and {theta} are the two cell probabilities in the multinomial (binomial) distribution, the models of negative and positive bias are specified as H–: 1 – {theta} > {theta} and H+: 1 – {theta} < {theta}, while the fair-coin model is H0: 1 – {theta} = {theta}. The distance between H– (say) and H0 may be measured by |(1 – {theta}) – {theta}| = |1 – 2{theta}|. It was determined that the prior should force E(1 – 2{theta})^2 or the variance of {theta} to approach 0 faster than 1/n but more slowly than 1/n^2. In the tree problem, the binary tree, say {tau}1, is represented by p1 > p2 = p3 while the star tree {tau}0 is p1 = p2 = p3, where p1, p2, p3 are three cell probabilities in a multinomial distribution. The distance between {tau}1 and {tau}0 can be measured by |p1 – p2|, and by analogy with the fair-coin problem, we require that the prior on branch lengths t0 and t1 force E(p1 – p2)^2 to approach 0 faster than 1/n but more slowly than 1/n^2.

Let µ0 = c/n^{gamma} with {gamma} > 0. The prior for branch lengths t0 and t1 is given by the independent exponential distributions

Formula (33)
In place of t0 and t1, we use p0 and {delta} = p1 – p2 as the new parameters in the binary tree; the two sets of parameters are related by equation (12). The prior distribution of p0 and {delta} is obtained from equation (33) through a variable transform as

Formula (34)
We have E({delta}^2) = Formula. Thus µ0 should approach 0 faster than 1/√n but more slowly than 1/n; in other words we require 1/2 < {gamma} < 1 in µ0 = c/n^{gamma}.
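The dependence of E({delta}^2) on µ0 can be worked out directly from the exponential priors. The derivation below is a reconstruction, not a copy of the article's expression: it assumes the symmetric two-state site pattern probabilities, for which {delta} = p1 – p2 = (1/2)exp(–4t1)(1 – exp(–4t0)), and uses E[exp(–at)] = 1/(1 + aµ) for an exponential variable with mean µ.

```latex
\begin{align*}
\delta^2 &= \tfrac14\, e^{-8t_1}\left(1 - 2e^{-4t_0} + e^{-8t_0}\right),\\
E(\delta^2) &= \frac{1}{4}\cdot\frac{1}{1+8\mu_1}
\left(1-\frac{2}{1+4\mu_0}+\frac{1}{1+8\mu_0}\right)
= \frac{8\mu_0^2}{(1+8\mu_1)(1+4\mu_0)(1+8\mu_0)} .
\end{align*}
```

For small µ0 this is proportional to µ0^2, so requiring E({delta}^2) to vanish faster than 1/n but more slowly than 1/n^2 places µ0 between 1/n and 1/√n, which corresponds to 1/2 < {gamma} < 1.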

Degenerate-Model Prior {pi}0
We assign a prior probability {pi}0 > 0 for the star tree {tau}0, while the three binary trees are assigned prior probabilities {pi}1 = {pi}2 = {pi}3 = (1 – {pi}0)/3 (Lewis, Holder, and Holsinger 2005). The branch length t in the star tree is assigned the prior f(t) = exp{–t/µ1}/µ1. The marginal likelihood M0 under {tau}0 is a one-dimensional integral over t, similar to equation (14). This is reliably calculated by approximating the likelihood with a normal density, similarly to the calculation with equation (19). The marginal likelihoods for the three binary trees M1, M2, and M3 are calculated as before. Then {pi}iMi, i = 0, 1, 2, 3, are rescaled to sum to one to give the posterior probabilities for all four trees. As the star tree is a special case of the three binary trees with one fewer parameter, all four trees are correct when the data are generated from the star tree. Thus we expect the posterior probability for the star tree {tau}0 to converge to 1 as the star-tree model has a lower dimension (Dawid 1999). Here we consider {pi}0 as a way of resolving the star-tree paradox and divide P0 among the three binary trees to calculate their posterior probabilities

Formula (35)

Thus (P1, P2, P3) will converge to the point mass at (1/3, 1/3, 1/3) when n -> {infty} if the data are generated under the star tree, and to (1, 0, 0) if the data are generated under the binary tree {tau}1.
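The one-dimensional integral M0 is a convenient place to see how well a normal (Laplace) approximation performs. The sketch below is illustrative and not the article's implementation: it assumes star-tree site pattern probabilities p0 = 1/4 + (3/4)exp(–4t) and (1 – p0)/3 for each variable pattern (the t0 = 0 case of the symmetric two-state model), expands around the MLE of t, and compares the result with a fine midpoint-rule quadrature. The counts n = 5,000 and n0 = 2,935 are hypothetical.

```python
import math


def log_lik_star(t, n, n0):
    """Log likelihood of the star tree given n0 constant sites out of n."""
    e = math.exp(-4.0 * t)
    return n0 * math.log(0.25 + 0.75 * e) + (n - n0) * math.log(0.25 - 0.25 * e)


def log_M0_laplace(n, n0, mu1=0.2):
    """Laplace approximation to log M0, expanding around the MLE of t (needs x0 > 1/4)."""
    x0 = n0 / n
    t_hat = -0.25 * math.log((4.0 * x0 - 1.0) / 3.0)        # solves p0(t) = x0
    curvature = (4.0 * x0 - 1.0) ** 2 / (x0 * (1.0 - x0))   # minus the second derivative of h at t_hat
    log_prior = -t_hat / mu1 - math.log(mu1)
    return log_lik_star(t_hat, n, n0) + log_prior + 0.5 * math.log(2.0 * math.pi / (n * curvature))


def log_M0_quadrature(n, n0, mu1=0.2, steps=20000, t_max=5.0):
    """Midpoint-rule reference value for log M0."""
    h = t_max / steps
    logs = [log_lik_star((k + 0.5) * h, n, n0) - (k + 0.5) * h / mu1 - math.log(mu1)
            for k in range(steps)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs) * h)


approx = log_M0_laplace(5000, 2935)
exact = log_M0_quadrature(5000, 2935)
print(round(approx, 3), round(exact, 3))
```

The approximation error here is of relative order 1/n, so the two values should agree closely at this sample size; the binary-tree integrals are harder because the MLE of t0 can sit on the boundary.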

Simulation Results
The Star-tree Paradox
We use computer simulation to study the variation in posterior tree probabilities (P1, P2, P3) when data sets are generated under the star tree. The branch length is fixed at t = 0.2. Each of the 10^5 replicate data sets is analyzed using the Bayesian method to calculate P1, P2, P3, using equal prior probabilities (1/3) for the three binary trees and exponential priors for branch lengths with means µ0 = 0.1 and µ1 = 0.2 (equation 15). The distribution f(P1, P2, P3) across data sets is estimated by a kernel-density smoothing algorithm (Silverman 1986). Three sequence lengths are used: 3 × 10^3, 3 × 10^6, and 3 × 10^9. For n = 3 × 10^3, both exact calculation using Mathematica and the approximate method by Laplacian expansion are used, while for the two large data sizes, only the approximate method is used.

Figure 7 shows the joint density f(P1, P2, P3) for n = 3 × 10^3 and 3 × 10^9. Figure 8 shows three univariate densities derived from the same data, for P1, for Pmin = min(P1, P2, P3) and for Pmax = max(P1, P2, P3). For n = 3 × 10^3, the exact and approximate methods produced results that are indistinguishable, suggesting that the approximation is reliable. The results for n = 3 × 10^3, 3 × 10^6 (not shown), and 3 × 10^9 are very similar, indicating that for the parameter values used, n = 3 × 10^3 is close to infinity, although it is noticeable that the posterior probabilities tend to become more extreme (near 0 or 1) in larger data sets (fig. 8a). The SD for P1 is 0.2440 for n = 3 × 10^3 and 0.2498 for n = 3 × 10^6 and 3 × 10^9. In general, the means and SDs for P1, Pmin, and Pmax are identical to the fourth decimal place between n = 3 × 10^6 and 3 × 10^9.

For n = 3 × 10^9, data sets are also simulated using different values of the branch length t in the star tree (such as 0.1, 0.3, 0.4, 0.5, and 1.0), and they are analyzed using different prior means µ0 and µ1 (such as µ0 = 0.2, 0.5, 10 and µ1 = 0.1, 0.3, 0.7). The number of replicates is also raised to 10^7. As far as can be judged, the distribution f(P1, P2, P3) is independent of t, µ0 and µ1. The invariance of f(P1, P2, P3) to parameters t, µ0 and µ1 may be generally true as it parallels the fair-balance analysis in which the limiting distribution f(P) is uniform, independent of parameter {xi} in the prior {theta} ~ N(0, {xi}{sigma}^2). It also indicates that the distribution is unlikely to change when n increases beyond 3 × 10^9. In all cases examined, every Pi has mean 1/3 and SD 0.2498, and pairwise correlation coefficient –0.5000. The correlation should be exactly –1/2, according to the following symmetry argument (Peter Green, pers. comm.). From 1 = P1 + P2 + P3, we have

Formula (36)
so that corr(P1, P2) = cov(P1, P2)/var(P1) = –1/2. There are four modes in the distribution, at the center and the three corners of the ternary graph (fig. 7).
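The symmetry argument can be written out in two lines. The only assumptions are that P1, P2, and P3 sum to one and are exchangeable across data sets, so that they share a common variance and a common pairwise covariance.

```latex
\begin{align*}
\operatorname{var}(P_1) &= \operatorname{var}(1 - P_2 - P_3)
= \operatorname{var}(P_2) + \operatorname{var}(P_3) + 2\operatorname{cov}(P_2, P_3)
= 2\operatorname{var}(P_1) + 2\operatorname{cov}(P_1, P_2),\\
\operatorname{cov}(P_1, P_2) &= -\tfrac12 \operatorname{var}(P_1).
\end{align*}
```

Dividing by var(P1) gives the correlation of –1/2, whatever the shape of the joint distribution.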

We now use the distributions of P1, Pmin and Pmax for n = 3 × 10^9 to examine how often the Bayesian method produces extreme posterior probabilities, assuming that this sequence length represents the limiting case of infinite data (fig. 8). Pmin has mean 0.1298 and SD 0.0769 while Pmax has mean 0.6319 and SD 0.1698. In 4.23% of data sets, Pmax > 0.95 (that is, at least one of the three posterior probabilities is > 0.95), and in 0.79% of data sets, Pmax > 0.99. In 17.3% of data sets, Pmin < 0.05 (that is, at least one of the three posterior probabilities is < 0.05), and in 2.6% of data sets, Pmin < 0.01. If we consider any particular binary tree, such as {tau}1, we find that the proportion of data sets in which P1 < 0.05 (or 0.01) is 8.1% (or 1.31%), and the proportion of data sets in which P1 > 0.95 (or 0.99) is 1.41% (or 0.26%). Because the true tree is the star tree, we would not want any binary tree to have either a very high or a very low posterior probability. The method appears to produce extreme posterior probabilities, especially very small ones, quite often.

Data Size-dependent Prior
This prior forces the mean µ0 of internal branch lengths to approach 0 when n → ∞. We let µ0 = 0.1/n^γ and use different values for γ. When the data are simulated under the star tree, the means of the posterior probabilities for the three binary trees are always 1/3. Figure 9a shows the SD of P1 for tree τ1 when γ = 0, 0.5, 0.51, 0.707, and 0.8. Our theoretical analysis suggests that γ has to be greater than 1/2 for P1 to converge to the point mass at 1/3. If γ = 0, the SD of P1 converges to 0.2498 when n → ∞; this is the case of the star-tree paradox discussed above. If γ = 0.5, the SD stabilizes at 0.064 instead of 0. Thus (P1, P2, P3) has a nondegenerate distribution, which depends on parameters such as the branch length t in the star tree, and µ1 and c in the prior (where µ0 = c/n^γ). This is analogous to the case of θ0 = 0 and γ = 1 in table 1 for the fair-balance problem (fig. 3). When γ = 0.51, slightly larger than 1/2, the SD decreases monotonically from 0.0608 at n = 10^3 to 0.0522 at n = 3 × 10^9. The limit when n → ∞ should be 0, according to the theoretical analysis. If γ = 0.707 or 0.8, the SD clearly converges to 0 when n → ∞.
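To give a feel for how quickly the prior mean shrinks, the following sketch tabulates µ0 = c/n^γ with c = 0.1 for the values of γ used above. The function name is hypothetical. The product µ0·√n is printed only as a heuristic aid chosen for this example (an assumption, not a quantity taken from the analysis): it grows with n for γ < 1/2, stays constant at γ = 1/2, and shrinks for γ > 1/2, which mirrors the threshold quoted in the text.

```python
import math


def prior_mean_mu0(n, gamma, c=0.1):
    """Data-size-dependent prior mean of the internal branch length: c / n**gamma."""
    return c / n ** gamma


for gamma in (0.0, 0.5, 0.51, 0.707, 0.8):
    cells = []
    for n in (10**3, 10**6, 3 * 10**9):
        mu0 = prior_mean_mu0(n, gamma)
        # mu0 * sqrt(n) is a heuristic scale comparison only (see lead-in).
        cells.append(f"n={n:.0e}: mu0={mu0:.3e}, mu0*sqrt(n)={mu0 * math.sqrt(n):.3e}")
    print(f"gamma={gamma}: " + "; ".join(cells))
```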


FIG. 9. (a) The SD in the posterior probability P1 for tree τ1 is plotted against the data size n when the data are simulated under the star tree with branch lengths t0 = 0 and t1 = 0.2 and analyzed using the prior means µ0 = 0.1/n^γ and µ1 = 0.2. The values of γ are 0, 0.5, 0.51, 0.707, and 0.8 from top to bottom. The theoretical expectation is that the SD → 0 (so that P1 → 1/3) when n → ∞ if and only if γ > 0.5. (b) The SD in P1 is plotted against the sample size n when the data are simulated under a binary tree with branch lengths t0 = 0.01 and t1 = 0.2. The same priors are used to analyze the data as in (a), with γ = 0, 0.5, and 0.707. The theoretical expectation is that the SD → 0 (so that P1 → 1) if γ < 1 but P1 → 1/3 if γ > 1; this expectation is not confirmed here as values of γ around 1 caused computational problems.

 
Results obtained when the data are simulated under the binary tree τ1 with t0 = 0.01 and t1 = 0.2 are shown in figure 9b. The theoretical analysis predicts that one has to have γ < 1 for P1 to converge to the point mass at 1 when n → ∞. If γ = 0, 0.5, or 0.707 (all less than 1), the mean of P1 indeed converges to 1 while the SD converges to 0, so that the probability for the true model converges to 1 (fig. 9b). Numerical problems are encountered with larger values of γ, so the cases in which γ is close to or larger than 1 are not examined. Nevertheless, as long as the star-tree paradox is resolved (with γ > 1/2), small values for γ are preferred to larger ones, as small values lead to higher posterior probabilities for the true tree when the true tree is binary. Three convenient values for γ are 0.667, 0.707, and 0.75. These are the harmonic, geometric, and arithmetic means of 1/2 and 1, and may represent conservative, moderate, and liberal priors, respectively.
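The three suggested values of γ can be checked directly as the harmonic, geometric, and arithmetic means of the two bounds 1/2 and 1. The short sketch below does the arithmetic; the variable names are illustrative.

```python
import math

lower, upper = 0.5, 1.0  # the two bounds on gamma discussed in the text

harmonic = 2 / (1 / lower + 1 / upper)  # 2 / (2 + 1) = 2/3
geometric = math.sqrt(lower * upper)    # sqrt(1/2)
arithmetic = (lower + upper) / 2        # 3/4

print(f"{harmonic:.3f} {geometric:.3f} {arithmetic:.3f}")  # prints "0.667 0.707 0.750"
```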

Degenerate-Model Star-tree Prior π0
Here a nonzero probability π0 is assigned for the degenerate star tree τ0, while the three binary trees have prior probabilities π1 = π2 = π3 = (1 – π0)/3. The posterior probabilities for the three binary trees are calculated using equation (35). We are interested in the behavior of the joint density f(P1, P2, P3) when the data size n → ∞ and when the data are generated under either the star tree or a binary tree.
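The sketch below shows how such a prior enters a posterior calculation over four candidate trees. It assumes the generic Bayes form in which each posterior probability is proportional to the prior probability times the marginal likelihood of the tree. Equation (35) is not reproduced in this section, so this is an assumption about its form rather than a transcription of it. The function name and the numerical log marginal likelihoods in the usage lines are hypothetical.

```python
import math


def posterior_tree_probs(log_marginals, pi0):
    """Posterior probabilities for [star tree, tree 1, tree 2, tree 3].

    log_marginals: log marginal likelihoods [log M0, log M1, log M2, log M3].
    pi0: prior probability of the star tree; the binary trees share 1 - pi0 equally.
    Assumes the generic form P_i proportional to pi_i * M_i.
    """
    if not 0.0 <= pi0 < 1.0:
        raise ValueError("pi0 must satisfy 0 <= pi0 < 1")
    priors = [pi0] + [(1.0 - pi0) / 3.0] * 3
    log_terms = [
        math.log(p) + lm if p > 0.0 else -math.inf
        for p, lm in zip(priors, log_marginals)
    ]
    top = max(log_terms)  # subtract the maximum to avoid underflow for large n
    weights = [math.exp(x - top) for x in log_terms]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log marginal likelihoods, for illustration only.
probs = posterior_tree_probs([-1000.0, -1001.2, -1002.5, -1002.9], pi0=0.25)
print([round(p, 4) for p in probs])
```

Working on the log scale matters here because marginal likelihoods for large n are far too small to be represented directly as floating-point numbers.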

A few different values are used for π0: 1/10, 1/4, and Formula. In every case, the joint density f(P1, P2, P3) converges to Formula when n → ∞. For example, with t = 0.2 in the star tree and π0 = 0.25, µ0 = 0.1, and µ1 = 0.2 in the prior, the SD of P1 is calculated to be 0.125, 0.025, and 0.004 for n = 3 × 10^3, 3 × 10^6, and 3 × 10^9, respectively. The mean of the distribution is clearly Formula, and the convergence of the SD to 0 means that the distribution is becoming degenerate to the point mass at the mean. When π0 = 0.1, the SD of P1 is 0.177, 0.044, and 0.007 for the three values of n, and the rate of convergence is slower than when π0 = 0.25.

Furthermore, analysis of data sets simulated under a binary tree with t0 > 0 confirms that when n increases, the posterior probability for the true binary tree approaches 1. In sum, use of the prior π0 resolves the star-tree paradox, as long as 0 < π0 < 1. This result is expected from Dawid's (1999) general proof of consistency of Bayesian model selection.


    Addendum
 
Steel and Matsen (2007) recently published a mathematical analysis of the star-tree problem (fig. 2), proving that when the number of sites n → ∞, the posterior probability for any binary tree, say, P1, does not converge to 1/3 and will maintain a strictly positive probability of being large (say, > 0.99). The result is consistent with this study, contra Kolaczkowski and Thornton (2006). Note that the limiting distribution f(P1, P2, P3) when n → ∞ remains unknown.


    Appendix. Derivatives for Laplacian Expansion
 
Consider tree τ1. The data can be represented as x0 = n0/n, x1 = n1/n, and the likelihood L = nh, where

Formula (37)
where p0, p1, p2 are given in equation (12). Let Formula and Formula and note that

Formula (38)
Then the gradient g = Formula and Hessian matrix H = Formula are

Formula (39)
and

Formula (40)
The third derivatives are

Formula (41)
The above formulae are confirmed by using the difference method to approximate the derivatives.
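A check of this kind can be sketched as follows: analytical derivatives are compared with central finite differences at a test point. Because equations (37) to (41) are not reproduced here, the function h_example below is a hypothetical smooth stand-in and is not the function h of equation (37); the step size eps and the test point are also assumptions of the example.

```python
import math


def h_example(t0, t1):
    """Hypothetical smooth stand-in for h(t0, t1); NOT the function in equation (37)."""
    return math.log(1.0 + math.exp(-2.0 * t0)) + t0 * math.exp(-t1)


def grad_analytic(t0, t1):
    """Hand-derived gradient of h_example with respect to (t0, t1)."""
    e = math.exp(-2.0 * t0)
    d0 = -2.0 * e / (1.0 + e) + math.exp(-t1)
    d1 = -t0 * math.exp(-t1)
    return (d0, d1)


def grad_central(f, t0, t1, eps=1e-5):
    """Central-difference approximation to the gradient of f at (t0, t1)."""
    d0 = (f(t0 + eps, t1) - f(t0 - eps, t1)) / (2.0 * eps)
    d1 = (f(t0, t1 + eps) - f(t0, t1 - eps)) / (2.0 * eps)
    return (d0, d1)


point = (0.1, 0.2)
max_abs_diff = max(
    abs(a - b)
    for a, b in zip(grad_analytic(*point), grad_central(h_example, *point))
)
print(f"largest gradient discrepancy: {max_abs_diff:.2e}")
```

The same pattern extends to second and third derivatives by differencing the analytical gradient and Hessian, which is usually more accurate than differencing h itself repeatedly.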


    Acknowledgements
 
I am grateful to Professor Peter Green of the University of Bristol for pointing out that the correlation between any two posterior probabilities in the star-tree distribution is exactly –1/2. I thank Professor Philip Dawid (UCL) for very useful discussions, and Jim Mallet and Max Telford for comments on the first part of the manuscript. This study is supported by a grant from the Natural Environment Research Council (UK).


    Footnotes
 
Arndt von Haeseler, Associate Editor


    References
 

    Bartlett MS. A comment on D.V. Lindley's paradox. Biometrika (1957) 44:533–534.

    Bender CM, Orszag SA. Advanced Mathematical Methods for Scientists and Engineers: Asymptotic Methods and Perturbation Theory (1999) New York: Springer-Verlag.

    Berger JO. Statistical Decision Theory and Bayesian Analysis (1985) New York: Springer-Verlag.

    Bernardo JM. Reference posterior distributions for Bayesian inference. J R Stat Soc B (1979) 41:113–147.

    Bernardo JM. A Bayesian analysis of classical hypothesis testing. In: Bayesian Statistics. Bernardo JM, DeGroot MH, Lindley DV, Smith AFM, eds. (1980) Valencia, Spain: Valencian University Press. 605–647.

    Berry V, Gascuel O. On the interpretation of bootstrap trees: appropriate threshold of clade selection and induced gain. Mol Biol Evol (1996) 13:999–1011.

    Bourlat SJ, Juliusdottir T, Lowe CJ, Freeman R, et al. Deuterostome phylogeny reveals monophyletic chordates and the new phylum Xenoturbellida. Nature (2006) 444:85–88.

    Buckley TR. Model misspecification and probabilistic tests of topology: evidence from empirical data sets. Syst Biol (2002) 51:509–523.

    Copson ET. Asymptotic Expansions (1965) Cambridge, UK: Cambridge University Press.

    Cox DR. Principles of Statistical Inference (2006) Cambridge, UK: Cambridge University Press.

    Cummings MP, Handley SA, Myers DS, Reed DL, et al. Comparing bootstrap and posterior probability values in the four-taxon case. Syst Biol (2003) 52:477–487.

    Davison AC. Statistical Models (2003) Cambridge, UK: Cambridge University Press.

    Dawid AP. The trouble with Bayes factors. Research Report 202 (1999) Department of Statistical Science, University College London.

    Douady CJ, Delsuc F, Boucher Y, Doolittle WF, et al. Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Mol Biol Evol (2003) 20:248–254.

    Drezner Z, Wesolowsky GO. On the computation of the bivariate normal integral. J Statist Comput Simul (1990) 35:101–107.

    Efron B. R.A. Fisher in the 21st Century. Stat Sci (1998) 13:95–122.

    Erixon P, Svennblad B, Britton T, Oxelman B. Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Syst Biol (2003) 52:665–673.

    Good IJ. Lindley's paradox. J Am Stat Assoc (1982) 77:342.

    Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika (1995) 82:711–732.

    Hill ID. The normal integral. Appl Stat (1973) 22:424–427.

    Huelsenbeck JP, Ronquist F. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics (2001) 17:754–755.

    Huelsenbeck JP, Rannala B. Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models. Syst Biol (2004) 53:904–913.

    Jeffreys H. Theory of Probability (1939) Oxford, UK: Clarendon Press.

    Jeffreys H. Theory of Probability (1961) Oxford, UK: Oxford University Press.

    Johnson NL, Kotz S, Balakrishnan N. Continuous Univariate Distributions (1994) Volume 1. New York: Wiley.

    Kolaczkowski B, Thornton JW. Is there a star tree paradox? Mol Biol Evol (2006) 23:1819–1823.

    Lemmon AR, Moriarty EC. The importance of proper model assumption in Bayesian phylogenetics. Syst Biol (2004) 53:265–277.

    Lewis PO, Holder MT, Holsinger KE. Polytomies and Bayesian phylogenetic inference. Syst Biol (2005) 54:241–253.

    Li S, Pearl D, Doss H. Phylogenetic tree reconstruction using Markov chain Monte Carlo. J Am Statist Assoc (2000) 95:493–508.

    Lindley DV. A statistical paradox. Biometrika (1957) 44:187–192.

    Lindley DV. Approximate Bayesian methods. In: Bayesian Statistics. Bernardo JM, DeGroot MH, Lindley DV, Smith AFM, eds. (1980) Valencia, Spain: Valencian University Press. 223–237.

    Mau B, Newton MA. Phylogenetic inference for binary data on dendrograms using Markov chain Monte Carlo. J Computat Graph Stat (1997) 6:122–131.

    Pearson ES. The choice of statistical tests illustrated on the interpretation of data classed in the 2 × 2 table. Biometrika (1947) 34:139–167.

    Press SJ. Subjective and Objective Bayesian Statistics (2003) New Jersey: John Wiley & Sons.

    Rannala B, Yang Z. Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. J Mol Evol (1996) 43:304–311.

    Shafer G. Lindley's paradox. J Am Statist Assoc (1982) 77:325–334.

    Silverman BW. Density Estimation for Statistics and Data Analysis (1986) London: Chapman and Hall.

    Simmons MP, Pickett KM, Miya M. How meaningful are Bayesian support values? Mol Biol Evol (2004) 21:188–199.

    Steel M, Matsen FA. The Bayesian "star paradox" persists for long finite sequences. Mol Biol Evol (2007) 24:1075–1079.

    Suzuki Y, Glazko GV, Nei M. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc Natl Acad Sci USA (2002) 99:16138–16143.

    Tierney L, Kadane JB. Accurate approximations for posterior moments and marginal densities. J Am Stat Assoc (1986) 81:82–86.

    Wolfram S. Mathematica 5 (2003) Cambridge, UK: Cambridge University Press.

    Yang Z. Complexity of the simplest phylogenetic estimation problem. Proc R Soc B: Biol Sci (2000) 267:109–116.

    Yang Z. Computational Molecular Evolution (2006) Oxford, England: Oxford University Press.

    Yang Z, Rannala B. Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo Method. Mol Biol Evol (1997) 14:717–724.

    Yang Z, Rannala B. Branch-length prior influences Bayesian posterior probability of phylogeny. Syst Biol (2005) 54:455–470.

    Yang Z, Goldman N, Friday AE. Maximum likelihood trees from DNA sequences: a peculiar statistical estimation problem. Syst Biol (1995) 44:384–399.

Accepted for publication April 18, 2007.

