MBE Advance Access originally published online on December 12, 2007
Molecular Biology and Evolution 2008 25(2):447-453; doi:10.1093/molbev/msm274
Research Articles
Topology-Bayes versus Clade-Bayes in Phylogenetic Analysis

* Division of Invertebrate Zoology, American Museum of Natural History, NY
Department of Biology, University of Vermont
E-mail: wheeler{at}amnh.org.
| Abstract |
|---|
Several features of currently used Bayesian methods in phylogenetic analysis are discussed. The distinction between Clade-Bayes and Topology-Bayes is presented and illustrated with an empirical example. Three problems with Bayesian phylogenetic methods (exaggerated clade support, inconsistently biased priors, and the impossibility of hypothesis testing of cladograms) are shown to be the result of using a Clade-based Bayesian approach. Topology-based Bayesian methods do not share these shortcomings.
Key Words: Bayes, phylogeny, systematics, support
| Introduction |
|---|
Bayesian methods in phylogenetics have a 30-year history, tracing back at least to Farris (1973).
As an optimality criterion to choose among candidate topologies, posterior probability is a fine option, but current use (as in Huelsenbeck and Ronquist 2003) does not follow this path. The entities whose posterior probability is most frequently estimated are clades. The set of clades with posterior probability greater than one-half is presented as the Bayesian hypothesis of phylogenetic topology. Indeed, many proponents of the methodology have asserted repeatedly that these 50% majority rule consensus topologies reflect the probability of the truth of clade identity (Huelsenbeck et al. 2002), and hundreds of authors employing the methods have repeated this assertion as a justification for the methodological choice. Such an approach differs from that of examining the relative merits of alternative topologies directly, resulting in 3 problems: 1) exaggerated clade support, 2) inconsistently biased priors, and 3) the impossibility of topology hypothesis testing. Here, we discuss these issues and show that these problems with Bayesian phylogenetics are not inherent to Bayesianism per se but to the particular path that has been taken by the majority of the community. We show that the adoption of the Bayesian optimality position (supported by Rannala and Yang [1996], though not adopted by most practitioners) abrogates these problems.
| Topology-Bayes and Clade-Bayes |
|---|
Topology-Bayes
A Topology-Bayesian estimator of a phylogenetic hypothesis is that topology (Tt) which has the maximum posterior probability (assuming all other topologies have equal cost of error or "loss"). Here, we describe a tree T as composed of a set of vertices V and edges E:
$$T = (V, E) \tag{1}$$
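Bayes theorem allows for the calculation of the probability of the hypothesis (here, the topology) given the data, $\Pr(T \mid D)$. This is desirable as phylogeneticists begin with data and usually seek the best estimate of the topology. If investigating the posterior probability of a topology, $\Pr(T_i \mid D)$, given topological "prior" probabilities, $\Pr(T_i)$, Bayes theorem can be written:

$$\Pr(T_i \mid D) = \frac{\Pr(T_i)\,\Pr(D \mid T_i)}{\Pr(D)} \tag{2}$$

Because the denominator is a constant for a given data set, the posterior probability is proportional to the numerator, $\Pr(T_i)\,\Pr(D \mid T_i)$, the product of the prior probability of the topology and the probability of the data, given the topology. Therefore, if the topological priors are uniform, the posterior probability of each topology will be proportional to its likelihood, and the Topology-Bayesian estimator will be the maximum likelihood topology (i.e., $\Pr(T_i \mid D) \propto \Pr(D \mid T_i)$) (Edwards 1992).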
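The relationship between prior, likelihood, and posterior for topologies can be illustrated with a minimal Python sketch. The topology labels, log-likelihood values, and function names below are hypothetical and chosen only for illustration; a real analysis cannot enumerate every topology, so the normalizing sum here is taken over the listed candidates only.

```python
import math

def posterior_probabilities(log_likelihoods, priors):
    """Return Pr(T_i | D) for each candidate topology.

    log_likelihoods: dict mapping topology label -> log Pr(D | T_i)
    priors:          dict mapping topology label -> Pr(T_i)
    """
    # Work in log space and subtract the maximum to avoid underflow.
    log_joint = {t: math.log(priors[t]) + ll for t, ll in log_likelihoods.items()}
    shift = max(log_joint.values())
    unnormalized = {t: math.exp(v - shift) for t, v in log_joint.items()}
    total = sum(unnormalized.values())  # the constant denominator
    return {t: u / total for t, u in unnormalized.items()}

def map_topology(log_likelihoods, priors):
    """Return the label of the topology with maximum posterior probability."""
    post = posterior_probabilities(log_likelihoods, priors)
    return max(post, key=post.get)

# Hypothetical log-likelihoods for three candidate topologies.
log_lik = {"T1": -100.0, "T2": -100.5, "T3": -102.0}

# Uniform topological priors: the MAP topology is the ML topology.
uniform = {t: 1.0 / len(log_lik) for t in log_lik}
ml_tree = max(log_lik, key=log_lik.get)
map_tree_uniform = map_topology(log_lik, uniform)

# A strongly non-uniform prior can change the ordering.
skewed = {"T1": 0.1, "T2": 0.8, "T3": 0.1}
map_tree_skewed = map_topology(log_lik, skewed)
```

With the uniform prior the two estimators pick the same topology; with the skewed prior they need not, which is why the choice of prior matters for the entity being estimated.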
Clade-Bayes
The Clade-Bayesian estimator is that set Vc = {Vc,1, Vc,2, ...} of clades that have a posterior probability greater than one-half. Because any pair of clades from Vc cannot conflict (because, by definition, there is a positive probability that some tree contains both clades), the set Vc forms a tree, which we will denote Tc. This is the Clade-Bayes topology. Unlike Tt, Tc has no associated optimality value even though each clade in Tc does. Most current Bayesian phylogenetic analyses (e.g., MrBayes) produce Tc (Huelsenbeck and Ronquist 2003). Consider the posterior probability of a clade on a topology, given the data:
$$\Pr(V_{c,i} \mid D) = \frac{\Pr(V_{c,i})\,\Pr(D \mid V_{c,i})}{\Pr(D)} \tag{3}$$

Even if the likelihood of a clade, $\Pr(D \mid V_{c,i})$, is maximal (of all clades, e.g.), because that value would be multiplied by its prior probability (which may be quite low under uniform topological priors), it may not correspond to the clade of maximum posterior probability. Simply put, the proportionality and relative ordering of likelihood and posterior probability are no longer guaranteed. In any given case, depending on the data and the particular prior employed, the clade of maximum likelihood may also be the clade of maximum posterior probability, but there is no requirement that this be so.
Given that Tt and Tc are different entities, they need not agree. In such a case, Tt and Tc would have clades in conflict and Tc would not represent the Topology-Bayes estimator or the maximum a posteriori probability (MAP) estimate of Rannala and Yang (1996). This could occur, for example, if several suboptimal (i.e., lower posterior probability) topologies share a group not found in the "best" tree. Below, we provide an example.
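The situation described above can be made concrete with a small Python sketch. It uses a contrived posterior distribution over three rooted four-taxon topologies (the taxa, trees, and probabilities are hypothetical, not taken from the arthropod data), representing each topology as the set of its nontrivial clades.

```python
from collections import defaultdict

def tree(*clades):
    """Represent a topology as a frozenset of its nontrivial clades."""
    return frozenset(frozenset(c) for c in clades)

def clade_posteriors(tree_posteriors):
    """Sum topology posteriors over every topology containing each clade."""
    support = defaultdict(float)
    for clades, prob in tree_posteriors.items():
        for clade in clades:
            support[clade] += prob
    return dict(support)

def majority_rule_clades(tree_posteriors, threshold=0.5):
    """Clades whose posterior probability exceeds the threshold (the set Vc)."""
    return {c for c, p in clade_posteriors(tree_posteriors).items() if p > threshold}

def clades_conflict(a, b):
    """Two clades conflict if they overlap without one containing the other."""
    return bool(a & b) and not (a <= b or b <= a)

# Hypothetical posterior distribution over three topologies.
T1 = tree("AB", "CD")    # ((A,B),(C,D))
T2 = tree("BC", "ABC")   # (((B,C),A),D)
T3 = tree("BC", "BCD")   # (((B,C),D),A)
posteriors = {T1: 0.4, T2: 0.3, T3: 0.3}

t_t = max(posteriors, key=posteriors.get)   # Topology-Bayes (MAP) estimate
t_c = majority_rule_clades(posteriors)      # Clade-Bayes estimate
conflicts = [(set(x), set(y)) for x in t_t for y in t_c if clades_conflict(x, y)]
```

Here the two suboptimal topologies share the group {B, C}, which therefore has posterior probability 0.6 and enters Tc, although it contradicts both clades of the single most probable topology.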
Example Comparison
The fact that topologies of maximum likelihood need not correspond to the Clade-Bayes tree has been considered recently (Svennblad et al. 2006). But the fact that the Topology-Bayes approach and the Clade-Bayes approach can result in topological differences has received almost no attention. Here, we demonstrate this potential disparity using empirical data.
Consider a case of arthropod morphological data with 54 taxa and 303 characters (Giribet et al. 2001). If we begin with uniform topological prior probabilities, then the Tt will be the maximum likelihood topology. Any model will suffice; here we use the no common mechanism (NCM) model of Tuffley and Steel (1997). Because MrBayes (Huelsenbeck and Ronquist 2003) does not seek the optimal trees (see below), we calculated Tt (the Topology-Bayes or MAP estimate) using POY3 (Wheeler et al. 1996–2005), which searches and saves all identified optimal solutions, given a criterion. The 7 characters treated as additive in the original analysis of Giribet et al. (2001) are treated as nonadditive here. Tc was estimated using MrBayes (Huelsenbeck and Ronquist 2003), implementing the same model of evolution. Searches were performed with the following options:
POY3: buildsperreplicate 10, replicates 10, treefuse, likelihood, likelihoodroundingmultiplier 10,000. This implements 10 random replicates with 10 Wagner builds per replicate followed by tree-bisection-and-regrafting (TBR) branch-swapping and tree fusing (Goloboff 1999) within and between replicates; the NCM (Tuffley and Steel 1997) model of evolution was employed, rounding likelihoods to 5 decimal places.
MrBayes: lset parsmodel = yes mcmc ngen = 10,000,000 samplefreq = 500 nchains = 4. This implements the NCM (Tuffley and Steel 1997) model of evolution with 2 simultaneous runs of 4 chains and 10,000,000 generations each, saving every 500th topology visited to file.
Ten cladograms of –log likelihood 773.02908 were found by POY3. Their strict consensus is shown in figure 1a. The Tc produced by MrBayes is shown in figure 1b (–log likelihood of the best binary resolution of this consensus cladogram is 773.38128). Overall, the 2 trees are quite similar. They differ, however, in several major taxonomic groupings. Foremost among these is that the Chelicerata are monophyletic in Tc and paraphyletic (pycnogonids basal) in Tt.
The 2 topologies are similar, but not identical. Implementation issues aside, this example clearly demonstrates a case where Tt ≠ Tc. It is worth noting that the maximum posterior probability topology from MrBayes has a –log likelihood of 773.03, which differs from the score reported by POY3 (773.02908) only in rounding. MrBayes reported only 2 trees of this score; it may have visited the other 8 topologies of highest probability but because the program is designed to create Tc, it does not seek to save all optimal topologies. In fact, one of the 2 optimal topologies visited by MrBayes was found before the end of the burn-in in our second run (tree 1905000) and so would not even be included in a majority rule calculation of the postburn-in topologies. However, there would be no burn-in period and no reason to abandon any of the visited topologies, when searching for the optimal solution.
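The contrast between discarding a burn-in and retaining every visited topology can be sketched in Python as follows. The sample records, topology labels, and burn-in cutoff are hypothetical; this is not how MrBayes or POY3 store their results, only an illustration of the two bookkeeping strategies.

```python
def optimal_topologies(samples, tolerance=1e-6):
    """Return (set of topologies tying the best score, best score).

    samples: iterable of (generation, topology_label, neg_log_likelihood).
    Nothing is discarded as burn-in: when searching for an optimum,
    every visited topology is a legitimate candidate.
    """
    samples = list(samples)
    best = min(score for _, _, score in samples)
    tied = {topo for _, topo, score in samples if score - best <= tolerance}
    return tied, best

def post_burn_in(samples, burn_in):
    """Keep only samples taken after the burn-in generation."""
    return [s for s in samples if s[0] > burn_in]

# Hypothetical chain: (generation, topology label, -log likelihood).
visited = [
    (500, "t_a", 780.10),
    (1000, "t_b", 773.03),   # optimal, but visited during the burn-in
    (3000, "t_c", 773.38),
    (3500, "t_d", 773.03),   # optimal, visited after the burn-in
    (4000, "t_c", 773.38),
]

all_best, best_score = optimal_topologies(visited)
kept_best, _ = optimal_topologies(post_burn_in(visited, burn_in=2000))
```

In this toy chain, searching all visited topologies recovers both optimal trees, whereas restricting attention to post-burn-in samples recovers only one of them.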
| Doubling Behavior: Clade-Bayes and Support |
|---|
One of the properties of all statistical methods is an increase in levels of support with a multiplication of identically distributed data. In other words, larger data sets with the same proportional balance of data in favor of, in opposition to, and indifferent to a hypothesis will assign higher support. Consider the data of table 1. Under NCM (Penny et al. 1994
|
|
|
| (4) |
would increase with n until arbitrarily close to 1 (eq. 5).
|
| (5) |
This same behavior will be observed in a bootstrap, jackknife analysis or other resampling approach to support. Similarly, the same behavior would be observed for any data, contrived or otherwise, that are duplicated identically, regardless of the model employed; our example is presented to permit exact calculations. Although reasonable on statistical grounds, even when support levels are very close to unity, there are still as many observations contradicting the grouping as supporting. The point here is that this behavior is an ineluctable outcome of data duplication. It is important to note that the "inflation" of support values described here is not related to the higher values seen in Bayesian over bootstrap or jackknife support (Cummings et al. 2003; Erixon et al. 2003; Simmons et al. 2004), reported from Tc. It is also worth noting that if the Tt approach is adopted, the support inflation seen in Tc becomes moot.
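The general effect of data duplication on posterior support can be sketched in Python for the simplest case of two competing hypotheses with equal priors. The likelihood values are hypothetical and are not those of table 1; the only assumption is that duplicating independent data n times raises each likelihood to the power n.

```python
def posterior_after_duplication(lik_h1, lik_h2, n):
    """Posterior probability of H1 versus H2 (equal priors) after the
    original data set has been copied n times."""
    # Each copy multiplies the likelihoods again, so the likelihood
    # ratio is raised to the power n.
    ratio = (lik_h2 / lik_h1) ** n
    return 1.0 / (1.0 + ratio)

# Hypothetical likelihoods: the data favor H1 only slightly.
lik_h1, lik_h2 = 0.012, 0.010

copies = (1, 2, 10, 50)
support = [posterior_after_duplication(lik_h1, lik_h2, n) for n in copies]
```

The support for H1 rises monotonically with the number of copies and approaches 1, although the proportion of observations for and against the hypothesis never changes.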
| Priors: Uniform, Biased, and Empirical |
|---|
A central issue surrounding Bayesian techniques is the choice of appropriate prior probabilities. For the purposes here, we can divide these into 3 types: uniform, biased, and empirical. Uniform or ignorance priors are usually employed when there is no useful preexisting information on the entity to be estimated. Phylogenetic analysis tends to adhere to uniform topological priors, and discussions tend to rely on current evidence to draw conclusions in the admirable feeling that the investigator cannot say which hypotheses are more probable a priori. Biased priors attach greater or lesser initial probability to entities based on nonuniform distributions. In many cases of Bayesian analysis, this is not undesirable. It may be well known that processes follow certain distributions and profitable use can be made of them. In phylogenetic analysis, they are less well regarded because they are not based on biological information. Empirical priors are based, unsurprisingly, on previous experience, data, and knowledge. In general, empirical priors are unobjectionable because they allow the use of previous empirical results. Some strict Bayesians object to the use of data as priors when the data themselves are not, in fact, temporally and ontologically prior to the other data under consideration. This objection derives from the fundamental difference with frequentist statistics: the Bayesian view that prior events predict future events. We note this strict interpretation here only to point out that as phylogeneticists begin using empirical data as prior assertions, this philosophical objection may loom, especially if the prior data did not occur first, even if they were observed first and thus formed the prior phylogenetic viewpoint (as would be the case, e.g., if morphology was used as priors for molecular data; morphology is not temporally antecedent to DNA and thus would violate this strict Bayesian interpretation of appropriate prior data).
Priors are attached to the entities to be estimated and hence play different roles for Tt and Tc. As mentioned above, phylogenetic Bayesian methods generally employ uniform priors; the most commonly used is the "proportional-to-distinguishable-arrangements" distribution (Rosen 1978). This is straightforward with Tt because each topology (as in eq. 3) can be assigned the inverse of the number of topologies. This cannot be done for Tc.
Steel and Pickett (2006) have shown that uniform priors cannot be constructed for clades (Vc,i). In short, clades of size 2 or n – 2 (for n taxa) will have higher probabilities than those of intermediate size. This can result in huge prior disparities (orders of magnitude) among clades (Pickett and Randle 2005). In the absence of any data, some groups will be favored over others. This is not a problem that occurs with Topology-Bayesian analysis (Tt). The problem only arises when topological priors (and their resultant clade priors) are used to estimate clade posteriors, as in Clade-Bayesian (Tc) analyses.
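The size dependence of clade priors can be checked with a short Python sketch. It assumes unrooted, fully resolved trees with every labeled topology equally probable, and uses the standard count of (2m - 5)!! unrooted binary trees on m labeled leaves; the function names are illustrative only.

```python
from fractions import Fraction

def num_unrooted_trees(m):
    """Number of unrooted binary trees on m >= 3 labeled leaves: (2m - 5)!!."""
    if m < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for odd in range(3, 2 * m - 4, 2):
        count *= odd
    return count

def split_prior(n, k):
    """Prior probability that a specified group of k taxa (2 <= k <= n - 2)
    is separated from the other n - k taxa by an edge, when all unrooted
    binary topologies on n taxa are equally probable."""
    if not 2 <= k <= n - 2:
        raise ValueError("k must lie between 2 and n - 2")
    # Collapse each side of the split to a single leaf and count the
    # ways of resolving the two resulting subtrees independently.
    trees_with_split = num_unrooted_trees(k + 1) * num_unrooted_trees(n - k + 1)
    return Fraction(trees_with_split, num_unrooted_trees(n))

# Clade priors for a data set of 54 taxa (the size of the arthropod example).
n = 54
priors = {k: split_prior(n, k) for k in range(2, n - 1)}
ratio = priors[2] / priors[n // 2]   # small clade versus mid-sized clade
```

Under these assumptions the prior is symmetric in k and n - k, is largest for groups of size 2 or n - 2, and falls by many orders of magnitude for groups of intermediate size.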
Empirical priors on topologies are an underexplored area. Wheeler (1991) tried to do this using morphological data to create priors that were then combined with likelihoods based on molecular data. Using an explicit model of evolution, we can employ this approach using the probability of a topology given a set of morphological data to approximate an empirical prior (eq. 6). Of course, any model of evolution might be invoked for the data that yield the topology prior. Here, we employ NCM.
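$$\Pr(T_i \mid D_{\mathrm{morph}}) = \frac{\Pr(T_i)\,\Pr(D_{\mathrm{morph}} \mid T_i)}{\Pr(D_{\mathrm{morph}})} \tag{6}$$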
Arthropod Example of Empirical Priors
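If we use the estimation of empirical priors of equation (6), based on the morphological data ($D_{\mathrm{morph}}$), we can extend this with likelihoods based on molecular data ($D_{\mathrm{mol}}$) for the same taxa to:

$$\Pr(T_i \mid D_{\mathrm{mol}}, D_{\mathrm{morph}}) = \frac{\Pr(T_i \mid D_{\mathrm{morph}})\,\Pr(D_{\mathrm{mol}} \mid T_i)}{\Pr(D_{\mathrm{mol}} \mid D_{\mathrm{morph}})} \tag{7}$$

To find the topology of greatest posterior probability, we need only maximize the numerator of equation (7) (the denominator is again a constant). As $\Pr(T_i \mid D_{\mathrm{morph}})$ is proportional to $\Pr(D_{\mathrm{morph}} \mid T_i)$, the topology with greatest posterior probability will be that which maximizes:

$$\Pr(D_{\mathrm{morph}} \mid T_i)\,\Pr(D_{\mathrm{mol}} \mid T_i) \tag{8}$$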
classes and invariant sites (
) parameters estimated from the sequence data. Morphological data analyzed as above. The total –log likelihood reported by POY3 was 115,417.38.
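The combination of a morphology-based prior with molecular likelihoods can be sketched in Python, assuming that the quantity to be maximized is the product of the morphological and molecular likelihoods of each topology (equivalently, that the sum of the two -log likelihoods is minimized). The topology labels and scores are hypothetical and bear no relation to the arthropod analysis.

```python
def best_combined_topology(neg_log_lik_morph, neg_log_lik_mol):
    """Return (topology label, total -log likelihood) for the topology that
    minimizes the sum of the morphological and molecular -log likelihoods.

    Assumes the empirical prior is proportional to the morphological
    likelihood, so the posterior is proportional to the product of the
    two likelihoods.
    """
    shared = neg_log_lik_morph.keys() & neg_log_lik_mol.keys()
    totals = {t: neg_log_lik_morph[t] + neg_log_lik_mol[t] for t in shared}
    best = min(totals, key=totals.get)
    return best, totals[best]

# Hypothetical -log likelihoods for three candidate topologies.
morph = {"T1": 773.0, "T2": 774.5, "T3": 780.2}
mol = {"T1": 5210.4, "T2": 5205.1, "T3": 5207.9}

best_tree, best_score = best_combined_topology(morph, mol)
```

In this toy case the topology preferred by morphology alone (T1) is not the one preferred once the molecular likelihoods are included, which illustrates how an empirical prior and new data are weighed together.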
| Hypothesis Testing |
|---|
The central act of phylogenetic analysis is establishing the relative merits of 2 hypotheses. This can be done on a variety of grounds (= optimality criteria) and as long as the comparisons are transitive, a best solution can be found. Parsimony, likelihood, and Topology-Bayes all do this by default. In each case, an optimality value (cost, likelihood, and posterior probability) is assigned to each cladogram and reported by the investigator. This value is used to compare and test pairs of hypotheses. No such comparison can be made for Clade-Bayes (Tc) trees in the form that they are usually reported. Although it is true that any tree-shaped object upon which characters are plotted can be assigned an optimality score (and thus, the posterior probability of any given Clade-Bayes tree could be calculated as a Topology-Bayes hypothesis, but thereby abandoning the Clade-Bayes approach), investigators rarely, if ever, calculate or report such optimality scores. As such, few, if any, of the reported Clade-Bayes trees from the literature can be compared with subsequent analyses of the same data, which may give different results. Thus, it is impossible to say that any Clade-Bayes topology is superior or inferior to any other, unless subsequent investigators compute the optimality score of the Clade-Bayes trees as topologies. As with jackknife or bootstrap trees, the strength of clade support is presented by the investigator, but not the cladogram optimality. It is also worth noting that any tree that is less resolved than the optimal binary tree (whether Clade-Bayes, jackknife, bootstrap, or other consensus tree) is most likely less optimal (in this case lower posterior probability, but strictly they could be equal) because the consensus is most likely due to character conflict. Hence, Clade-Bayes trees may perhaps be regarded as statements of support but not as best-supported scientific hypotheses of phylogenetic relationships.
Bayesian methods can have a place in systematic analysis, but this position must be based on the relative quality of topologies, not their constituent parts. This requires the use of the Topology-Bayes approach advocated here.
| Acknowledgements |
|---|
We would like to acknowledge the important influence of discussions with Andrés Varón in developing this manuscript and helpful criticism of Mike Steel, editor Barbara Holland, and 2 anonymous reviewers.
| Footnotes |
|---|
Barbara Holland, Associate Editor
| References |
|---|
Cummings MP, Handley SA, Myers DS, Reed DL, Rokas A, Winka K. Comparing bootstrap and posterior probability values in the four-taxon case. Syst Biol (2003) 52:477–487.
Edwards A. Likelihood (1992) Baltimore (MD): Johns Hopkins University Press.
Erixon P, Svennblad B, Britton T, Oxelman B. Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Syst Biol (2003) 52:665–673.
Farris JS. A probability model for inferring evolutionary trees. Syst Zool (1973) 22:250–256.
Felsenstein J. Phylip (phylogeny inference package) version 3.6. Distributed by the author (2004) Seattle (WA): Department of Genome Sciences, University of Washington.
Giribet G, Edgecombe GD, Wheeler WC. Arthropod phylogeny based on eight molecular loci and morphology. Nature (2001) 413:157–161.
Goloboff P. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics (1999) 15:415–428.
Huelsenbeck JP, Larget B, Miller RE, Ronquist F. Potential applications and pitfalls of Bayesian inference of phylogeny. Syst Biol (2002) 51:673–688.
Huelsenbeck JP, Ronquist F. MrBayes: Bayesian inference of phylogeny. 3.0 ed [Internet] (2003) documentation. http://mrbayes.csit.fsu.edu/.
Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP. Bayesian inference of phylogeny and its impact on evolutionary biology. Science (2001) 294:2310–2314.
Penny D, Lockhart PJ, Steel MA, Hendy MD. The role of models in reconstructing evolutionary trees. In: Models in phylogeny reconstruction. Vol. 52 of Systematics Association—Scotland RW, Siebert DJ, Williams DM, eds. (1994) Oxford: Clarendon Press. 211–230.
Pickett KM, Randle CP. Strange bayes indeed: uniform topological priors. Mol Phylogenet Evol (2005) 34:203–211.
Rannala B, Yang Z. Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. J Mol Evol (1996) 43:304–311.
Rosen DE. Vicariant patterns and historical explanation in biogeography. Syst Zool (1978) 27:159–188.
Simmons M, Pickett K, Miya M. How meaningful are Bayesian support values? Mol Biol Evol (2004) 21:188–199.
Steel M, Penny D. Parsimony, likelihood, and the role of models in molecular phylogenetics. Mol Biol Evol (2000) 17:839–850.
Steel M, Pickett KM. On the impossibility of uniform priors on clades. Mol Phylogenet Evol (2006) 39:585–586.
Svennblad B, Erixon P, Oxelman B, Britton T. Fundamental differences between the methods of maximum likelihood and maximum posterior probability in phylogenetics. Syst Biol (2006) 55:116–121.
Tuffley C, Steel M. Links between maximum likelihood and maximum parsimony under a simple model of site substitution. Bull Math Biol (1997) 59:581–607.
Wheeler WC. Congruence among data sets: a Bayesian approach. In: Phylogenetic analysis of DNA sequences—Miyamoto MM, Cracraft J, eds. (1991) London: Oxford University Press. 334–346.
Wheeler WC. Optimization alignment: the end of multiple sequence alignment in phylogenetics? Cladistics (1996) 12:1–9.
Wheeler WC. Dynamic homology and the likelihood criterion. Cladistics (2006) 22:157–170.
Wheeler WC, Gladstein DS, De Laet J. POY version 3.0 [Internet] (1996–2005) New York: American Museum of Natural History. Available from: http://research.amnh.org/scicomp/projects/poy.php documentation (current version 3.0.11). documentation by D. Janies and W.C. Wheeler. commandline documentation by J. De Laet and W.C. Wheeler.