MBE Advance Access originally published online on July 17, 2007
Molecular Biology and Evolution 2007 24(9):2108-2118; doi:10.1093/molbev/msm141
Research Articles
Effects of Branch Length Uncertainty on Bayesian Posterior Probabilities for Phylogenetic Hypotheses
Center for Ecology and Evolutionary Biology, University of Oregon
E-mail: joet{at}uoregon.edu.
| Abstract |
|---|
In Bayesian phylogenetics, confidence in evolutionary relationships is expressed as posterior probability—the probability that a tree or clade is true given the data, evolutionary model, and prior assumptions about model parameters. Model parameters, such as branch lengths, are never known in advance; Bayesian methods incorporate this uncertainty by integrating over a range of plausible values given an assumed prior probability distribution for each parameter. Little is known about the effects of integrating over branch length uncertainty on posterior probabilities when different priors are assumed. Here, we show that integrating over uncertainty using a wide range of typical prior assumptions strongly affects posterior probabilities, causing them to deviate from those that would be inferred if branch lengths were known in advance; only when there is no uncertainty to integrate over does the average posterior probability of a group of trees accurately predict the proportion of correct trees in the group. The pattern of branch lengths on the true tree determines whether integrating over uncertainty pushes posterior probabilities upward or downward. The magnitude of the effect depends on the specific prior distributions used and the length of the sequences analyzed. Under realistic conditions, however, even extraordinarily long sequences are not enough to prevent frequent inference of incorrect clades with strong support. We found that across a range of conditions, diffuse priors—either flat or exponential distributions with moderate to large means—provide more reliable inferences than small-mean exponential priors. An empirical Bayes approach that fixes branch lengths at their maximum likelihood estimates yields posterior probabilities that more closely match those that would be inferred if the true branch lengths were known in advance and reduces the rate of strongly supported false inferences compared with fully Bayesian integration.
Key Words: Bayesian phylogenetics posterior probability branch length prior empirical Bayes
| Introduction |
|---|
A central goal of statistics is to help us express how certain we should be when we draw an inference from evidence. Phylogenies provide the framework for all valid comparative biology, so reliable measures of statistical confidence in evolutionary trees have long been sought. Bayesian phylogenetics (Rannala and Yang 1996
Concerns have been raised that posterior probabilities may not provide reliable indicators of statistical confidence. Although a number of studies support the general reliability of the Bayesian approach, while cautioning that results can be sensitive to model misspecification (Buckley 2002; Wilcox et al. 2002; Alfaro et al. 2003; Erixon et al. 2003; Huelsenbeck and Rannala 2004; Lemmon and Moriarty 2004; Mar et al. 2005), others have directly challenged these findings, concluding that posterior probabilities are regularly "overcredible" and produce high rates of false inferences even when the correct model is used (Suzuki et al. 2002; Cummings et al. 2003; Douady et al. 2003; Misawa and Nei 2003; Simmons et al. 2004; Taylor and Piel 2004; Lewis et al. 2005). As a result of these conflicting findings, the confidence we should have in phylogenies inferred using Bayesian techniques is unclear. Further, if posterior probabilities are unreliable indicators of statistical confidence even when the correct model is used, the reasons for this behavior are not known.
Much of the controversy surrounding the reliability of posterior probabilities in phylogenetics may stem from a misunderstanding of what posterior probabilities mean and how they should be interpreted. Bayesian posterior probability represents the degree of subjective belief a rational agent should have in a hypothesis, given the data and his or her prior beliefs not only about the hypothesis in question but also about the values of nuisance parameters. Numerous studies have compared posterior probabilities on trees with the frequentist chance that those trees are correct, that is, the fraction of inferred trees, over a large number of analyses, that are true (Wilcox et al. 2002; Erixon et al. 2003; Misawa and Nei 2003; Huelsenbeck and Rannala 2004; Simmons et al. 2004; Taylor and Piel 2004; Yang and Rannala 2005). Posterior probabilities are not necessarily expected to match the long-run proportion of correct inferences, because 1) posterior probabilities are conditioned on prior assumptions that may be incorrect or incomplete, whereas the proportion of correct trees is not, and 2) posterior probabilities are calculated directly from the data at hand and do not require replication, whereas calculating the percent of correctly inferred trees requires a large number of inferences to be made from different data sets.
Only under restricted conditions is the average posterior probability of a group of inferences expected to equal the proportion of correct inferences in the group. The first condition is replication. Although evolutionary history happened once, computer simulations allow multiple replicate data sets to be generated from any conceivable set of evolutionary conditions, so the proportion of correct inferences can be calculated. The second condition is that the chance of choosing each set of evolutionary parameter values to generate data must be known in advance and used as prior information (i.e., the "true priors" must be used). When both conditions hold, the average posterior probability of a group of inferences is equal to the proportion of those inferences that are correct. This correspondence follows directly from Bayes' theorem (see Supplementary Material online) and has been empirically demonstrated using phylogenetic simulations (Huelsenbeck and Rannala 2004; Yang and Rannala 2005). When the true priors are known, posterior probability can therefore be safely interpreted as the frequentist probability that a hypothesis is correct.
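The correspondence described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the derivation given in the Supplementary Material. The symbols are introduced here for illustration only: T is the true tree, D a simulated data set, \hat{\tau}(D) the tree inferred from D, \pi(D) the posterior probability assigned to \hat{\tau}(D) under the true priors, and G any group of data sets defined from the data alone (for example, all replicates whose posterior falls in a given bin).

```latex
% Assumption: trees, parameters, and data are all drawn from the same
% distributions that are used as priors and likelihood in the analysis.
\begin{align*}
\Pr\bigl(\hat{\tau}(D) = T \mid D\bigr) &= \pi(D), \\
\mathbb{E}\bigl[\mathbf{1}\{\hat{\tau}(D) = T\} \mid D \in G\bigr]
  &= \mathbb{E}\bigl[\Pr\bigl(\hat{\tau}(D) = T \mid D\bigr) \mid D \in G\bigr]
   = \mathbb{E}\bigl[\pi(D) \mid D \in G\bigr].
\end{align*}
```

The left-hand side of the second line is the expected proportion of correct inferences in the group, and the right-hand side is the expected average posterior probability in the group. The first line holds only because the prior used in the analysis matches the distribution that generated the trees and parameters; with any other prior, \pi(D) need not equal the conditional probability that the inferred tree is correct.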
In real analyses, the true values of model parameters are never known in advance and cannot be used as priors. Bayesian methods incorporate this uncertainty by integrating over many possible parameter values, weighted by prior beliefs about the probability of each. As a result, the posterior probability calculated by integration over parameter values may differ from the frequentist probability that a tree is correct. Little is known about the effects of different prior assumptions on posterior probabilities. In the only study to date addressing this question, Yang and Rannala (2005) simulated binary sequence data on rooted 3-taxon trees assuming a molecular clock with branch lengths drawn from exponential distributions. They analyzed these data assuming separate exponential priors for terminal and internal branch lengths and compared the average posterior probability of a group of trees with the proportion correct. When the same distributions used to generate data were used as priors, posterior probability predicted the proportion correct. However, when the true distribution was used as a prior on terminal branches but the mean of the prior distribution on the internal branch length was greater than the true mean, posterior probabilities were higher than the proportion of correct trees; when the mean of the internal branch length prior was less than the true mean, posterior probabilities were lower than the fraction correct.
The experiments of Yang and Rannala (Y&R) established that the choice of branch length priors can affect posterior probabilities and that the special correspondence between posterior probability and proportion correct is not always robust to the use of certain priors. Several key questions remain unresolved, however. First, Y&R considered only a single pattern of branch lengths; different branch length patterns might interact with prior assumptions to produce different effects. Second, although Y&R examined various priors for the internal branch, a separate prior (always the true distribution used to simulate data) was independently assigned to terminal branches. Most widely available Bayesian phylogenetics software packages use a single prior distribution for all branches on the tree; how the kinds of branch length priors used in most real analyses affect posterior probabilities is unknown. Third, it has been common to use a uniform prior distribution with a large upper bound on branch lengths to represent ignorance about this parameter; because such a prior will usually overestimate mean branch lengths, Y&R predicted that flat priors would produce excessively high posterior probabilities. Whether flat branch length priors actually yield high posterior probabilities was not tested, however. Finally, the simulations of Y&R represent a peculiar situation in which the phylogenetic trees and branch lengths on which sequences were simulated were drawn from sampling distributions. In reality, there is a single correct evolutionary history; how different prior assumptions affect posterior probabilities under such circumstances is not known.
Here, we use a simulation-based approach to determine how integrating over uncertainty affects posterior probabilities using a range of prior assumptions available in commonly used software packages and under a variety of evolutionary conditions.
| Materials and Methods |
|---|
Bayesian Analyses
Posterior probabilities were estimated by Markov Chain Monte Carlo using MrBayes v3.1 (Ronquist and Huelsenbeck 2003
We also conducted Bayesian analyses using 2 approaches that do not require integrating over branch length uncertainty. First, we used an empirical Bayes approach that places prior probability 1.0 on the maximum likelihood branch lengths for each possible topology, which were estimated using PAUP* v4.0b10 (Swofford 2002). Posterior probabilities were then calculated directly from Bayes' theorem using these branch lengths. Second, for Bayesian simulations (see below), we also used the true point priors on branch lengths, which place prior probability 1.0 on the branch lengths actually used to simulate data.
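Once branch lengths are fixed for each topology, the posterior calculation reduces to normalizing the per-topology likelihoods. The following is a minimal sketch of that step, not the authors' code. The function name `topology_posteriors`, the default of an equal prior on every topology, and the example log-likelihood values are all assumptions made for illustration.

```python
import math

def topology_posteriors(log_likelihoods, log_priors=None):
    """Posterior probability of each topology from its log-likelihood.

    Each log-likelihood is assumed to have been computed with branch lengths
    fixed at a single point (for example, their maximum likelihood estimates).
    If no prior is supplied, an equal prior on every topology is assumed.
    """
    n = len(log_likelihoods)
    if log_priors is None:
        log_priors = [-math.log(n)] * n
    log_joint = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    # Log-sum-exp: subtract the maximum before exponentiating to avoid underflow.
    top = max(log_joint)
    log_norm = top + math.log(sum(math.exp(x - top) for x in log_joint))
    return [math.exp(x - log_norm) for x in log_joint]

# Hypothetical log-likelihoods for the 3 unrooted 4-taxon topologies.
example = topology_posteriors([-5231.4, -5233.1, -5234.0])
```

Working in log space matters here because likelihoods of long alignments are far too small to represent directly as floating-point numbers.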
Simulations
We first performed "Bayesian simulations" (Huelsenbeck and Rannala 2004; Yang and Rannala 2005) using 4-taxon phylogenies with fixed branch lengths (terminals 0.5 substitutions/site, internal 0.01). At each sequence length (100, 1000, and 10,000 nt), 500 replicates were simulated using the JC69 model and a randomly selected topology for each replicate.
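The simulation design above can be sketched as follows. This is an illustrative reimplementation, not the software used in the study. The names `jc69_evolve`, `simulate_replicate`, and `TOPOLOGIES` are hypothetical, and the sketch relies on the standard JC69 result that a site remains unchanged along a branch of length t with probability 1/4 + 3/4 exp(-4t/3).

```python
import math
import random

BASES = "ACGT"
# The 3 unrooted topologies for taxa A-D, each written as its two sister pairs.
TOPOLOGIES = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

def jc69_evolve(seq, t, rng):
    """Evolve a list of bases along a branch of length t (substitutions/site)."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    out = []
    for base in seq:
        if rng.random() < p_same:
            out.append(base)
        else:
            # Under JC69 the 3 alternative bases are equally likely.
            out.append(rng.choice([b for b in BASES if b != base]))
    return out

def simulate_replicate(n_sites, terminal=0.5, internal=0.01, rng=None):
    """Simulate one alignment on a randomly chosen 4-taxon topology."""
    if rng is None:
        rng = random.Random()
    topology = rng.choice(TOPOLOGIES)
    # JC69 is time reversible with uniform base frequencies, so the root can be
    # placed at one end of the internal branch without changing the result.
    left = [rng.choice(BASES) for _ in range(n_sites)]
    right = jc69_evolve(left, internal, rng)
    alignment = {}
    for node_seq, pair in zip((left, right), topology):
        for taxon in pair:
            alignment[taxon] = "".join(jc69_evolve(node_seq, terminal, rng))
    return topology, alignment
```

Calling `simulate_replicate(1000, rng=random.Random(1))` returns the generating topology together with a dictionary of 4 sequences, so the true tree is available later when scoring inferences as correct or incorrect.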
To assess the effects of different branch length patterns on inferred posterior probabilities, we simulated 500 replicate data sets of 100 or 10,000 nt on 4-taxon topologies with internal branch length 0.01 and 6 different terminal branch length combinations: 1) all short branches (0.01), 2) 1 long branch (0.75) and 3 short, 3) 3 long and 1 short, 4) all long branches, 5) inverse-Felsenstein-zone lengths, with 2 long sister branches and 2 short sister branches, and 6) Felsenstein-zone branch lengths, with 2 long nonsister branches and 2 short nonsister branches.
To assess the impact of typical branch length prior assumptions on clade probabilities under more realistic conditions, we simulated 100 data sets of various sequence lengths (1000, 10,000, 25,000, and 50,000) using parameter values drawn from an analysis of real sequence data (Murphy et al. 2001). We estimated the tree topology, branch lengths, and parameters of the General Time Reversible model with invariant sites and gamma-distributed rate variation (GTR+I+G) by BMCMC using the original nucleotide data and then used these conditions to simulate replicate data sets.
Comparing Posterior Probabilities
For each branch length prior, we collected posterior probabilities on trees into 10 equally sized bins and compared the average posterior probability of each bin with the proportion of correct trees in the bin; these values should be equal if nuisance parameters are known with certainty (Huelsenbeck and Rannala 2004; Yang and Rannala 2005). Bins with fewer than 20 trees were excluded to avoid stochastic error in estimating the proportion correct. We also considered inferred trees with posterior probability ≥0.95 as strongly supported and calculated the proportion of replicate data sets producing incorrect inferences with strong support using each prior distribution.
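The binning comparison and the strong-support error rate can be sketched as below. This is a minimal illustration, not the authors' analysis code. It assumes that "equally sized" means equal-width bins on the interval from 0 to 1 and that the 0.95 threshold is inclusive; the function names `calibration_bins` and `strong_false_rate` are hypothetical.

```python
def calibration_bins(posteriors, correct, n_bins=10, min_count=20):
    """Compare mean posterior with proportion correct in equal-width bins.

    posteriors: posterior probability of each tree.
    correct: matching booleans, True where the tree is the true tree.
    Returns (mean posterior, proportion correct, count) for every bin that
    holds at least min_count trees; sparser bins are dropped.
    """
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, ok in zip(posteriors, correct):
        i = min(int(p * n_bins), n_bins - 1)  # a posterior of 1.0 joins the top bin
        sums[i] += p
        hits[i] += 1 if ok else 0
        counts[i] += 1
    return [
        (sums[i] / counts[i], hits[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] >= min_count
    ]

def strong_false_rate(best_posteriors, best_correct, threshold=0.95):
    """Fraction of replicates whose best tree is wrong yet strongly supported."""
    n_false = sum(
        1 for p, ok in zip(best_posteriors, best_correct)
        if p >= threshold and not ok
    )
    return n_false / len(best_posteriors)
```

A well-calibrated analysis returns bins whose first two entries are nearly equal; bins in which the mean posterior exceeds the proportion correct indicate inflated support.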
| Results |
|---|
Effects of Integrating over Uncertainty Depend on Prior Assumptions and Sequence Length
We first performed "Bayesian simulations" using challenging 4-taxon phylogenies with equal terminal branch lengths and a short internal branch. Data were analyzed using flat priors with various upper bounds and exponential priors with various means. To isolate the effects of integrating over prior uncertainty, we compared these results with 2 kinds of control experiments in which prior distributions that do not require integration were applied. First, we used the true point prior distribution, which places prior probability 1.0 on the actual branch lengths used to simulate the data, producing posterior probabilities given perfect prior knowledge. Second, we used an empirical Bayes approach that places prior probability 1.0 on the maximum likelihood branch length estimates obtained from the data. Empirical Bayes analysis allows us to separate the effects of integrating over uncertainty from those caused by a lack of perfect prior knowledge.
We found that integrating over uncertainty has a strong effect on posterior probabilities compared to control methods in which integration is not employed. When the true values of model parameters were known in advance, obviating the need to integrate over uncertainty, the average posterior probability of a group of hypotheses equalled the fraction of correct hypotheses in the group (fig. 1A), as predicted by Bayes' theorem, and the rate of false inferences with high posterior probability was close to zero, irrespective of sequence length (fig. 1D).
When branch lengths were not known in advance but were estimated from the data using maximum likelihood, posterior probabilities deviated slightly from the fraction of correct inferences. When sequences were very short, the empirical Bayes technique produced average posterior probabilities for the best-supported trees (posterior probability >1/3) that were higher than the proportion of those trees that were correct; still, posteriors were almost never >0.70. With longer sequences, posterior probabilities were very close to those inferred when the true branch lengths were known in advance. False inferences with strong support were rare at all sequence lengths (fig. 1A and D).
In contrast, when uncertainty was integrated over using typical prior distributions, posterior probabilities deviated strongly from the fraction of correct inferences. The magnitude of this effect was determined by sequence length and the specific prior distribution used (fig. 1B and C). When sequences were of short or moderate length, all prior distributions produced average posterior probabilities for the best-supported tree that were considerably greater than the proportion of correct trees. When long sequences were analyzed, diffuse branch length priors (uniform distributions or exponential distributions with moderate to large means) produced posterior probabilities that closely matched those inferred when branch lengths were known in advance. Uniform branch length priors with upper bounds from 1 to 100 all produced similar posterior probabilities (fig. 1C). In contrast, even with long sequences, small-mean exponential distributions (means of 10⁻⁵ to 10⁻²) produced average posterior probabilities considerably greater than the proportion of correct trees (fig. 1B).
Integrating over uncertainty also increased the rate of strongly supported false inferences (fig. 1D). At all sequence lengths, exponential priors and flat priors produced false inferences more frequently than either of the control priors that do not require integration. Exponential priors performed more poorly in this regard than uniform priors. However, the frequency of such inferences was high only when exponential priors with small means were used. For any given prior, the proportion of false inferences with strong support declined as sequence length increased. Even with 10,000 sites, however, the small-mean exponential priors continued to produce high posterior probabilities for false trees at appreciable rates.
Branch Length Pattern Determines the Effects of Uncertainty
To examine how different evolutionary conditions might affect posterior probabilities, we investigated the role of the true tree's branch length pattern in modulating the effect of integrating over uncertainty. We simulated data on 4-taxon trees with all possible combinations of long/short terminal branches. We analyzed these data using a flat branch length prior (U(0, 10)), the default exponential prior used in MrBayes v3.1 (µ = 0.1), a small-mean exponential prior (µ = 10⁻⁵), or the empirical Bayes approach that uses fixed maximum likelihood estimates.
We found that the pattern of branch lengths on the true phylogeny determines both the direction and the magnitude of the effect that prior uncertainty has on posterior probabilities (figs. 2–4). This effect appears to arise from the amount and structure of convergent evolution produced under various branch length patterns. When 1 or none of the 4 terminal branches was long, making convergent state patterns unlikely, inferred posterior probabilities were close to the proportion of correct inferences, and strongly supported false inferences were rare (fig. 2). When there was ample opportunity for convergent evolution but no expected structure to the convergence (that is, when 3 or 4 of the terminal branches were long), average posterior probabilities were higher than the proportion of correct trees, and the frequency of strongly supported false inferences increased (fig. 3). In contrast, when convergence was structured to favor a particular tree, as on trees with 2 long and 2 short branches, posterior probabilities were lower than the proportion of correct trees and the rate of false inferences was dependent on the relationship between taxa with long terminal branches (fig. 4). When the true tree was in the inverse-Felsenstein zone (2 sister long branches), the true tree was always recovered, but posterior probability was reduced (fig. 4A). When the true tree was in the Felsenstein zone (2 nonsister long branches), integrating over incorrect branch lengths reduced support for the correct tree and inflated support for an incorrect tree (fig. 4C–D).
Several of the inferences drawn from the experiments in figure 1 applied across branch length patterns. First, the effect of integrating over uncertainty was again strongest when short sequences were analyzed. Presumably, longer sequences cause the likelihood function over branch lengths to be more narrowly peaked around the true values, so integrating over that function more closely approximates knowing the true values in advance.
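The narrowing of the likelihood function with sequence length can be illustrated with a deliberately simple case. The sketch below is a toy example, not part of the study: it uses only 2 sequences under JC69, for which the probability that a site differs at distance d is 3/4 (1 - exp(-4d/3)), and measures the width of the range of d whose log-likelihood lies within 2 units of the maximum. The function names, the fixed 10% of differing sites, and the 2-unit cutoff are all assumptions chosen for illustration.

```python
import math

def jc69_loglik(d, n_sites, n_diff):
    """Log-likelihood of distance d given n_diff differing sites out of n_sites."""
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)

def support_interval_width(n_sites, frac_diff=0.1, drop=2.0,
                           grid_step=1e-4, d_max=1.0):
    """Width of the set of distances within `drop` log units of the maximum."""
    n_diff = round(n_sites * frac_diff)
    grid = [grid_step * i for i in range(1, int(d_max / grid_step) + 1)]
    loglik = [jc69_loglik(d, n_sites, n_diff) for d in grid]
    best = max(loglik)
    inside = [d for d, v in zip(grid, loglik) if v >= best - drop]
    return inside[-1] - inside[0]

# Same proportion of differing sites, increasing alignment length.
widths = {n: support_interval_width(n) for n in (100, 1000, 10000)}
```

The widths shrink as the number of sites grows, so a prior spread over a wide range of branch lengths contributes less and less to the integral outside a small neighborhood of the maximum likelihood estimate. Whether this carries over to full trees with several branch lengths and competing topologies is exactly what the simulations in this section examine.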
Second, the small-mean exponential prior was particularly problematic. This prior consistently produced more radical deviations from the proportion of correct inferences than the flat or moderate-mean exponential priors did. Under virtually all conditions, the frequency of strongly supported false inferences was notably higher when the small-mean exponential prior was used than when the flat and moderate-mean exponential priors were applied. Whereas analyzing longer sequences mitigated this effect for the flat and moderate priors, adding more data had a much weaker beneficial effect on posteriors using the small-mean exponential prior. For example, when long sequences were simulated along a tree with 4 long terminal branches and analyzed using the small-mean exponential prior, only 50% of trees with posterior probability >0.94 were correct, whereas 65% of such trees were correct when more diffuse priors were used (fig. 3C–D). On Felsenstein-zone trees, small-mean exponential priors led to a strong long-branch attraction artifact, with the incorrect tree being inferred with high posterior probability in virtually all replicates, and this effect was most severe with long sequences (fig. 4C–D). Flat and moderate exponential priors suffered from a much weaker bias; the wrong tree was occasionally recovered with strong support, but only from short sequences.
Finally, empirical Bayes analysis produced posterior probabilities that more closely matched the chance that an inference was correct than any of the nonpoint priors examined. When 3 or 4 terminal branches were long—conditions that caused posterior probabilities to be higher than the proportion of correct trees—posterior probabilities calculated using the empirical Bayes approach were closer to the proportion of correct trees than any of the nonpoint prior distributions (fig. 3). Even under challenging Felsenstein-zone conditions, the empirical Bayes approach produced posterior probabilities that closely matched the proportion of correct inferences (fig. 4C–D). The rate of strongly supported false trees using empirical Bayes was consistently lower than when branch length uncertainty was integrated over using any of the priors. These results indicate that it is integration over parameter uncertainty—rather than lack of perfect prior knowledge—that is the chief cause of the observed effects on posterior probabilities.
Uncertainty Affects Posterior Probabilities under Realistic Conditions
To examine the potential effects of branch length uncertainty under more realistic conditions, we simulated data using parameter values inferred from the placental mammal data of Murphy et al. (2001) and analyzed them using BMCMC with various prior distributions.
At all sequence lengths, integrating over branch length uncertainty using any of the priors caused posterior probabilities to be skewed upward compared with those that would be inferred if branch lengths were known. With increasing sequence length, posterior probabilities for inferred clades converged toward 1.0 using any of the priors, but a large fraction of these strongly supported clades were incorrect (fig. 5). Even with very long sequences (50,000 nt), all clades inferred using flat and moderate-mean exponential priors had posterior probability 1.0, but about 10% of these were incorrect. The small-mean exponential prior performed similarly, but an additional group of weakly supported clades was also recovered, and virtually all of these were incorrect. These results suggest that for difficult real-world problems, even extraordinarily long sequences are not sufficient for Bayesian methods that integrate over uncertainty to reliably recover the correct tree and avoid recovering false clades with high support.
| Discussion |
|---|
Bayes' theorem guarantees that posterior probabilities would equal the proportion of correct inferences if branch lengths and other nuisance parameters were known a priori. In real analyses, these values can never be known with certainty. Our experiments show that integrating over these parameters using common "uninformative" priors can cause posterior probabilities to deviate radically from this equivalence. The direction of the deviation (either upward or downward) is determined by the pattern of branch lengths on the tree; the magnitude depends largely on sequence length and prior assumptions. Under realistic conditions derived from empirical sequence analysis, posterior probabilities calculated using typical prior assumptions can produce spurious support for incorrect phylogenies even when the evolutionary model is otherwise correct and sequences are very long.
Our results reinforce that, in practice, Bayesian posterior probabilities should not be misinterpreted as the unconditional probability that a hypothesis is true. In no way does this conclusion suggest that posterior probabilities are technically incorrect; Bayes' theorem guarantees that the posterior probability of a tree is no more and no less than the probability that the tree is correct given the data, an evolutionary model, and prior distributions over trees and model parameters. Posterior probability accurately expresses the quantitative degree of belief in a hypothesis that a rational being should have, given a set of explicit starting assumptions and following analysis of a set of data. Posterior probabilities must be interpreted strictly in this conditional sense.
Numerous previous analyses have suggested that posterior probabilities are regularly "overcredible," leading to inflated estimates of statistical confidence. These studies, however, looked at only a small subset of possible tree shapes (Suzuki et al. 2002; Cummings et al. 2003; Douady et al. 2003; Misawa and Nei 2003; Simmons et al. 2004; Taylor and Piel 2004; Lewis et al. 2005). Our experiments show that with some branch length combinations, integrating over uncertainty does cause posterior probabilities to exceed the frequentist chance that a tree or clade is correct. Other branch length combinations, however, produce posterior probabilities that are lower than the chance an inference is true. As a result, there can be no general post hoc "correction," such as adjusting posterior probabilities downward, that can make posterior probabilities approximate the frequentist chance that a hypothesis is true.
Most phylogenetic analyses to date have used diffuse priors to indicate a lack of prior knowledge about evolutionary parameters like branch lengths. Our results indicate that various branch length distributions affect posterior probabilities in very similar ways, so long as the prior is sufficiently vague. Exponential priors with moderate to large means and flat priors with various upper bounds all produced nearly identical posterior probabilities across a range of problems. Although concern has been raised that the upper bound on the uniform distribution is arbitrary (Felsenstein 2004), we found no evidence that choosing different uniform distributions within a reasonable range appreciably affects posterior probabilities.
More informative priors have the potential to improve estimates of statistical confidence, but only if they accurately reflect prior beliefs about the true evolutionary conditions. We found that small-mean exponential priors, when applied to all branch lengths on a tree, severely underestimate the probability of convergence on long branches, resulting in radically increased posterior probabilities when many branches are long. A partitioned prior like the one used by Y&R (with a small-mean distribution applied to internal branches and a large-mean distribution for terminal branches) increases the relative prior probability of convergence, thus tending to decrease posterior probabilities (Yang and Rannala 2005). This type of prior may be appropriate when examining evolutionary radiations or other situations in which short internal and long terminal branches are expected given prior information. If used generally with the goal of causing posterior probabilities to more closely approximate the frequentist chance that inferences are correct, however, this prior is unlikely to be successful: it would reduce the high posterior probabilities induced by some branch length patterns but exacerbate the lower posterior probabilities caused by others.
When model parameters are expected to follow a particular kind of prior distribution, but the specific parameter values describing that distribution are unknown, uncertainty about the prior itself can be integrated over using a "hyperprior" (Berger et al. 2005). In our experiments using simulated mammalian sequence data, however, exponential priors with a wide range of means all produced posterior probabilities greater than the proportion of correct inferences, presumably because the true branch lengths are not exponentially distributed. These results suggest that hyperpriors may improve performance only when the family of prior distributions includes a reasonable approximation of the true distribution of branch lengths across the tree. The development of prior distributions that are flexible enough to match prior beliefs concerning real phylogenetic problems merits further research.
Compared with a fully Bayesian approach, the empirical Bayes method we investigated yielded more reliable results: a lower rate of strongly supported false inferences and posterior probabilities that more closely approximate those that would be derived with perfect prior knowledge. In nonphylogenetic applications, simple empirical Bayes approaches that do not account for uncertainty can be less reliable than fully Bayesian methods, producing overly narrow confidence intervals that do not account for stochastic error in parameter estimates (Carlin and Gelfand 1990). Phylogenetic inference differs from classical domains of statistical analysis in numerous ways, however, so classical results are not always expected to hold (Yang et al. 1995). Due to the hierarchical structure of phylogenetic trees, incorrect values of nuisance parameters (branch lengths in particular) can cause very strong biases in favor of particular topologies, because incorrect parameter values systematically misinterpret convergent state patterns as due to common descent or vice versa. Model-based phylogenetic methods are unbiased only when the probability of convergence is accurately assessed, a condition strictly fulfilled only when the model and its parameters, including branch lengths, are correct (Rogers 2001). Our results confirm that when the true branch lengths are known in advance, the probability of convergence is accurately assessed and the likelihoods calculated for each tree accurately reflect the relative probabilities that each topology would have generated the data. The empirical Bayes technique we used estimates branch lengths from the data using maximum likelihood and approximates, to the extent those estimates are accurate, Bayesian analysis if the true lengths were known in advance. Our experiments indicate that these branch length estimates are accurate enough to yield posterior probabilities almost identical to those calculated with perfect prior knowledge, except when sequences were extremely short.
In contrast, the fully Bayesian approach integrates branch lengths over a diffuse distribution of values, virtually all of which are incorrect. Integration over incorrect branch lengths causes the probability of convergence to be systematically under- or overestimated, depending on the topology. As a result, the likelihood of one tree relative to the others is increased, skewing posterior probabilities up- or downward, depending on the phylogeny favored by convergent patterns. Our experiments suggest that the cost of ignoring uncertainty using empirical Bayes—stochastic error in estimating parameter values—is much smaller than the cost of incorporating it using Bayesian integration, which introduces strong biases that even very long sequences may not overcome. Further investigation of the performance of empirical Bayes strategies in phylogenetics is therefore warranted.
In addition to empirical Bayes analysis, other strategies are available to characterize evidentiary support without conditioning on prior beliefs about nuisance parameters or relying on bootstrapping, which is computationally intensive and can also be biased (Hillis and Bull 1993; Sitnikova et al. 1995). These include procedures based on evaluating the likelihood ratio of the best tree with a clade versus the best tree without it (Edwards 1992; Anisimova and Gascuel 2006). Understanding the statistical properties of these and other confidence measures under a variety of conditions deserves further study. Because no single measure is likely to provide a complete and accurate estimate of statistical confidence under all evolutionary conditions, a careful and critical application of a variety of techniques, each evaluated in light of a detailed understanding of its properties, will provide the most robust and meaningful assessments of confidence in phylogenetic hypotheses.
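The likelihood ratio comparison mentioned above can be sketched as follows: take the highest maximized log-likelihood among trees that contain the clade and subtract the highest among trees that lack it. This is only an illustration under stated assumptions; the function name, the representation of a tree as a (log-likelihood, set of clades) pair, and the numbers are hypothetical, and the published procedures cited above differ in how candidate trees are chosen and how the statistic is calibrated.

```python
def clade_log_likelihood_ratio(trees, clade):
    """Log-likelihood difference: best tree with `clade` minus best tree without.

    trees: iterable of (max_log_likelihood, clades) pairs, where `clades`
        is a set of frozensets of taxon names found on that tree.
    clade: frozenset of taxon names defining the group of interest.
    """
    trees = list(trees)
    with_clade = [lnl for lnl, clades in trees if clade in clades]
    without_clade = [lnl for lnl, clades in trees if clade not in clades]
    if not with_clade or not without_clade:
        raise ValueError("need at least one tree with the clade and one without it")
    return max(with_clade) - max(without_clade)


# Hypothetical four-taxon example: each tree is summarized by its maximized
# log-likelihood and the one grouping that distinguishes it.
candidate_trees = [
    (-1234.5, {frozenset({"A", "B"})}),
    (-1236.5, {frozenset({"A", "C"})}),
    (-1239.5, {frozenset({"A", "D"})}),
]
support_ab = clade_log_likelihood_ratio(candidate_trees, frozenset({"A", "B"}))
support_ac = clade_log_likelihood_ratio(candidate_trees, frozenset({"A", "C"}))
```

A positive value means the best tree containing the group fits better than the best tree lacking it; a negative value means the reverse. No prior on branch lengths enters the calculation.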
| Supplementary Material |
|---|
Supplementary appendix is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
We thank Iain Pardoe, Patrick Phillips, and Steve Proulx for fruitful discussions and an anonymous reviewer for many helpful comments. Supported by National Science Foundation (NSF) DEB-0516530, National Institutes of Health GM62351, NSF–Integrated Graduate Education and Research Training grant DGE-9972830, and an Alfred Sloan Research Foundation Fellowship to J.W.T.
| Footnotes |
|---|
Hervé Philippe, Associate Editor
| References |
|---|
Alfaro ME, Holder MT. The posterior and the prior in Bayesian phylogenetics. Annu Rev Ecol Syst (2006) 37:19–42.
Alfaro ME, Zoller S, Lutzoni F. Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence. Mol Biol Evol (2003) 20(2):255–266.
Anisimova M, Gascuel O. Approximate likelihood ratio test for branches: a fast, accurate and powerful alternative. Syst Biol (2006) 55(4):539–552.
Berger JO, Strawderman W, Tang D. Posterior propriety and admissibility of hyperpriors in normal hierarchical models. Ann Stat (2005) 33(2):606–646.
Buckley TR. Model misspecification and probabilistic tests of topology: evidence from empirical data sets. Syst Biol (2002) 51(3):509–523.
Carlin BP, Gelfand AE. Approaches for empirical Bayesian confidence intervals. J Am Stat Assoc (1990) 85(409):105–114.
Cummings MP, Handley SA, Myers DS, Reed DL, Rokas A, Winka K. Comparing bootstrap and posterior probability values in the four-taxon case. Syst Biol (2003) 52(4):477–487.
Douady CJ, Delsuc F, Boucher Y, Doolittle WF, Douzery EJP. Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Mol Biol Evol (2003) 20(2):248–254.
Edwards AWF. Likelihood (1992) Baltimore (MD): Johns Hopkins University Press.
Erixon P, Svennblad B, Britton T, Oxelman B. Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Syst Biol (2003) 52(5):665–673.
Felsenstein J. Inferring Phylogenies (2004) Sunderland (MA): Sinauer Associates.
Hillis DM, Bull JJ. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst Biol (1993) 42(2):182–192.
Huelsenbeck JP, Larget B, Miller RE, Ronquist F. Potential applications and pitfalls of Bayesian inference of phylogeny. Syst Biol (2002) 51(5):673–688.
Huelsenbeck JP, Rannala B. Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models. Syst Biol (2004) 53(6):904–913.
Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP. Bayesian inference of phylogeny and its impact on evolutionary biology. Science (2001) 294:2310–2314.
Karol KG, McCourt RM, Cimino MT, Delwiche CF. The closest living relatives of land plants. Science (2001) 294(5550):2351–2353.
Lemmon AR, Moriarty EC. The importance of proper model assumption in Bayesian phylogenetics. Syst Biol (2004) 53(2):265–277.
Lewis PO, Holder MT, Holsinger KE. Polytomies and Bayesian phylogenetic inference. Syst Biol (2005) 54(2):241–253.
Mar JC, Harlow TJ, Ragan MA. Bayesian and maximum likelihood phylogenetic analyses of protein sequence data under relative branch length differences and model violation. BMC Evol Biol (2005) 5(1):8.
Misawa K, Nei M. Reanalysis of Murphy et al.'s data gives various mammalian phylogenies and suggests overcredibility of Bayesian trees. J Mol Evol (2003) 57:S290–S296.
Murphy WJ, Eizirik E, O'Brien SJ. (11 co-authors). Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science (2001) 294(5550):2348–2351.
Rannala B, Yang Z. Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. J Mol Evol (1996) 43:304–311.
Rogers JS. Maximum likelihood estimation of phylogenetic trees is consistent when substitution rates vary according to the invariable sites plus gamma distribution. Syst Biol (2001) 50(5):713–722.
Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics (2003) 19(12):1572–1574.
Simmons MP, Pickett KM, Miya M. How meaningful are Bayesian support values? Mol Biol Evol (2004) 21(1):188–199.
Sitnikova T, Rzhetsky A, Nei M. Interior-branch and bootstrap tests of phylogenetic trees. Mol Biol Evol (1995) 12(2):319–333.
Suzuki Y, Glazko GV, Nei M. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc Natl Acad Sci USA (2002) 99(25):16138–16143.
Swofford DL. Phylogenetic analysis using parsimony (*and other methods) (2002) Sunderland (MA): Sinauer Associates.
Taylor DJ, Piel WH. An assessment of accuracy, error, and conflict with support values from genome-scale phylogenetic data. Mol Biol Evol (2004) 21(8):1534–1537.
Wilcox TP, Zwickl DJ, Heath TA, Hillis DM. Phylogenetic relationships of the dwarf boas and a comparison of Bayesian and bootstrap measures of phylogenetic support. Mol Phylogenet Evol (2002) 25:361–371.
Yang Z, Goldman N, Friday AE. Maximum likelihood trees from DNA sequences: a peculiar statistical estimation problem. Syst Biol (1995) 44(3):384–399.
Yang Z, Rannala B. Branch-length prior influences Bayesian posterior probability of phylogeny. Syst Biol (2005) 54(3):455–470.