MBE Advance Access published online on May 21, 2004
Molecular Biology and Evolution, doi:10.1093/molbev/msh159
Molecular Biology and Evolution © Society for Molecular Biology and Evolution 2004; all rights reserved
1 Genome Atlantic, Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada B3H 3J5
* To whom correspondence should be addressed. E-mail: susko{at}mathstat.dal.ca.
Using analytical methods, we show that under a variety of model misspecifications, neighbour joining, minimum evolution and least squares estimation procedures are statistically inconsistent. Failure to correctly account for differing rates-across-sites processes, failure to correctly model rate matrix parameters and failure to adjust for parallel rates-across-sites changes (a rates-across-subtrees process) are all shown to lead to a "long branch attraction" form of inconsistency. In addition, failure to account for rates-across-sites processes is also shown to result in underestimation of evolutionary distances for a wide variety of substitution models, generalizing an earlier analytical result for the Jukes-Cantor model in Golding (1983). Although standard rates-across-sites models can be employed in many of these cases to restore consistency, current models cannot account for other kinds of misspecification. We examine an idealized, but biologically relevant case, where parallel changes in rates at sites across subtrees are shown to give rise to inconsistency. This changing rates-across-subtrees type of model misspecification cannot be adjusted for with conventional methods or without carefully considering the rate variation in the larger tree. While the results are presented for four-taxon trees, the expectation is that they have implications for larger trees as well. To illustrate this, a simulated 42-taxon example is given in which the microsporidia, an enigmatic group of eukaryotes, are incorrectly placed at the archaebacteria/eukaryotes split because of incorrectly specified pairwise distances. The analytical nature of the results lends insight into why long branch attraction tends to be a common form of inconsistency and why other forms of inconsistency like "long branches repel" can arise in some settings.
In many of the cases of inconsistency presented, a particular incorrect topology is estimated with probability converging to one, the implication being that measures of uncertainty like bootstrap support will be unable to detect that there is a problem with the estimation. The focus is on distance methods, but previous simulation results suggest that the zones of inconsistency for distance methods contain the zones of inconsistency for maximum likelihood methods as well.
Key Words: Inconsistency, rates across sites, distance methods, neighbour joining, molecular evolution, phylogenetics
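The two effects described in the abstract, underestimated distances and long branch attraction, can be illustrated with a small numerical sketch. It is not the authors' analysis. It assumes a Jukes-Cantor model whose true site rates follow a mean-one gamma distribution, and it computes the infinite-data limit of the plain Jukes-Cantor distance, which ignores that rate variation. The branch lengths, the shape parameter `alpha = 0.5` and all function names are hypothetical choices made for illustration. For four taxa, neighbour joining selects the split with the smallest sum of paired distances, which is the rule `best_split` applies.

```python
import math
from itertools import combinations


def jc_estimate_under_gamma(d, alpha):
    """Large-sample limit of the plain Jukes-Cantor distance estimate when the
    true process is Jukes-Cantor with gamma(alpha) site rates of mean 1."""
    # Expected proportion of differing sites at true distance d.
    p = 0.75 * (1.0 - (1.0 + 4.0 * d / (3.0 * alpha)) ** (-alpha))
    # Standard Jukes-Cantor correction, which assumes a single rate for all sites.
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def true_distances(long_branch, short_branch, internal):
    """Additive distances on the tree ((A,B),(C,D)); A and C are long, B and D short."""
    leaf = {"A": long_branch, "B": short_branch, "C": long_branch, "D": short_branch}
    side = {"A": 0, "B": 0, "C": 1, "D": 1}
    dist = {}
    for x, y in combinations("ABCD", 2):
        # Paths between the two sides of the tree also cross the internal edge.
        crosses = side[x] != side[y]
        dist[x, y] = leaf[x] + leaf[y] + (internal if crosses else 0.0)
    return dist


def best_split(dist):
    """Return the four-taxon split with the smallest sum of paired distances."""
    sums = {
        "AB|CD": dist["A", "B"] + dist["C", "D"],
        "AC|BD": dist["A", "C"] + dist["B", "D"],
        "AD|BC": dist["A", "D"] + dist["B", "C"],
    }
    return min(sums, key=sums.get)


if __name__ == "__main__":
    alpha = 0.5  # hypothetical gamma shape; smaller means stronger rate variation
    true = true_distances(long_branch=1.0, short_branch=0.05, internal=0.05)
    est = {pair: jc_estimate_under_gamma(d, alpha) for pair, d in true.items()}
    print(best_split(true))  # AB|CD, the generating topology
    print(best_split(est))   # AC|BD, the two long branches are grouped
```

In this setting the misspecified estimate is a concave function of the true distance, so the longest path (A to C) is shrunk the most. That shrinkage is what pulls the two long branches together, even with unlimited data.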
Original Articles
On Inconsistency of the Neighbour-Joining, Least Squares and Minimum Evolution Estimation when Substitution Processes Are Incorrectly Modeled
2 Genome Atlantic, Canadian Institute for Advanced Research, Program in Evolutionary Biology, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4H7



