MBE Advance Access published online on November 28, 2007
Molecular Biology and Evolution, doi:10.1093/molbev/msm263
Letter
A Bias in ML Estimates of Branch Lengths in the Presence of Multiple Signals
1 Allan Wilson Center for Molecular Ecology and Evolution, Massey University, PO Box 11222, Palmerston North, New Zealand
2 Current Address: Australian National University, Canberra, ACT, Australia
Received for publication August 27, 2007. Revision received November 18, 2007. Accepted for publication November 19, 2007.
Sequence data often contain competing signals that are detected by network programs or Lento plots. Such data can be formed by generating sequences on more than one tree and combining the results, that is, by a mixture model. We report that, with such mixture models, the estimates of edge (branch) lengths from maximum likelihood (ML) methods that assume a single tree are biased. Based on the observed number of competing signals in real data, such a bias of ML is expected to occur frequently. Because network methods can recover competing signals more accurately, there is a need for ML methods that allow a network. A fundamental problem is that mixture models can have more parameters than can be recovered from the data, so that some mixtures are not, in principle, identifiable. We recommend that network programs be incorporated into best-practice analysis, along with ML and Bayesian trees.
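
As a rough illustration of how pooling sites from a mixture can bias a single-model length estimate, the sketch below uses the simplest possible setting: two taxa, the Jukes-Cantor (JC69) model, and sites drawn from two different path lengths. This is not the analysis reported in the Letter; the function names, path lengths, and mixture weights are assumptions chosen for illustration only. Because the expected proportion of differing sites is a concave function of distance, the single-distance ML estimate from the pooled data falls below the weighted mean of the true lengths.

```python
import math


def jc69_p(d):
    """Expected proportion of differing sites at JC69 distance d
    (substitutions per site)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p):
    """Pairwise ML distance under JC69 from a proportion p of
    differing sites (requires p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def mixture_distance_estimate(distances, weights):
    """Single-distance JC69 estimate when sites come from a mixture
    of path lengths; uses expected (infinite-data) proportions."""
    p_mix = sum(w * jc69_p(d) for d, w in zip(distances, weights))
    return jc69_distance(p_mix)


# Hypothetical mixture: half the sites evolve along a short path,
# half along a long one (values are illustrative assumptions).
distances = [0.1, 0.5]
weights = [0.5, 0.5]

true_mean = sum(w * d for d, w in zip(distances, weights))
estimate = mixture_distance_estimate(distances, weights)

print(f"weighted mean of true lengths: {true_mean:.4f}")
print(f"single-model estimate:         {estimate:.4f}")
```

With a single component the estimate recovers the true length exactly, so the shortfall arises only from the mixture. Whether and how this two-taxon effect carries over to edges of a full tree depends on the trees and weights in the mixture.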