MBE Advance Access published online on July 13, 2007
Molecular Biology and Evolution, doi:10.1093/molbev/msm139
Research Article
Treeness Triangles: Visualizing the Loss of Phylogenetic Signal
1 Allan Wilson Center for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
2 Institute for Fundamental Sciences, Massey University, Palmerston North, New Zealand
3 Institute for Molecular BioSciences, Massey University, Palmerston North, New Zealand
* Author for correspondence. D.Penny{at}massey.ac.nz
Received for publication March 7, 2007. Revision received May 14, 2007. Accepted for publication June 22, 2007.
It is well known that molecular data saturate with increasing sequence divergence, thereby losing phylogenetic information, and that saturation may be accompanied by the accumulation of misleading information due to chance similarities or systematic bias. Exploratory data analysis methods that can quantify the extent of signal loss or convergence for a given data set are scarce. Such methods are needed because genomics delivers very long sequence alignments spanning substantial phylogenetic depth, where site saturation may be compounded by systematic biases or other alternative signals. Here we introduce the Treeness Triangle (TT) graph, in which signals detectable by Hadamard (spectral) analysis are summed into three categories: those supporting (i) external and (ii) internal branches in the optimal tree, and (iii) the residuals (potential internal branches not present in the optimal tree). These three values are plotted in a standard ternary coordinate system. The approach is illustrated with simulated and real data sets, the latter from complete chloroplast genomes, where potential problems of paralogy or lateral gene acquisition can be excluded. The Treeness Triangle uncovers the divergence-dependent loss of phylogenetic signal as subsets of chloroplast genomes spanning increasingly deep evolutionary timescales are investigated. The rate of signal loss (or signal retention) varies with the gene and/or the method of analysis.
Key Words: plastid genomes, spectral analysis, model misspecification, exploratory data analysis, ternary plot, Hadamard conjugation
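
The abstract describes summing signals into three categories (external branches, internal branches, residuals) and plotting them in a ternary coordinate system. As a rough illustration of that last step only, the sketch below converts three sums into a point inside a unit equilateral triangle. The function name `ternary_coords`, the vertex layout, and the requirement that inputs be non-negative are assumptions made for this example; they are not taken from the article, and the Hadamard analysis that would produce the sums is not shown.

```python
import math


def ternary_coords(external, internal, residual):
    """Map three non-negative signal sums to (x, y) in a unit ternary plot.

    Hypothetical helper: the vertex layout below is an assumption,
    not the layout used in the article's figures.
    """
    values = (external, internal, residual)
    if any(v < 0 for v in values):
        raise ValueError("signal sums are assumed to be non-negative")
    total = sum(values)
    if total == 0:
        raise ValueError("at least one signal sum must be positive")

    # Normalise so the three proportions sum to 1.
    _, b, c = (v / total for v in values)

    # Assumed vertices: external at (0, 0), internal at (1, 0),
    # residual at (0.5, sqrt(3)/2). The point is the weighted
    # average of the vertices, with the proportions as weights.
    x = b + 0.5 * c
    y = (math.sqrt(3) / 2) * c
    return x, y


if __name__ == "__main__":
    # Illustrative values only, not data from the article.
    print(ternary_coords(0.6, 0.3, 0.1))
```

Under this layout, a data set whose signal lies entirely in external branches sits at the lower-left vertex, and increasing residual signal moves the point towards the apex.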
These authors contributed equally to this work.