MBE Advance Access published online on May 22, 2008
Molecular Biology and Evolution, doi:10.1093/molbev/msn119
Research Article
Spatial and temporal heterogeneity in nucleotide sequence evolution
University of Manchester, Faculty of Life Sciences, Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK
Correspondence to: Simon Whelan, University of Manchester, Faculty of Life Sciences, Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK, tel: (0)161-3068901, fax: (0)161-2755982, e-mail: simon.whelan{at}manchester.ac.uk
Received for publication January 31, 2008. Revision received April 1, 2008. Revision received May 8, 2008. Accepted for publication May 11, 2008.
Models of nucleotide substitution make many simplifying assumptions about the evolutionary process, including that the same process acts on all sites in an alignment and on all branches of the phylogenetic tree. Many studies have shown that in reality the substitution process is heterogeneous and that this variability can introduce systematic errors into many forms of phylogenetic analysis. I propose a new rigorous approach for describing heterogeneity called a temporal hidden Markov model (THMM), which can distinguish between among-site (spatial) heterogeneity and among-lineage (temporal) heterogeneity. Several versions of the THMM are applied to 16 sets of aligned sequences to quantitatively assess the different forms of heterogeneity acting within them. The most general THMM provides the best fit in all of the data sets examined, providing strong evidence of pervasive heterogeneity during evolution. Investigating individual forms of heterogeneity provides further insights. In agreement with previous studies, spatial rate heterogeneity (rates across sites: RAS) is inferred to be the single most prevalent form of heterogeneity. Interestingly, RAS appears so dominant that failure to include it independently in the THMM masks other forms of heterogeneity, particularly temporal heterogeneity. Incorporating RAS into the THMM reveals substantial temporal and spatial heterogeneity in nucleotide composition and in the bias towards transition substitutions in all alignments examined, although the relative importance of different forms of heterogeneity varies between data sets. Furthermore, the improvements in model fit observed by adding complexity to the model suggest that the THMMs used in this study do not capture all the evolutionary heterogeneity occurring in the data. These observations all indicate that current tests may consistently underestimate the degree of temporal heterogeneity occurring in data.
Finally, there is a weak link between the amount of heterogeneity detected and the level of divergence between the sequences, suggesting that variability in the evolutionary process will be a particular problem for deep phylogeny.
Key Words: phylogenetics, maximum likelihood, temporal hidden Markov models, covarion, heterotachy, heterogeneity
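The abstract does not give the THMM's parametrization, so the following is only a minimal sketch of the general hidden-state idea, in the spirit of the covarion models named in the key words. It assumes each hidden class has its own rate multiplier, nucleotide composition, and transition bias (an HKY-style form chosen purely for illustration), and that a site may switch class along a lineage at a single assumed `switch_rate`. All function names and numerical values are hypothetical.

```python
# Illustrative sketch only: a covarion-style generator matrix on the expanded
# state space (hidden class, nucleotide). This is NOT the paper's parametrization.

NUCLEOTIDES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rates(freqs, kappa):
    """Off-diagonal HKY-style rates: q[i][j] = pi_j, times kappa for transitions."""
    n = len(NUCLEOTIDES)
    q = [[0.0] * n for _ in range(n)]
    for i, x in enumerate(NUCLEOTIDES):
        for j, y in enumerate(NUCLEOTIDES):
            if i != j:
                q[i][j] = freqs[j] * (kappa if (x, y) in TRANSITIONS else 1.0)
    return q


def expanded_generator(classes, switch_rate):
    """Build the (K*4) x (K*4) generator for K hidden classes.

    classes: list of (rate_multiplier, freqs, kappa); values are hypothetical.
    switch_rate: assumed rate of moving between any two hidden classes
                 while the nucleotide stays the same.
    """
    k, n = len(classes), len(NUCLEOTIDES)
    size = k * n
    big_q = [[0.0] * size for _ in range(size)]
    for c, (rate, freqs, kappa) in enumerate(classes):
        q = hky_rates(freqs, kappa)
        for i in range(n):
            for j in range(n):
                if i != j:
                    # Substitution within a hidden class.
                    big_q[c * n + i][c * n + j] = rate * q[i][j]
            for d in range(k):
                if d != c:
                    # Switch hidden class; the nucleotide is unchanged.
                    big_q[c * n + i][d * n + i] = switch_rate
    for row in range(size):
        # The diagonal makes each row sum to zero, as a generator requires.
        big_q[row][row] = -sum(big_q[row])
    return big_q


# Two hypothetical classes: frequencies are listed in the order A, C, G, T.
classes = [
    (0.2, [0.25, 0.25, 0.25, 0.25], 2.0),  # slow class, uniform composition
    (1.8, [0.10, 0.40, 0.40, 0.10], 5.0),  # fast class, GC-rich, stronger transition bias
]
Q = expanded_generator(classes, switch_rate=0.05)
```

In this sketch, differences between classes stand in for spatial heterogeneity (sites can occupy different classes), while a non-zero `switch_rate` stands in for temporal heterogeneity (a site can change class along a lineage). Transition probabilities along a branch of length t would follow from the matrix exponential of `Q * t`.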

