MBE Advance Access originally published online on December 5, 2003
Mol. Biol. Evol. 21(3):468-488. 2004
DOI: 10.1093/molbev/msh039
© 2004 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
Phylogenetic Estimation of Context-Dependent Substitution Rates by Maximum Likelihood

* Center for Biomolecular Science and Engineering, University of California, Santa Cruz
Howard Hughes Medical Institute, University of California, Santa Cruz
E-mail: acs{at}soe.ucsc.edu.
Nucleotide substitution in both coding and noncoding regions is context-dependent, in the sense that substitution rates depend on the identity of neighboring bases. Context-dependent substitution has been modeled in the case of two sequences and an unrooted phylogenetic tree, but it has only been accommodated in limited ways with more general phylogenies. In this article, extensions are presented to standard phylogenetic models that allow for better handling of context-dependent substitution, yet still permit exact inference at reasonable computational cost. The new models improve goodness of fit substantially for both coding and noncoding data. Considering context dependence leads to much larger improvements than does using a richer substitution model or allowing for rate variation across sites, under the assumption of site independence. The observed improvements appear to derive from three separate properties of the models: their explicit characterization of context-dependent substitution within N-tuples of adjacent sites, their ability to accommodate overlapping N-tuples, and their rich parameterization of the substitution process. Parameter estimation is accomplished using an expectation maximization algorithm, with a quasi-Newton algorithm for the maximization step; this approach is shown to be preferable to ordinary Newton methods for parameter-rich models. Overlapping tuples are efficiently handled by assuming Markov dependence of the observed bases at each site on those at the N - 1 preceding sites, and the required conditional probabilities are computed with an extension of Felsenstein's algorithm. Estimated substitution rates based on a data set of about 160,000 noncoding sites in mammalian genomes indicate a pronounced CpG effect, but they also suggest a complex overall pattern of context-dependent substitution, comprising a variety of subtle effects. 
Estimates based on about 3 million sites in coding regions demonstrate that amino acid substitution rates can be learned at the nucleotide level, and suggest that context effects across codon boundaries are significant.
Key Words: neighbor-dependent substitution; CpG effect; codon model; expectation maximization; substitution rate matrix
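
The abstract states that the required conditional probabilities are computed with an extension of Felsenstein's algorithm. As a point of reference only, the sketch below shows the standard, site-independent form of that algorithm (the pruning recursion) for a single alignment column. It is not the extended N-tuple version described in the article. The Jukes-Cantor substitution model, the three-species tree, the branch lengths, and all function names are assumptions chosen purely for illustration.

```python
import math

BASES = "ACGT"

def jc69_prob(a, b, t):
    """Jukes-Cantor probability that base a is replaced by base b over branch length t.

    Assumption: JC69 is used here only because it has a simple closed form;
    the article itself works with much richer substitution models.
    """
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def partial_likelihoods(node, column):
    """Return {base: P(observed leaves below node | node carries that base)}.

    A node is either a leaf name (str) or a tuple of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        observed = column[node]
        return {b: 1.0 if b == observed else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        child_l = partial_likelihoods(child, column)
        for a in BASES:
            # Sum over the unknown base at the child end of the branch.
            result[a] *= sum(jc69_prob(a, b, t) * child_l[b] for b in BASES)
    return result

def column_likelihood(tree, column):
    """Likelihood of one alignment column, with uniform base frequencies at the root."""
    root = partial_likelihoods(tree, column)
    return sum(0.25 * root[b] for b in BASES)

# Hypothetical tree ((human:0.1, chimp:0.1):0.2, mouse:0.5).
TREE = (((("human", 0.1), ("chimp", 0.1)), 0.2), ("mouse", 0.5))

if __name__ == "__main__":
    print(column_likelihood(TREE, {"human": "A", "chimp": "A", "mouse": "G"}))
```

In the context-dependent setting described in the abstract, the same kind of recursion would run over N-tuples of adjacent columns rather than single columns, so the state space grows from 4 to 4^N states.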