MBE Advance Access published online on December 5, 2003
Molecular Biology and Evolution, doi:10.1093/molbev/msh039
Molecular Biology and Evolution © Society for Molecular Biology and Evolution 2003; all rights reserved
1 Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064
* To whom correspondence should be addressed. E-mail: acs{at}soe.ucsc.edu.
Nucleotide substitution in both coding and non-coding regions is context-dependent, in the sense that substitution rates depend on the identity of neighboring bases. Context-dependent substitution has been modeled in the case of two sequences and an unrooted phylogenetic tree, but has been allowed for only in limited ways with more general phylogenies. In this paper, extensions to standard phylogenetic models are presented that allow for improved handling of context-dependent substitution, yet still permit exact inference at reasonable computational cost. The new models improve goodness of fit substantially for both coding and non-coding data. Considering context dependence leads to much larger improvements than does using a richer substitution model or allowing for rate variation across sites, under the assumption of site independence. The observed improvements appear to derive from three separate properties of the models: their explicit characterization of context-dependent substitution within N-tuples of adjacent sites, their ability to accommodate overlapping N-tuples, and their rich parameterization of the substitution process. Parameter estimation is accomplished using an expectation maximization (EM) algorithm, with a quasi-Newton algorithm for the maximization step; this approach is shown to be preferable to ordinary Newton methods for parameter-rich models. Overlapping tuples are handled efficiently by assuming Markov dependence of the observed bases at each site on those at the N − 1 preceding sites, and the required conditional probabilities are computed with an extension of Felsenstein's algorithm. Estimated substitution rates based on a data set of about 160,000 non-coding sites in mammalian genomes indicate a pronounced CpG effect, but also suggest a complex overall pattern of context-dependent substitution comprising a variety of subtle effects. Estimates based on about 3 million sites in coding regions demonstrate that amino acid substitution rates can be learned at the nucleotide level, and suggest that context effects across codon boundaries are significant.

Key Words: neighbor-dependent substitution, CpG effect, codon model, expectation maximization, substitution rate matrix
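The Markov treatment of overlapping tuples described in the abstract can be illustrated with a minimal sketch for N = 2 (dinucleotides). Everything named here is an assumption for illustration, not the authors' parameterization: the 16-state rate matrix allows only single-base changes, uses an HKY-style transition/transversion ratio `kappa`, and multiplies C→T and G→A inside a CpG by a hypothetical `cpg_factor`; the three-leaf tree (`TREE`) and its branch lengths are invented. Felsenstein's pruning runs over tuple states, and the conditional probability of an alignment column given the preceding one is taken as the joint probability of the two-column pair divided by the marginal probability of the first column, obtained by summing out the second base at every leaf.

```python
import itertools
import numpy as np

BASES = "ACGT"
TUPLES = ["".join(p) for p in itertools.product(BASES, repeat=2)]  # 16 dinucleotide states
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
# Hypothetical rooted tree: internal node = list of (child, branch_length); leaf = name.
TREE = [("human", 0.1), ([("mouse", 0.3), ("rat", 0.3)], 0.2)]


def dinucleotide_rate_matrix(kappa=2.0, cpg_factor=10.0):
    """Hypothetical 16x16 rate matrix: one base changes at a time, transitions
    scaled by kappa, and C->T / G->A inside a CpG scaled by cpg_factor."""
    Q = np.zeros((16, 16))
    for i, a in enumerate(TUPLES):
        for j, b in enumerate(TUPLES):
            diffs = [k for k in range(2) if a[k] != b[k]]
            if len(diffs) != 1:
                continue  # double substitutions get rate zero
            k = diffs[0]
            rate = kappa if (a[k], b[k]) in TRANSITIONS else 1.0
            if a == "CG" and (a[k], b[k]) in {("C", "T"), ("G", "A")}:
                rate *= cpg_factor  # context effect at CpG
            Q[i, j] = rate
    Q[np.diag_indices(16)] = -Q.sum(axis=1)  # rows of a rate matrix sum to zero
    return Q


def expm(A, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.abs(A).sum(axis=1).max()
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / (2 ** s)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for n in range(1, terms):
        term = term @ B / n
        result = result + term
    for _ in range(s):
        result = result @ result
    return result


def stationary(Q):
    """Solve pi Q = 0 with sum(pi) = 1 by least squares."""
    A = np.vstack([Q.T, np.ones(Q.shape[0])])
    b = np.zeros(Q.shape[0] + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]


def leaf_vector(pattern):
    """Indicator over tuple states matching pattern; '*' marks a base summed out."""
    return np.array([all(p in ("*", c) for p, c in zip(pattern, tup))
                     for tup in TUPLES], dtype=float)


def partials(node, Q, leaf_vectors):
    """Felsenstein pruning over tuple states: P(data below node | state at node)."""
    if isinstance(node, str):
        return leaf_vectors[node]
    vec = np.ones(16)
    for child, t in node:
        vec *= expm(Q * t) @ partials(child, Q, leaf_vectors)
    return vec


def column_likelihood(tree, Q, pi, patterns):
    vectors = {name: leaf_vector(pat) for name, pat in patterns.items()}
    return float(pi @ partials(tree, Q, vectors))


def conditional(tree, Q, pi, prev_col, col):
    """P(col | prev_col) = P(prev_col, col) / P(prev_col) under the pair model."""
    joint = column_likelihood(tree, Q, pi, {n: prev_col[n] + col[n] for n in col})
    prefix = column_likelihood(tree, Q, pi, {n: prev_col[n] + "*" for n in col})
    return joint / prefix


if __name__ == "__main__":
    Q = dinucleotide_rate_matrix()
    pi = stationary(Q)
    prev = {"human": "C", "mouse": "C", "rat": "C"}
    for nxt in ("G", "A"):  # hypothetical next column, identical at all leaves
        col = {n: nxt for n in prev}
        print(nxt, conditional(TREE, Q, pi, prev, col))
```

Because the likelihood is multilinear in the leaf vectors, the conditionals for a fixed preceding column sum to one over all 4^3 possible next columns, so multiplying them along an alignment yields a proper Markov-chain likelihood. Parameter fitting (EM with a quasi-Newton maximization step, per the abstract) would sit on top of this likelihood.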
Original Articles
Phylogenetic Estimation of Context-Dependent Substitution Rates by Maximum Likelihood
2 Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064; Howard Hughes Medical Institute, University of California, Santa Cruz, CA 95064