


MBE Advance Access published online on December 5, 2003

Molecular Biology and Evolution, doi:10.1093/molbev/msh039
Molecular Biology and Evolution © Society for Molecular Biology and Evolution 2003; all rights reserved

Accepted October 10, 2003

Original Articles

Phylogenetic Estimation of Context-Dependent Substitution Rates by Maximum Likelihood

Adam Siepel 1* and David Haussler 2

1 Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064
2 Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064; Howard Hughes Medical Institute, University of California, Santa Cruz, CA 95064

* To whom correspondence should be addressed. E-mail: acs@soe.ucsc.edu.


Abstract

Nucleotide substitution in both coding and non-coding regions is context-dependent, in the sense that substitution rates depend on the identity of neighboring bases. Context-dependent substitution has been modeled in the case of two sequences and an unrooted phylogenetic tree, but has only been allowed for in limited ways with more general phylogenies. In this paper, extensions are presented to standard phylogenetic models that allow for improved handling of context-dependent substitution, yet still permit exact inference at reasonable computational cost. The new models improve goodness of fit substantially for both coding and non-coding data. Considering context dependence leads to much larger improvements than does using a richer substitution model or allowing for rate variation across sites, under the assumption of site independence. The observed improvements appear to derive from three separate properties of the models: their explicit characterization of context-dependent substitution within N-tuples of adjacent sites, their ability to accommodate overlapping N-tuples, and their rich parameterization of the substitution process. Parameter estimation is accomplished using an expectation maximization (EM) algorithm, with a quasi-Newton algorithm for the maximization step; this approach is shown to be preferable to ordinary Newton methods for parameter-rich models. Overlapping tuples are efficiently handled by assuming Markov dependence of the observed bases at each site on those at the N - 1 preceding sites, and the required conditional probabilities are computed with an extension of Felsenstein's algorithm. Estimated substitution rates based on a data set of about 160,000 non-coding sites in mammalian genomes indicate a pronounced CpG effect, but also suggest a complex overall pattern of context-dependent substitution, comprising a variety of subtle effects. Estimates based on about 3 million sites in coding regions demonstrate that amino acid substitution rates can be learned at the nucleotide level, and suggest that context effects across codon boundaries are significant.

Key Words: neighbor-dependent substitution, CpG effect, codon model, expectation maximization, substitution rate matrix
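
The abstract states that overlapping tuples are handled by conditioning each alignment column on the N - 1 preceding columns, with the conditional probabilities obtained from an extension of Felsenstein's algorithm. The abstract does not give the details, so the following is only a minimal sketch of one way this could work for N = 2, under stated assumptions: the conditional probability is taken as the ratio of a pair-column likelihood to the likelihood of the preceding column alone, where the latter is obtained by treating the current base as missing (`*`) at every leaf. The tree encoding, the function names, and the toy pair model (two sites evolving independently under Jukes-Cantor, with a uniform root distribution) are all illustrative assumptions and are not the parameter-rich models of the paper.

```python
import math
from itertools import product

BASES = "ACGT"
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]

def jc69(a, b, t):
    """Jukes-Cantor probability that base a is replaced by base b over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def pair_jc69(x, y, t):
    """Toy 2-tuple model (assumption): two sites evolving independently, no context effect."""
    return jc69(x[0], y[0], t) * jc69(x[1], y[1], t)

def matches(state, observed):
    """True if a state is compatible with an observation; '*' marks a missing base."""
    return all(o == "*" or o == s for s, o in zip(state, observed))

def partials(node, column, states, trans):
    """Felsenstein pruning: P(leaves below node | state at node), for every state."""
    if isinstance(node, str):  # leaf, identified by species name
        return {s: 1.0 if matches(s, column[node]) else 0.0 for s in states}
    result = {s: 1.0 for s in states}
    for child, t in node:  # internal node: list of (subtree, branch length)
        below = partials(child, column, states, trans)
        for s in states:
            result[s] *= sum(trans(s, c, t) * below[c] for c in states)
    return result

def likelihood(tree, column, states, trans):
    """Column likelihood, assuming a uniform distribution over states at the root."""
    part = partials(tree, column, states, trans)
    return sum(part.values()) / len(states)

def conditional(tree, prev_col, cur_col, trans=pair_jc69):
    """P(current column | preceding column) for N = 2, as a ratio of two likelihoods."""
    joint = {sp: prev_col[sp] + cur_col[sp] for sp in prev_col}
    marginal = {sp: prev_col[sp] + "*" for sp in prev_col}  # current base summed out
    return (likelihood(tree, joint, PAIRS, trans)
            / likelihood(tree, marginal, PAIRS, trans))

# Hypothetical three-species tree and two adjacent alignment columns.
tree = [([("human", 0.1), ("chimp", 0.1)], 0.2), ("mouse", 0.4)]
prev_col = {"human": "C", "chimp": "C", "mouse": "T"}
cur_col = {"human": "G", "chimp": "G", "mouse": "G"}
p_cond = conditional(tree, prev_col, cur_col)
```

Because the toy pair model has no context dependence, `p_cond` coincides with the ordinary single-site likelihood of `cur_col`, which makes the sketch easy to check. With a genuinely context-dependent transition function over the 16 dinucleotide states passed as `trans`, the same ratio would differ from the single-site value.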




