MBE Advance Access published online on March 26, 2008
Molecular Biology and Evolution, doi:10.1093/molbev/msn067
Research Article
An Improved General Amino-Acid Replacement Matrix
* Corresponding author, Méthodes et Algorithmes pour la Bioinformatique, LIRMM, CNRS - Université Montpellier II, 161 rue Ada, 34392 – Montpellier Cedex 5 – France, Tel. 33 (0) 4 67 41 85 47 – Fax. 33 (0) 4 67 41 85 00, URL: http://www.lirmm.fr/mab, Email: gascuel{at}lirmm.fr
Email: le{at}lirmm.fr
Received for publication October 21, 2007. Revision received February 7, 2008. Accepted for publication March 17, 2008.
Amino-acid replacement matrices are an essential basis of protein phylogenetics. They are used to compute substitution probabilities along phylogeny branches, and thus the likelihood of the data. They are also essential in protein alignment. A number of replacement matrices, and methods to estimate these matrices from protein alignments, have been proposed since the seminal work of Dayhoff et al. (1972). An important advance was achieved by Whelan and Goldman (2001), who designed an efficient maximum-likelihood estimation approach that accounts for the phylogenies of sequences within each training alignment. We further refine this method by incorporating the variability of evolutionary rates across sites in the matrix estimation, and by using a much larger and more diverse database than BRKALN, which was used to estimate the WAG matrix. To estimate our new matrix (called LG), we use an adaptation of the XRATE software and 3,912 alignments from Pfam, comprising approximately 50,000 sequences and approximately 6.5 million residues overall. To evaluate the performance of LG, we use an independent sample of 59 alignments from TreeBase, and randomly divide the Pfam alignments into 3,412 training and 500 test alignments. The comparison with WAG and JTT shows a clear likelihood improvement. With TreeBase, we find that: (1) the average AIC gain per site is 0.25 and 0.42 when compared to WAG and JTT, respectively; (2) LG is significantly better than WAG for 38 of the 59 alignments, and significantly worse for only 2; (3) tree topologies inferred with LG, WAG and JTT frequently differ, indicating that using LG affects not only the likelihood value but also the output tree. Results with the test alignments from Pfam are analogous. LG and a PHYML implementation can be downloaded from http://atgc.lirmm.fr/LG.
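The AIC gain per site reported above can be illustrated with a minimal sketch. It assumes the usual definition AIC = 2k - 2 ln L and takes the per-site gain to be the AIC difference between the reference model and the new model, divided by the alignment length; this is one plausible reading of the abstract, not necessarily the authors' exact procedure. The function names and the log-likelihood values are hypothetical and are not taken from the paper. When the compared models have the same number of free parameters, the gain reduces to twice the log-likelihood difference per site.

```python
def aic(log_likelihood: float, n_free_params: int) -> float:
    """Akaike information criterion, 2k - 2 ln L (lower is better)."""
    return 2.0 * n_free_params - 2.0 * log_likelihood


def aic_gain_per_site(lnl_new: float, lnl_ref: float, n_sites: int,
                      k_new: int = 0, k_ref: int = 0) -> float:
    """Per-site AIC improvement of a new model over a reference model.

    Positive values favour the new model. k_new and k_ref are the numbers
    of free parameters; they cancel out when they are equal.
    """
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return (aic(lnl_ref, k_ref) - aic(lnl_new, k_new)) / n_sites


# Hypothetical log-likelihoods for one 500-site alignment (illustrative only).
gain = aic_gain_per_site(lnl_new=-10450.0, lnl_ref=-10500.0, n_sites=500)
print(f"AIC gain per site: {gain:.3f}")
```

Averaging this quantity over a set of test alignments would give a summary figure comparable in kind to the averages quoted in the abstract.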
Key Words: amino-acid substitutions; replacement matrices; JTT; WAG; maximum-likelihood estimations; phylogenetic inference