MBE Advance Access originally published online on October 6, 2004
Molecular Biology and Evolution 2005 22(3):395-401; doi:10.1093/molbev/msi002

Molecular Biology and Evolution vol. 22 no. 3 © Society for Molecular Biology and Evolution 2004; all rights reserved.

Research Article

Measuring the Fit of Sequence Data to Phylogenetic Model: Allowing for Missing Data

Peter J. Waddell

Department of Statistics, Department of Biological Sciences, University of South Carolina, Columbia

E-mail: waddell{at}stat.sc.edu.

It is fundamentally important to assess the fit of data to model in phylogenetic and evolutionary studies. Phylogenetic methods using molecular sequences typically start with a multiple alignment. It is possible to measure the fit of data to model expectations of data, for example, via the likelihood-ratio (G) test or the X² test, if all sites in all sequences have an unambiguous residue. However, nearly all alignments of interest contain sites (columns of the alignment) with missing data, that is, ambiguous nucleotides, gaps, or unsequenced regions, which must presently be removed before using the above tests. Unfortunately, this is often either undesirable or impractical, as it will discard much of the data. Here, we show how iterative ML estimators may directly estimate the site-pattern probabilities for columns with missing data, given only standard i.i.d. assumptions. The optimization may use an EM or Newton algorithm, or any other hill-climbing approach. The resulting optimal likelihood under the unconstrained or multinomial model may be compared directly with the likelihood of the data coming from the model (a G statistic). Alternatively the modified observed and the expected frequencies of site patterns may be compared using a X² test. The distribution of such statistics is best assessed using appropriate simulations. The new method is applicable to models using codons or paired sites. The methods are also useful with Hadamard conjugations (spectral analysis) and are illustrated with these and with ML evolutionary models that allow site-rate variability.
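The estimation idea in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: it assumes nucleotide data, a single `?` symbol for any missing or ambiguous residue, missingness that is independent of the underlying state, and a uniform starting point. The function names, the toy column counts, and the model expectations are all hypothetical. An EM loop spreads each incomplete column over the complete site patterns it is compatible with, re-estimates the multinomial pattern probabilities, and the resulting log-likelihood is compared with that of a model to give a G statistic.

```python
from itertools import product
from math import log

ALPHABET = "ACGT"
MISSING = "?"  # assumed symbol for any missing or ambiguous residue


def compatible_patterns(observed):
    """Complete site patterns that agree with `observed` wherever it is not missing."""
    choices = [ALPHABET if ch == MISSING else ch for ch in observed]
    return ["".join(p) for p in product(*choices)]


def log_likelihood(column_counts, probs):
    """Observed-data log-likelihood: a column's probability is the sum of the
    probabilities of all complete patterns compatible with it."""
    total = 0.0
    for observed, count in column_counts.items():
        mass = sum(probs.get(p, 0.0) for p in compatible_patterns(observed))
        total += count * log(mass)
    return total


def em_pattern_probs(column_counts, max_iter=500, tol=1e-12):
    """EM estimate of multinomial site-pattern probabilities (illustrative sketch)."""
    compat = {obs: compatible_patterns(obs) for obs in column_counts}
    # Patterns compatible with no observed column get probability zero at the
    # optimum, so the search is restricted to the remaining patterns.
    support = sorted({p for pats in compat.values() for p in pats})
    n_sites = sum(column_counts.values())
    probs = {p: 1.0 / len(support) for p in support}
    current = log_likelihood(column_counts, probs)
    for _ in range(max_iter):
        previous = current
        # E-step: share each column's count among its compatible patterns.
        expected = dict.fromkeys(support, 0.0)
        for obs, count in column_counts.items():
            mass = sum(probs[p] for p in compat[obs])
            for p in compat[obs]:
                expected[p] += count * probs[p] / mass
        # M-step: probabilities are the expected relative frequencies.
        probs = {p: e / n_sites for p, e in expected.items()}
        current = log_likelihood(column_counts, probs)
        if current - previous < tol:
            break
    return probs, current


def g_statistic(ll_unconstrained, ll_model):
    """Likelihood-ratio (G) statistic: twice the log-likelihood difference."""
    return 2.0 * (ll_unconstrained - ll_model)


# Toy two-taxon alignment summarized as column counts (made-up data).
counts = {"AA": 6, "AC": 2, "A?": 2}
probs, ll_unconstrained = em_pattern_probs(counts)
# Made-up model expectations; in practice these would come from a fitted tree model.
model = {"AA": 0.7, "AC": 0.1, "AG": 0.1, "AT": 0.1}
g = g_statistic(ll_unconstrained, log_likelihood(counts, model))
```

In this sketch the "modified observed" frequencies mentioned in the abstract correspond to `probs` multiplied by the number of sites, which could be compared with model expectations in an X² statistic instead. As the abstract notes, the null distribution of either statistic is best obtained by simulation rather than assumed.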

Key Words: Phylogenetic likelihood-ratio test • model fit • ML with unequal site rates • G statistic • Hadamard conjugation • spectral analysis




