MBE Advance Access published online on May 2, 2008
Molecular Biology and Evolution, doi:10.1093/molbev/msn104
Published by Oxford University Press 2008.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Research Article
Confirming the Phylogeny of Mammals by Use of Large Comparative Sequence Datasets
1 Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892
2 NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892
3 Integrated Biosciences Program, George Washington University, Washington, DC 20052
4 Department of Biological Sciences, George Washington University, Washington, DC 20052
5 To whom correspondence should be addressed: egreen@nhgri.nih.gov, 50 South Dr. Bldg. 50, Rm. 5222, Bethesda, MD 20892-8002, Phone: (301) 402-2023, Fax: (301) 402-2040
Received for publication October 15, 2007. Revision received February 26, 2008. Revision received March 26, 2008. Accepted for publication April 7, 2008.
The ongoing generation of prodigious amounts of genomic sequence data from myriad vertebrates is providing unparalleled opportunities for establishing definitive phylogenetic relationships among species. The size and complexity of such comparative sequence datasets allow smaller and more difficult branches to be resolved, but also present unique challenges, including large computational requirements and the negative consequences of systematic biases. To explore these issues and to clarify the phylogenetic relationships among mammals, we have analyzed a large dataset of over 60 megabase pairs (Mb) of high-quality genomic sequence, which we generated from 41 mammals and 3 other vertebrates. All sequences are orthologous to a 1.9-Mb region of the human genome that encompasses the cystic fibrosis transmembrane conductance regulator gene (CFTR). To understand the characteristics and challenges associated with phylogenetic analyses of such a large dataset, we partitioned the sequence data in several ways, and utilized maximum likelihood, maximum parsimony, and neighbor-joining algorithms, implemented in parallel on Linux clusters. These studies yielded well-supported phylogenetic trees, largely confirming other recent molecular phylogenetic analyses. Our results provide support for rooting the placental mammal tree between Atlantogenata (Xenarthra and Afrotheria) and Boreoeutheria (Euarchontoglires and Laurasiatheria), illustrate the difficulty in resolving some branches even with large amounts of data (e.g., in the case of Laurasiatheria), and demonstrate the valuable role that very large comparative sequence datasets can play in refining our understanding of the evolutionary relationships of vertebrates.
Key Words: Placentalia; Eutheria; Mammalia; mammalian phylogeny; phylogenomics; Atlantogenata; molecular systematics