MBE Advance Access originally published online on November 9, 2005
Molecular Biology and Evolution 2006 23(5):848-855; doi:10.1093/molbev/msj061
Proceedings of the SMBE Tri-National Young Investigators' Workshop 2005
Improved Consensus Network Techniques for Genome-Scale Phylogeny


* Allan Wilson Centre, Institute of Fundamental Sciences, Massey University, New Zealand;
School of Biological Sciences and Sydney University Biological Informatics & Technology Centre, University of Sydney, Sydney, Australia; Unité de Biologie Moleculaire de Gène chez les Extrêmophiles, Institut Pasteur, Paris, France; and
School of Computing Sciences, University of East Anglia, Norwich, United Kingdom
E-mail: b.r.holland{at}massey.ac.nz.
| Abstract |
|---|
Although recent studies indicate that estimating phylogenies from alignments of concatenated genes greatly reduces the stochastic error, the potential for systematic error still remains, heightening the need for reliable methods to analyze multigene data sets. Consensus methods provide an alternative, more inclusive, approach for analyzing collections of trees arising from multiple genes. We extend a previously described consensus network method for genome-scale phylogeny (Holland, B. R., K. T. Huber, V. Moulton, and P. J. Lockhart. 2004. Using consensus networks to visualize contradictory evidence for species phylogeny. Mol. Biol. Evol. 21:1459-1461) to incorporate additional information. This additional information could come from bootstrap analysis, Bayesian analysis, or various methods to find confidence sets of trees. The new methods can be extended to include edge weights representing genetic distance. We use three data sets to illustrate the approach: 61 genes from 14 angiosperm taxa and one gymnosperm, 106 genes from eight yeast taxa, and 46 members of a gene family from 15 vertebrate taxa.
Key Words: consensus networks; genome-scale phylogeny; gene trees
| Introduction |
|---|
Recent phylogenetic studies have used massive amounts of sequence data to infer the evolutionary history of genomes (Rokas et al. 2003
To model sequence evolution for a concatenation of genes, it is possible to allow a different substitution model for each gene or for each codon position within each gene (see Pagel and Meade 2004; Poladian and Jermiin 2005). However, this requires estimating a very large number of parameters. An alternative approach is to infer trees for each gene separately and then to combine these trees into a consensus tree (see e.g., Rokas et al. 2003). Although conceptually simple, the consensus tree approach, by its very nature, omits information as it cannot explicitly display incongruence between different gene trees.
In order to retain more of the information available from the gene trees, Holland et al. (2004) proposed consensus networks and demonstrated their use by reanalyzing the data set presented in Rokas et al. (2003). Consensus networks were obtained by combining the optimal tree for each gene, under either the maximum parsimony or maximum likelihood criterion, and assigning them equal weight. However, this method does not include any additional information on the confidence in phylogenetic trees inferred from the genes. For example, stochastic error is more of a problem for short genes than for long genes and genes may be more or less conserved.
Consensus networks are described in Holland and Moulton (2003) and Holland, Delsuc and Moulton (2005) and build upon an idea originally presented in Bandelt (1995). They generalize strict and majority-rule consensus trees by allowing the representation of conflicting information that cannot be displayed in a single tree. In particular, given a collection of trees on the same set of taxa, a consensus network displays all those bipartitions, or "splits," that correspond to the edges that are present in more than a certain proportion, x, of the input trees. For example, if x = 1.0, then the consensus network is a strict consensus tree because only those splits that correspond to edges present in every tree will be displayed; if x = 0.5, then the consensus network is the majority-rule consensus tree; and if x < 0.5, then the consensus network can display conflicting splits. In its simplest setting, where edge weights and tree weights are not considered, each split displayed by the consensus network is given support equal to the frequency of its occurrence in the input trees; this can be reflected by the length of the edges that represent the split in the consensus network.
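The thresholding step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the representation of a tree as a set of splits, and the five-taxon example are all assumptions made for illustration. The comparison is strict ("more than a proportion x"), following the wording above, so the boundary case x = 1.0 would need a non-strict comparison and is not handled here.

```python
from collections import Counter

def canonical_split(side, taxa):
    """Represent a split by the side that excludes the alphabetically first taxon."""
    side, taxa = frozenset(side), frozenset(taxa)
    return taxa - side if min(taxa) in side else side

def consensus_splits(trees, x):
    """Return {split: frequency} for splits found in more than a proportion x of trees."""
    counts = Counter(split for tree in trees for split in set(tree))
    n = len(trees)
    return {split: c / n for split, c in counts.items() if c / n > x}

# Three hypothetical trees on five taxa, each given by its non-trivial splits.
taxa = {"A", "B", "C", "D", "E"}
trees = [
    {canonical_split({"A", "B"}, taxa), canonical_split({"D", "E"}, taxa)},
    {canonical_split({"A", "B"}, taxa), canonical_split({"C", "D"}, taxa)},
    {canonical_split({"A", "C"}, taxa), canonical_split({"D", "E"}, taxa)},
]

majority = consensus_splits(trees, 0.5)   # two compatible splits, i.e. a tree
network = consensus_splits(trees, 0.3)    # all four splits, including conflicts
```

Lowering the threshold from 0.5 to 0.3 in this toy example admits the two minority splits, each of which conflicts with one of the majority splits, so the result can no longer be drawn as a single tree.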
The standard consensus network method is appropriate when the input trees contribute equally. Such cases include the single gene setting with equally parsimonious or equally likely trees, sets of bootstrap trees (Felsenstein 1985), or, in a Bayesian setting (Huelsenbeck et al. 2002), trees from Monte Carlo Markov chains. However, we also wish to combine trees that should not contribute equally, so we require additional techniques. In this paper, we present new methods to incorporate additional information into consensus networks and demonstrate their use by applying them to three previously published data sets.
| Methods |
|---|
In the multigene setting, phylogenetic analysis of each gene often leads to a collection of plausible trees. When such collections arise from a bootstrap analysis (or from a Markov chain Monte Carlo simulation), a straightforward way to retain the information on what confidence we have in the gene trees is to generate a consensus network for the amalgamation of all the bootstrap trees from each gene. In this way, we identify the strongly supported splits for each gene and then display these splits within a consensus network.
In other cases, the plausible trees for each gene may have associated weights. For example, expected likelihood weights (Strimmer and Rambaut 2002), or the P values derived using the Shimodaira and Hasegawa (SH)-test (Shimodaira and Hasegawa 1999), can be used to weight the trees before they are combined into the consensus network. In order to incorporate tree-specific weights into consensus networks, higher weights are assigned to splits corresponding to edges in trees with higher weights than to splits in trees with lower weights.
Suppose we have a collection of trees indexed by some set I, for which tree Ti has weight Wi. For split j, let Ij be the subset of I consisting of the indices of the trees containing split j, and define the support of split j as

$$ s_j = \frac{\sum_{i \in I_j} W_i}{\sum_{i \in I} W_i}. $$
We then display those splits in the consensus network for which sj is greater than the threshold, x. Note that in the case where all tree weights are equal, this is equivalent to ignoring the tree weights and applying the standard consensus network approach. Moreover, if the tree weights are positive integers and we replace the original collection of trees by a new one consisting of Wi copies of the tree with index i, then this method is again equivalent to applying the standard consensus network. This new weighted consensus network method can also be viewed as a generalization of the weighted consensus tree method described in Jermiin et al. (1997).
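A small Python sketch may help make the weighted support concrete. It assumes that the support of a split is the summed weight of the trees containing it divided by the summed weight of all trees, which is the normalization under which equal weights reproduce the standard method. The function names, split labels, and weights below are hypothetical and are not taken from the authors' scripts.

```python
def weighted_support(trees, weights):
    """Support of each split: summed weight of trees containing it / total weight."""
    total = sum(weights)
    support = {}
    for tree, w in zip(trees, weights):
        for split in set(tree):
            support[split] = support.get(split, 0.0) + w / total
    return support

def weighted_consensus_splits(trees, weights, x):
    """Keep only the splits whose weighted support exceeds the threshold x."""
    return {s: v for s, v in weighted_support(trees, weights).items() if v > x}

# Hypothetical splits, labeled by one side only for brevity.
ab, de, cd, ac = (frozenset(s) for s in ("AB", "DE", "CD", "AC"))
trees = [{ab, de}, {ab, cd}, {ac, de}]

weighted = weighted_support(trees, [2, 1, 1])
# Replacing the first tree by two unit-weight copies should give the same support.
replicated = weighted_support([trees[0], trees[0], trees[1], trees[2]], [1, 1, 1, 1])
```

In this example the integer-weighted collection and the collection with replicated trees yield identical support values, which mirrors the equivalence noted above.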
We now describe the case where the input trees have edge lengths as well as tree-specific weights. Define lij to be the edge length of the edge displaying split j in tree i. Taking a given threshold, x, we display split j in the consensus network if sj > x and give the edge in the consensus network displaying this split the length wj, where

$$ w_j = \frac{\sum_{i \in I_j} W_i \, l_{ij}}{\sum_{i \in I_j} W_i}. $$
In other words, the edges representing some split in the consensus network are given length equal to the weighted average of the lengths of the edges representing that split in each of the input trees. Analogous schemes could be used to calculate median or minimum weights.
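The weighted-average edge length can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: each tree is represented as a dictionary from split to edge length, the average is taken only over the trees that contain the split, and all names and numbers are hypothetical.

```python
def weighted_edge_lengths(trees, weights, x):
    """For splits with weighted support above x, return the weighted mean edge length."""
    total = sum(weights)
    weight_sum = {}   # sum of tree weights over the trees containing each split
    length_sum = {}   # sum of (tree weight * edge length) over the same trees
    for tree, w in zip(trees, weights):
        for split, length in tree.items():
            weight_sum[split] = weight_sum.get(split, 0.0) + w
            length_sum[split] = length_sum.get(split, 0.0) + w * length
    return {s: length_sum[s] / weight_sum[s]
            for s in weight_sum if weight_sum[s] / total > x}

# Two hypothetical trees: split AB occurs in both, split DE only in the first.
ab, de = frozenset("AB"), frozenset("DE")
trees = [{ab: 0.10, de: 0.30}, {ab: 0.20}]

lengths = weighted_edge_lengths(trees, [3, 1], 0.2)
strict = weighted_edge_lengths(trees, [3, 1], 0.8)
```

Here split AB receives length (3 × 0.10 + 1 × 0.20) / 4 = 0.125, whereas split DE keeps the length 0.30 from the only tree that contains it and drops out once the threshold is raised above its support of 0.75.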
The methods described above have been implemented as Python scripts that create Nexus files, which can be displayed as networks by Spectronet (Huber et al. 2002). The scripts are available from the corresponding author on request and can also be downloaded from http://www.usyd.edu.au/subit/.
| Results and Discussion |
|---|
We illustrate the use of incorporating weights into consensus networks through three examples. The first example uses the multigene data set that was analyzed by Goremykin et al. (2005)
Data of Goremykin et al. (2005)
The data consist of 61 alignments of protein-coding genes in 15 taxa (14 angiosperm species, with "Pinus" as an outgroup). Phylogenetic analysis was carried out using PAUP* version 4.0b10 (Swofford 2002). For each gene we used Modeltest version 3.06 (Posada and Crandall 1998) to estimate the symmetric model of nucleotide substitution that most appropriately fits the gene (as selected by the hierarchical likelihood ratio test option). To reduce computation time, we estimated the parameters of the model on a neighbor-joining tree inferred from p-distances rather than simultaneously estimating the parameters and the most likely tree for each gene. These model parameters were then fixed when estimating the most likely tree using heuristic search (the default heuristic search settings were retained).
The consensus network of the 61 most likely gene trees is shown in figure 1A. We chose a threshold value of x = 0.1 because experimentation showed that this threshold displayed the most interesting conflicts while maintaining a split system that could be displayed in three dimensions. (Note that this is much better than the worst-case scenario with x = 0.1, which could contain nine-dimensional hypercubes.) To identify the strongly supported splits for each gene, we generated 100 bootstrap trees using maximum likelihood for each gene and then concatenated these results into a single collection of trees. The consensus network with x = 0.1 is shown in figure 1B. The network in figure 1C contains the same splits as figure 1B but with edge lengths proportional to a weighted average of the genetic distances.
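The nine-dimensional bound mentioned in the note above can be checked with a short counting argument. The following is a sketch for equally weighted trees; the symbols $k$, $n$, and $n_j$ are introduced here for illustration only. Suppose $k$ pairwise incompatible splits are all displayed, and split $j$ occurs in $n_j$ of the $n$ input trees. A single tree cannot contain two incompatible splits, so the sets of trees containing these splits are disjoint:

$$ \sum_{j=1}^{k} n_j \le n, \qquad n_j > x\,n \ \text{ for all } j \;\Longrightarrow\; k\,x\,n < n \;\Longrightarrow\; k < \frac{1}{x}. $$

With $x = 0.1$ this gives $k < 10$, that is, at most nine mutually incompatible splits, which a splits graph would draw as a nine-dimensional hypercube.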
There are three areas of uncertainty within figure 1A-C. The most important of these concerns the position of the root, where there are four groupings that appear above the x = 0.1 threshold: Amborella basal; Nymphaea basal; Amborella plus Nymphaea basal; and grasses (Triticum, Oryza, Zea, Saccharum) basal. The support for the Amborella plus Nymphaea basal hypothesis is reduced when considering the strongly supported splits from each gene via the maximum likelihood bootstrap (fig. 1B) rather than a single maximum likelihood tree for each gene (fig. 1A). Also, support for two splits drops below the threshold when the extra information about split support is considered, meaning that figure 1B has less conflict than figure 1A. By considering 6,100 maximum likelihood bootstrap trees, rather than just 61 maximum likelihood trees, we expect to reduce the variance of individual split support values.
The results of Goremykin et al. (2003, 2004), suggesting that grasses may be the most basal angiosperms, were criticized by Stefanovic, Rice, and Palmer (2004) on the grounds that the phylogenetic methodology was inadequate and that the taxon sampling was unbalanced. They suggested that parsimony and maximum likelihood without correction for variable rates across sites were being misled by long-branch attraction. The analysis here mitigates the effect of long-branch attraction by using the maximum likelihood optimality criterion with an appropriate model fit for each gene. However, the unbalanced taxon sampling remains an issue. Overall, our results suggest that none of the four possibilities listed above for the rooting of the angiosperms can be conclusively ruled out.
This example illustrates the different advantages of the two edge weighting schemes (either proportional to genetic distance [fig. 1C] or to support [fig. 1A and B]). The potential for long-branch attraction between Pinus and the grasses can be seen in figure 1C but is not apparent in figure 1A and B. However, the relative support for different rootings of the angiosperms is shown most clearly in figure 1B.
Data of Rokas et al. (2003)
The data consist of 106 gene alignments for eight taxa (seven species from the Saccharomyces genus, with Candida albicans as an outgroup). Phylogenetic analysis of the genes was carried out using PAUP* (Swofford 2002) and Modeltest (Posada and Crandall 1998) as described for the previous example.
We used two different methods to identify strongly supported trees for each gene: expected likelihood weights (ELWs) (Strimmer and Rambaut 2002) and the multiple-comparison SH-test (Shimodaira and Hasegawa 1999). These two approaches were chosen to provide an illustration of the method; in general, any method that produces trees with weights could be used, and different weighting schemes could be appropriate depending on the particular interests of the user. Both ELWs and the SH-test require maximum-likelihood scores to be evaluated for many bootstrap replicates for all the trees under consideration. To ease the computational burden, we considered the 45 trees agreeing with the constraint tree ((Saccharomyces cerevisiae, Saccharomyces paradoxus), (C. albicans, Saccharomyces castellii, Saccharomyces kluyveri), Saccharomyces mikatae, Saccharomyces kudriavzevii, Saccharomyces bayanus) and used the resampling of estimated log-likelihoods (RELL) bootstrap procedure (Kishino, Miyata, and Hasegawa 1990). To aid comparison of the two weighting schemes, we used the same RELL bootstrap procedure to calculate the ELW as well as the P values for the SH-test. Consensus networks, based on the ELW and the P value for the SH-test, are shown in figure 2A and B, respectively. The network in figure 2C contains the same splits as in figure 2A and B but with edge lengths proportional to a weighted average of the genetic distances. In all networks x = 0.2; again this value was chosen after some experimentation in order to show the major conflicting splits without allowing the network to become too high dimensional.
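As a rough illustration of how expected likelihood weights might be obtained from bootstrap log-likelihoods, the Python sketch below averages, over replicates, each tree's share of the total likelihood. It assumes that definition of the ELW (following our reading of Strimmer and Rambaut 2002) and assumes that the per-replicate log-likelihoods have already been produced, for example by a RELL procedure. The function name and the numbers are hypothetical, and this is not the procedure used to produce figure 2.

```python
import math

def expected_likelihood_weights(boot_loglik):
    """boot_loglik[b][i] is the log-likelihood of tree i in bootstrap replicate b."""
    n_trees = len(boot_loglik[0])
    weights = [0.0] * n_trees
    for row in boot_loglik:
        m = max(row)  # subtract the maximum so that exp() does not underflow
        shares = [math.exp(v - m) for v in row]
        z = sum(shares)
        for i, s in enumerate(shares):
            weights[i] += s / z  # likelihood share of tree i in this replicate
    return [w / len(boot_loglik) for w in weights]

# Hypothetical log-likelihoods for three trees in two bootstrap replicates.
boot = [[-1000.0, -1002.0, -1010.0],
        [-1001.0, -1000.5, -1009.0]]
elw = expected_likelihood_weights(boot)
```

Because each replicate contributes shares that sum to 1, the resulting weights for a gene also sum to 1, regardless of how informative the gene is.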
The networks obtained from using ELWs (fig. 2A) and SH-test P values as weights (fig. 2B) both contain the same splits, but they give different support to these splits. Using ELWs, the split {S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii}|{S. bayanus, C. albicans, S. castellii, S. kluyveri} has higher weight than the conflicting split {S. cerevisiae, S. paradoxus, S. mikatae, C. albicans, S. castellii, S. kluyveri}|{S. bayanus, S. kudriavzevii}, but using SH-test P values as weights gives higher weight to the latter split. Figure 2A also has the same splits as, and similar split support to, figure 1B from Holland et al. (2004)
= 0.05. We conclude that the similarity is due to the fact that such a large number of genes (106) are considered. For data sets with fewer genes, we expect that the differences between consensus networks that do or do not consider tree weights could be greater.
There are two contradictory signals regarding the placement of S. kudriavzevii. Phillips, Delsuc, and Penny (2004) provided a possible explanation of this: they showed that a signal linking S. kudriavzevii and S. bayanus may be due to compositional heterogeneity in the data and that the other signal is probably historical. The other conflict in the networks (fig. 2) arises from the three different possibilities for grouping the taxa that are associated with edges, that is, C. albicans, S. kluyveri, and S. castellii. The concatenated data set resolves this in favor of the split grouping C. albicans and S. kluyveri. Figure 2C, which incorporates edge lengths proportional to genetic distance, suggests that support for the other two splits may result from attraction of long edges.
Data of Hardy et al. (2004)
This data set is based on an amino acid alignment of 46 type I interferons from human, mouse, chicken, sheep, goat, musk ox, giraffe, cow, pig, horse, rabbit, duck, zebrafish, Takifugu rubripes, and Tetraodon nigroviridis. An initial collection of 10,000 trees was inferred using Tree-Puzzle (Schmidt et al. 2002) under the Jones-Taylor-Thornton model of amino acid substitutions (Jones, Taylor, and Thornton 1992), with rates-of-change across sites modeled by a discrete gamma distribution with four rate categories and a proportion of invariant sites (all parameters were estimated on the neighbor-joining tree). The initial collection of 10,000 trees was compared using the test by Kishino and Hasegawa (1989), and 9,499 trees were found to differ significantly from the most likely tree. This gave a reduced set of 501 plausible trees that were assigned tree-specific weights using the likelihood-weighted tree-averaging method of Jermiin et al. (1997); in the present case, we used exponential weighting of the differences between the log-likelihood score of the most likely tree and those of the other plausible trees, standardized by the standard errors of those differences (and with a significance level of 5%) (for details and justification, see Jermiin et al. 1997).
The 501 plausible trees and their tree-specific weights were used to infer a weighted consensus tree (fig. 3). The edge lengths were obtained through maximum-likelihood analysis of the data, given the consensus tree and the model of substitution. The tree shows that the evolutionary relationship of these sequences is well resolved in many areas of the tree and poorly resolved in other areas: at the origin of the β-, -, and -subfamilies; at the origin of the - and -subfamilies; and within the human and mouse -subfamily. In some instances, short edges are highly supported, and in other cases long edges are poorly supported. This latter case suggests that there may be other conflicting signals in the data that cannot be displayed within a consensus tree.
The 501 plausible trees and their tree-specific weights were also used to infer a consensus network (fig. 4A) using x = 0.2. The network clearly identifies areas where there is support for conflicting hypotheses. However, the network does not use any edge length information from the input trees, so the network is difficult to interpret within an evolutionary context.
The collection of plausible trees, with their edge lengths and tree-specific weights, was used to estimate an alternative consensus network (fig. 4B), again using x = 0.2. This network allows for assessment of conflicting evidence in the data as well as for interpretation of the network within an evolutionary context. Of the four poorly resolved areas, one stands out strongly (i.e., the one at the joining of the β-, -, and -subfamilies) and another one is barely visible (i.e., the one at the joining of the - and -subfamilies). The other two areas of conflict are negligible and most likely due to a high degree of sequence similarity. Due to the presence of edge lengths that reflect the amount of genetic change along different lineages, the interpretation of the consensus network is relatively easy and in this case largely consistent with that of Hardy et al. (2004).
| Conclusions |
|---|
If stochastic error was the only challenge to accurate phylogenetic estimation, then there would be no question that concatenating genes is the best strategy. However, analyses of genome-scale data sets (Phillips, Delsuc, and Penny 2004
Consensus networks provide a more inclusive approach than phylogenetic analysis of concatenated gene sequences because they allow weak or conflicting signals in the genes to be shown. However, we believe there are more appropriate ways to combine available information from different genes than simply to take the best tree for each gene. Here we have presented methods that incorporate more of the information that can be obtained from different genes.
In the approach involving identification of the strongly supported splits at each locus, for example, via a bootstrap analysis, all genes contribute the same number of source trees, but short genes with weak signals will contribute many different trees, whereas longer genes with stronger signals will contribute many copies of the same tree. In contrast, concatenations of genes produce results where the length of each gene acts as a weight on phylogenetic signal of each gene, so long genes with strong signals will tend to dominate the signals of other genes. While this might be desirable from some points of view, it does not allow for assessment of potential conflicts between different subsets of phylogenetic data. While this paper has concentrated on conflicts in the phylogenetic signal, in some data sets conflict might be due to historical signal, for example, in the case of either lateral gene transfer or hybridization.
How much different genes contribute to the overall picture, when combining the sets of plausible trees from different genes, depends on the tree-specific weights used. The advantage of using ELW or Bayesian posterior probabilities is that they sum to a constant amount for each gene, that is, the sum of the tree-specific weights for each gene is 1 for ELW and 0.95 for Bayesian posterior probabilities (given α = 0.05). By contrast, P values, such as those determined by the Kishino and Hasegawa (KH)-tests or SH-tests, do not. In the case of an uninformative gene, many trees would have high P values and thus bias the consensus network. The sum of these P values might be much greater than 1, and the sum will differ from gene to gene, which may be undesirable. One possible solution to this problem would be to use the SH-test or KH-test to identify a set of plausible trees and then to normalize the corresponding P values so that the sum of the tree weights for each gene was equal across genes.
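The normalization suggested above could be sketched in Python as follows. This is one possible reading, not a tested recommendation: trees with a P value of at least a chosen significance level are treated as plausible, and their P values are rescaled to sum to 1 within each gene. The function name, the `alpha` parameter, and the example values are hypothetical.

```python
def normalize_per_gene(p_values_by_gene, alpha=0.05):
    """Rescale the P values of the plausible trees so they sum to 1 for each gene."""
    normalized = {}
    for gene, p_values in p_values_by_gene.items():
        # Assumption: a tree is plausible if the test does not reject it at level alpha.
        plausible = {tree: p for tree, p in p_values.items() if p >= alpha}
        total = sum(plausible.values())
        normalized[gene] = {tree: p / total for tree, p in plausible.items()}
    return normalized

# Hypothetical P values: gene1 is uninformative, gene2 favors a single tree.
p_values = {
    "gene1": {"T1": 0.90, "T2": 0.60, "T3": 0.01},
    "gene2": {"T1": 0.40, "T2": 0.03, "T3": 0.02},
}
tree_weights = normalize_per_gene(p_values)
```

After rescaling, the uninformative gene spreads a total weight of 1 over its two plausible trees, while the informative gene places its full weight of 1 on a single tree, so neither gene dominates through the raw sum of its P values.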
Consensus networks, as originally described, do not conform with the conventional biological interpretation of edge lengths (Morrison 2005). Usually, the length of an edge in a phylogenetic tree represents the genetic distance between two nodes, whereas the length of an edge in a standard consensus network is proportional to the number of input trees that display the particular edge. The implementation of consensus networks in SplitsTree4 (Huson and Bryant 2005) offers various options in regard to the edge lengths: edges of the network representing a certain split can have their length set to the minimum, median, or average length of the edges representing that split in the input trees. The support values for each edge (i.e., the number of trees in which they appear) can be displayed as numbers associated with the edge in the style of bootstrap values or by edge thickness. Here we have incorporated both edge weights and tree weights into the consensus network.
One difficulty in interpreting consensus networks, or more generally splits graphs, is that their internal nodes are not meant to represent inferred ancestral species; that is, in contrast to phylogenetic trees, they do not attempt to provide an explicit representation of evolutionary history. Thus, it might be useful to develop alternative approaches to combining (rooted) trees into networks such as those described in, for example, Baroni, Semple, and Steel (2004). It might also be interesting to see whether the approaches described in this paper for consensus networks can be extended to super networks (Huson et al. 2004), networks that are similar in spirit to consensus networks but allow for partial data, that is, some genes are not available for all taxa.
In conclusion, we have presented some new approaches for incorporating additional phylogenetic information in the construction of consensus networks. Using three data sets, we have demonstrated that the resulting methods allow us to form hypotheses about whether conflicting signals are due to, for example, rate heterogeneity among lineages causing long-branch attraction, small distances between sequences causing lack of resolution, or systematic biases. We believe that consensus networks that incorporate the information available from analysis of individual genes, both tree weights and edge weights, will provide a useful tool for exploring the issues arising in the rapidly expanding field of genome-scale phylogeny.
| Acknowledgements |
|---|
We wish to thank Michael Charleston and David Penny for their constructive comments on the manuscript and Patrick Forterre and the Institut Pasteur for their hospitality toward L.S.J. We thank Vadim Goremykin and Antonis Rokas for providing us with their alignments. B.R.H. acknowledges funding from the New Zealand Foundation for Research Science and Technology. This research was partly funded by a Discovery Grant (DP0453173) from the Australian Research Council. This is research paper number 016 from the Sydney University Biological Informatics & Technology Centre.
| Footnotes |
|---|
Laura Katz, Associate Editor
| References |
|---|
Bandelt, H.-J. 1995. Combination of data in phylogenetic analysis. Plant Syst. Evol. Suppl. 9:355-361.
Baroni, M., C. Semple, and M. Steel. 2004. A framework for representing reticulate evolution. Ann. Comb. 8:391-408.
Buckley, T. R. 2002. Model misspecification and probabilistic tests of topology: evidence from empirical data sets. Syst. Biol. 51:509-523.
Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.
Goremykin, V. V., K. I. Hirsch-Ernst, S. Wölfl, and F. H. Hellwig. 2003. Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm. Mol. Biol. Evol. 20:1499-1505.
Goremykin, V. V., K. I. Hirsch-Ernst, S. Wölfl, and F. H. Hellwig. 2004. The chloroplast genome of Nymphaea alba: whole-genome analyses and the problem of identifying the most basal angiosperm. Mol. Biol. Evol. 21:1445-1454.
Goremykin, V. V., B. Holland, K. I. Hirsch-Ernst, and F. H. Hellwig. 2005. Analysis of Acorus calamus chloroplast genome and its phylogenetic implications. Mol. Biol. Evol. 22:1813-1822.
Hardy, M. P., C. M. Owczarek, L. S. Jermiin, M. Ejdebäck, and P. J. Hertzog. 2004. Characterization of the type I interferon locus and identification of novel genes. Genomics 84:331-345.
Hillis, D. M., J. P. Huelsenbeck, and D. L. Swofford. 1994. Hobgoblin of phylogenetics? Nature 369:363-364.
Ho, S. Y. W., and L. S. Jermiin. 2004. Tracing the decay of the historical signal in biological sequence data. Syst. Biol. 53:623-637.
Holland, B. R., F. Delsuc, and V. Moulton. 2005. Visualising conflicting evolutionary hypotheses in large collections of trees: using consensus networks to study the origins of placentals and hexapods. Syst. Biol. 54:66-76.
Holland, B. R., K. T. Huber, V. Moulton, and P. J. Lockhart. 2004. Using consensus networks to visualize contradictory evidence for species phylogeny. Mol. Biol. Evol. 21:1459-1461.
Holland, B., and V. Moulton. 2003. Consensus networks: a method for visualising incompatibilities in collections of trees. Pp. 165-176 in G. Benson and R. Page, eds. Algorithms in bioinformatics. Springer-Verlag, Berlin.
Huber, K. T., M. Langton, D. Penny, V. Moulton, and M. Hendy. 2002. Spectronet: a package for computing spectra and median networks. Appl. Bioinform. 1:159-161.
Huelsenbeck, J. P., B. Larget, R. E. Miller, and F. Ronquist. 2002. Potential applications and pitfalls of Bayesian inference of phylogeny. Syst. Biol. 51:673-688.
Huson, D. H., and D. Bryant. 2005. Estimating phylogenetic trees and networks using SplitsTree4. (http://www.splitstree.org).
Huson, D. H., T. Dezulian, T. Kloepper, and M. Steel. 2004. Phylogenetic super-networks from partial trees. IEEE/ACM Trans. Comput. Biol. Bioinform. 1:151-158.
Jermiin, L. S., S. Y. W. Ho, F. Ababneh, J. Robinson, and A. W. D. Larkum. 2004. The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. Syst. Biol. 53:637-643.
Jermiin, L. S., G. J. Olsen, K. L. Mengersen, and S. Easteal. 1997. Majority-rule consensus of phylogenetic trees obtained by maximum-likelihood analysis. Mol. Biol. Evol. 14:1296-1302.
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275-282.
Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170-179.
Kishino, H., T. Miyata, and M. Hasegawa. 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J. Mol. Evol. 30:151-160.
Morrison, D. 2005. Networks in phylogenetic analysis: new tools for population biology. Int. J. Parasitol. 35:567-582.
Pagel, M., and A. Meade. 2004. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst. Biol. 53:571-581.
Phillips, M. J., F. Delsuc, and D. Penny. 2004. Genome-scale phylogeny and the detection of systematic biases. Mol. Biol. Evol. 21:1455-1458.
Poladian, L., and L. S. Jermiin. 2006. Multi-objective evolutionary algorithms and phylogenetic inference with multiple data sets. Soft Comput. 10:359-368.
Posada, D., and K. A. Crandall. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817-818.
Rokas, A., B. L. Williams, N. King, and S. B. Carroll. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798-803.
Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502-504.
Seo, T.-K., H. Kishino, and J. L. Thorne. 2005. Incorporating gene-specific variation when inferring and evaluating optimal evolutionary tree topologies from multilocus sequence data. Proc. Natl. Acad. Sci. USA 102:4436-4441.
Shimodaira, H., and M. Hasegawa. 1999. Multiple comparison of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114-1116.
Stefanovic, S., D. W. Rice, and J. D. Palmer. 2004. Long branch attraction, taxon sampling, and the earliest angiosperms: Amborella or monocots? BMC Evol. Biol. 4:35.
Strimmer, K., and A. Rambaut. 2002. Inferring confidence sets of possibly misspecified gene trees. Proc. R. Soc. Lond. B 269:137-142.
Swofford, D. L. 2002. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.