Skip Navigation


MBE Advance Access originally published online on November 21, 2006
Molecular Biology and Evolution 2007 24(2):532-538; doi:10.1093/molbev/msl180
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
24/2/532    most recent
msl180v3
msl180v2
msl180v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Grünewald, S.
Right arrow Articles by Moulton, V.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Grünewald, S.
Right arrow Articles by Moulton, V.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

QNet: An Agglomerative Method for the Construction of Phylogenetic Networks from Weighted Quartets

Stefan Grünewald*, Kristoffer Forslund{dagger}, Andreas Dress*,{ddagger} and Vincent Moulton§

* Chinese Academy of Science-Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
{dagger} Stockholm Bioinformatics Center, Albanova, Stockholm University, Stockholm, Sweden
{ddagger} Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
§ School of Computing Sciences, University of East Anglia, Norwich, United Kingdom

E-mail: stefan{at}picb.ac.cn.


    Abstract
 TOP
 Abstract
 Introduction
 Methods
 Results
 Discussion
 Acknowledgements
 References
 
We present QNet, a method for constructing split networks from weighted quartet trees. QNet can be viewed as a quartet analogue of the distance-based Neighbor-Net (NNet) method for network construction. Just as NNet, QNet works by agglomeratively computing a collection of circular weighted splits of the taxa set which is subsequently represented by a planar split network. To illustrate the applicability of QNet, we apply it to a previously published Salmonella data set. We conclude that QNet can provide a useful alternative to NNet if distance data are not available or a character-based approach is preferred. Moreover, it can be used as an aid for determining when a quartet-based tree-building method may or may not be appropriate for a given data set. QNet is freely available for download.

Key Words: phylogenetic tree • phylogenetic network • split network • Neighbor-Net • SplitsTree • quartet


    Introduction
 TOP
 Abstract
 Introduction
 Methods
 Results
 Discussion
 Acknowledgements
 References
 
Phylogenetic networks are a generalization of phylogenetic trees that allow the representation of conflicting signals or alternative evolutionary histories in a single diagram. Networks should be used if the underlying history is not treelike. For example, recombination, hybridization, and gene transfer can all lead to histories that are not adequately represented by a single tree. Moreover, even when the history is treelike, parallel evolution, model heterogeneity, and sampling error can make it hard to find a unique tree. In such cases, networks provide a tool for representing ambiguity or for visualizing a collection of feasible trees (Bryant and Moulton 2004Go; Huson and Bryant 2006Go).

There are a number of methods for constructing various kinds of phylogenetic networks, see Posada and Crandall (2001)Go and Morrison (2005)Go for reviews. Most of these methods may be roughly divided into 2 classes: those that construct networks directly from character data, for example, netting (Fitch 1997Go), statistical parsimony (Templeton et al. 1992Go), median networks (Bandelt et al. 1995Go, 1999Go; Huber et al. 2002Go), and recombination networks (Gusfield et al. 2004Go; Song and Hein 2005Go), and those that construct networks from distance data, for example, split decomposition (Bandelt and Dress 1992; Huson 1998Go), T-REX (Makarenkov 2001Go), Neighbor-Net (NNet) (Bryant and Moulton 2004Go), and TOM-networks (Willson 2006Go). In general, distance-based methods are often computationally more efficient than character-based methods as—en route to trees or networks—just deriving a distance matrix from sequence or character data already leads to a substantial reduction of data complexity.

Here we present a method dubbed "QNet" for computing a special kind of phylogenetic network called a "split network" from a collection of weighted "quartet" trees. QNet can be viewed as compromise between distance- and character-based network construction approaches because (possibly expensive) character-based tree-building methods may be used to derive weighted quartet trees from which, in turn, relationships between the full collection of taxa can then be efficiently deduced, see, for example, Colonius and Schulze (1977), Bandelt and Dress (1986)Go, and von Haeseler (1988)Go.

It should also be noted in this context that some data analysis methods produce "genuine" quartet data—see, for example, Weyer-Menkhoff et al. (2005)Go where "rate matrices" accounting for amino acid mutation rates in mitochondrial protein sequences have been used to derive such genuine quartet data.

Indeed, based upon this philosophy, various methods have been concocted for constructing phylogenetic trees from quartets, including Tree-Puzzle (Strimmer and von Haeseler 1996Go), Addquart (Berry and Gascuel 2000Go), quartet cleaning (Berry et al. 1999Go), a dynamic programing approach (Ben-Dor et al. 1998Go), and a linear programming method (Weyer-Menkhoff et al. 2005Go).

QNet can be viewed as a quartet analogue of the distance-based NNet method for constructing planar split networks (Bryant and Moulton 2004Go). As we shall explain below, QNet works in a fashion similar to NNet by agglomeratively computing a collection of so-called circular weighted splits of the taxa set that is subsequently represented by a planar split network. This can provide a useful alterative to NNet when distance data are not available, or a character-based approach is preferred. Moreover, it can be helpful in determining whether a quartet-based tree-building method (such as one of those mentioned above) might be appropriate for a given data set or not.


    Methods
 TOP
 Abstract
 Introduction
 Methods
 Results
 Discussion
 Acknowledgements
 References
 
Before presenting the QNet method in detail, we briefly summarize the way in which it works: QNet takes as input each of the 3 possible resolved quartet trees on each subset of 4 taxa, each having a weight that we assume to be precomputed using whatever method the user wants to apply. Using this information, QNet agglomeratively computes a "circular ordering" of the taxa to which a collection of "bipartitions" or "splits" of the taxa is canonically associated. This determines the topology of the QNet network. A least-squares procedure is then applied to estimate the length of the network's edges.

Compatible and Circular Split Systems
A (proper) split of a given set X of taxa is a bipartition of the set X into 2 (nonempty) subsets or "parts." A split is called "trivial" if one of its 2 parts contains one taxon, only. Splits naturally arise in the context of (unrooted) phylogenetic or, more generally, X–labeled trees because the removal of any edge of such a tree T leads to a split of X into the 2 disjoint and nonempty (!) subsets of X that are contained in the label set of either one of the 2 connected components of the resulting X-labeled forest (cf. Semple and Steel [2003]Go for more details). Any collection of splits that can be derived in this way from a fixed X–labeled tree is called a "compatible" split system (for X).

In order to display more complex evolutionary patterns, we employ a generalization of compatible split systems called circular split systems. To define such split systems, define first a circular ordering of X to be a cycle C = (X, E) with vertex set X and edge set E (in technical terms, a cycle is a "connected" and "2-regular" graph. Note that, given a set X of cardinality n ≥ 3, there exist exactly (n1)!/2 distinct circular orderings of X (and none if n ≤ 2). Note also that, in the context of the "Traveling Salesman's Problem," such a circular ordering is often also called a "tour"). Using this concept, we define a collection S of splits of X to be a circular split system (for X) if there exists a circular ordering C of X such that, for every split A|B isin S, there are 2 edges e, e' of C such that A and B are the vertex sets of the 2 components of the graph obtained from C by removing e and e'.

To visualize circular split systems, we can also make use of the fact that a cycle C can be embedded into the plane as the boundary of a convex polygon. Then, every pair {e, e'} of edges that corresponds to an element of the split system can be visualized as a straight line that cuts the edges e and e' (see fig. 1).


Figure 1
View larger version (6K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 1.— A circular split system consisting of the splits {1, 2, 3, 4}|{5, 6, 7, 8, 9, 10}, {5, 6, 7, 8}|{9, 10, 1, 2, 3, 4}, and {1, 2, 3, 4, 5, 6}|{7, 8, 9, 10}.

 
Note that, given a circular ordering (X, E) of X, the unique largest circular split system Formula resulting from a circular ordering (X, E) of X contains exactly Formula distinct splits.

Note also that compatible split systems are always circular and that a split system is "weakly compatible" if (and only if) all of its 3 subsets are circular (Bandelt and Dress 1992aGo). Hence, the class of circular split systems encompasses the class of split systems corresponding to a tree.

Weighted Split Systems
For edge-weighted phylogenetic trees, it is natural to assign a weight to each split equal to the length of the corresponding branch in the tree and the weight 0 to every other split of X.

More generally, any "assignment" {Sigma} that associates a nonnegative real number to each split of a given set X is called a "weighted split system" (for X). It is called a "compatible weighted split system" (for X) if it is derived from a weighted phylogenetic tree. And it is called a "circular weighted split system" (for X) if its "support" (i.e., the set of splits S with {Sigma}(S) != 0) is a circular split system (for X).

A circular weighted split system is therefore completely determined by 1) a circular ordering (X, E) of X and 2) a map that assigns a nonnegative weight to every split in S(X, E) (as it assigns the weight 0 to all other splits of X).

Planar Split Networks
Weighted circular split systems can always be visualized by a special kind of phylogenetic network called a "planar split network," in which every split corresponds to a collection of parallel edges all of which are of the same length proportional to the weight of the corresponding split (Dress and Huson 2004Go)—see also Bryant and Moulton (2004)Go for an elementary explanation. Split networks were introduced by Bandelt and Dress (see, e.g., Bandelt and Dress 1992bGo) and are commonly used to represent incompatible and ambiguous phylogenetic signals in a data set (Huson and Bryant 2006Go). Such networks can be generated from weighted circular split systems using SplitsTree4 (Huson and Bryant 2006Go), see also (Bryant and Dress 2006Go) for more general types of split systems.

Realizing Quartets
A quartet is a split of a 4-taxa set into 2 parts each containing 2 distinct taxa. We denote the quartet on the taxa set a, b, c, and d that groups a, b versus c, d, by ab|cd. An arbitrary split of a given set of taxa is said to "display" a quartet ab|cd if a, b are in one part of that split and c, d in the other (e.g., see fig. 2). We say that a quartet is "realized" by a circular ordering if it is displayed by some split in the maximal circular split system corresponding to that ordering. A "quartet weight function" is a function that assigns a nonnegative weight to each possible quartet. In this terminology, the input to QNet is a quartet weight function that is assumed to be precomputed by the user.


Figure 2
View larger version (5K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 2.— A phylogenetic tree and a planar split network where the bold edges correspond to the splits that display the quartet 13|56 ({1, 2, 3, 4}|{5, 6} in the tree and {1, 2, 3, 4}|{5, 6}, {1, 3, 4}|{2, 5, 6}, and {1, 3}|{2, 4, 5, 6} in the network).

 
A weighted split system naturally gives rise to a quartet weight function: to each quartet, we assign the sum of the weights of all those splits that display the quartet. We call a quartet weight function "circular" if it can be obtained from a weighted circular split system in this way. It can be shown that, for a circular quartet weight function and for every 4 taxa a, b, c, and d, at least one of the weights of the 3 possible quartets ab|cd, ac|bd, and ad|bc has to vanish. If the input quartet weight function does not have this property, we subtract the smallest of the 3 possible weights on each quartet from each of their 3 weights, in order to obtain a function that does.

Because trivial splits do not display any quartet, the quartet weight function obtained from a weighted split system does not depend on the weights of the trivial splits. Hence, a method that reconstructs split systems from quartet data cannot calculate appropriate weights for the trivial splits. To make sure that the trivial splits are easy to see in the final split network, QNet associates the average weight of all nontrivial splits in the system to every trivial split.

Constructing a Cyclic Ordering
First, we recall that given a natural number k ≥ 0, a "path of length" k is defined to be a connected graph (V, E) having a vertex set V of cardinality k + 1 and an edge set E of cardinality k such that no vertex in that graph has a degree larger than 2, that is, a path is a "highly degenerate" tree with exactly k + 1 vertices without any "branching points." In consequence, any path of length k ≥ 1 has exactly 2 vertices of degree 1 which are also called the "ends" of the path, whereas all others vertices have degree 2. Note that, given a path (V, E) of length k and 2 distinct vertices u, v isin V, the smallest subgraph of (V, E) containing the vertices u and v is a path of length k'' ≤ k whose 2 ends are the vertices u, v. Note also that adding to a path of length k ≥ 3, the unique edge that connects its 2 ends yields a circular ordering of V, whereas connecting any 2 ends (or, more generally, any 2 vertices of degree at most 1) of 2 disjoint paths (V1, E1) and (V2, E2) of length k1 and k2, respectively, by an edge yields a paths of length k1 + k2 + 1.

The main idea behind QNet is to find a circular quartet weight function that is as similar as possible to the input quartet weight function. Because every quartet that is not realized has weight 0, QNet tries to choose a cyclic ordering of the taxa for which the sum of the weights of all realized quartets is maximal. However, the corresponding optimization problem is NP-hard (Grünewald S, Moulton V, unpublished data). Therefore, QNet uses a heuristic agglomeration process that we describe now:

QNet employs an agglomeration procedure that starts with a graph G0 with vertex set a set X of taxa of cardinality n ≥ 4 that has no edges, allowing us to consider every taxon as a path of length 0. It then iteratively constructs a sequence of graphs G1 – (X, E1), G2 = (X, E2),..., Gn–1 = (X, En–1) with the same vertex set X by adding one new edge at a time to the current graph G{nu} so that the resulting graph G{nu}+1 remains a disjoint collection of paths. The collection of paths whose disjoint union forms the graph G{nu} will be denoted by P{nu}. The process ends when a graph Gn–1 is obtained that consists of a single path. By adding the one edge that joins its 2 ends, this determines a circular ordering of the taxa set X (e.g., see fig. 3).


Figure 3
View larger version (6K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 3.— The initial (a), an intermediate (b), and the terminal state (c) of an agglomeration procedure and the obtained cyclic ordering (d).

 
Note that a quartet ab|cd is realized by every circular ordering (X, E) of the taxa set X that contains one such intermediate graph G{nu} = (X, E{nu}) if and only if either a and b are connected by a path in G{nu} that does not pass through either c or d, or c and d are connected by a path in G{nu} that does not pass through either a or b.

This process is clearly determined by the way in which the 2 taxa are selected that are to be joined by a new edge at each stage. The selection is made in 2 steps. In the first step, 2 paths are chosen that will be joined next. For any 4 distinct paths P1, P2, P3, P4 isin P{nu} from the intermediate graph G{nu}, let w(P1, P2|P3, P4) be the average weight of all quartets p1p2|p3p4 with p1,...,p4 representing taxa in the paths p1,...,p4, respectively. We then choose 2 paths P1 and P2 for which

Formula
is maximal where the sum is taken over all pairs P3, P4 of distinct paths that are also distinct from P1 and P2. If there are at most 3 paths left, then all sums vanish and an arbitrary pair of paths is selected. Note that, for phylogenetic trees, the quartet selection criterion is equivalent to the neighbor-joining criterion in case the quartets are appropriately weighted (Pachter and Sturmfels 2005, p. 77).

In the second step, one of the (at most 2) ends v1 of P1 and v2 of P2 is chosen that maximizes the sum over the weights of all quartets x1x2|x3x4 for which x1 is in P1, x2 is in P2, x3 is in P1 or P2, and neither x3 nor x4 are on the path from x1 to v1 in P1 or on the path from x2 to v2 in P2. This maximizes the sum of the weights of all quartets that are realized by every cyclic ordering that can be obtained by further joining the new collection of paths obtained by joining P1 and P2 by adding one edge connecting v1 and v2.

Least-Squares Estimation of Split Weights
In a fashion similar to NNet, after a cyclic ordering (X, E) of the taxa set X has been computed, QNet employs a nonnegative least-squares (NNLS) algorithm to calculate a weight for each of the splits in S = S(X, E) as follows: let Q = Q(X) denote the set of all quartets that can be formed using the elements of the taxa set X, and let Formula be the |Q| x |S| matrix, with rows indexed by Q and columns indexed by S, and entries aQ,S = 1 if Q is displayed by the split S in S and aQ,S = 0 else. Then, for the input quartet weight function w, considered as a vector in RQ, QNet computes the nonnegative vector l in RS that minimizes the Euclidian norm of the vector Alw. This solution is unique as the matrix A has full rank |S| because for every split S isin S, there is a quartet Q isin Q that is realized by S and by no other split in S. In practice, the solution is obtained using the non-negative least-squares (NNLS) algorithm the details of which can be found in (Lawson and Hanson 1974Go, p. 160–165).

Consistency and Efficiency of QNet
An important property that any algorithm for reconstructing phylogenies should satisfy is that it is "consistent," that is, that if a certain tree or network is used to generate the input for that algorithm, the algorithm should return precisely this tree or network.

Just as NNet, the design of the QNet algorithm was in part dictated by insisting that it should be consistent, that is, that if the input quartet weight function to QNet is circular, then the output should be the (up to trivial splits) unique weighted circular split system corresponding to this weight function. The proof that this is indeed the case is quite long and technical and is given in Grünewald S, Moulton V, (unpublished data). As a consequence, it follows that if a quartet weight function corresponding to an edge-weighted phylogenetic tree is taken as input to QNet, then QNet will return the splits and branch weights corresponding to this tree, excluding pendant edges.

Concerning efficiency, the QNet agglomeration procedure is O(n4) in time and space for a set consisting of n taxa. For the least-squares estimation algorithm, if negative split weights are allowed, the running time is O(n6). However, the NNLS algorithm has a superpolynomial worst case running time. Nevertheless, as with NNet (which also employs a nonnegativity constraint) the constrained least-squares algorithm tends to terminate quite quickly in practice, and we have been able to analyze data sets containing up to 120 taxa. For example, it took approximately 3.5 min on a PC with an AMD Athlon XP processor with 1.67 Ghz to compute a QNet on 64 taxa. Moreover, even though the matrix A described above has O(n6) entries, we reduced the required storage space for the least-squares procedure to O(n4) by calculating ATA directly (Grünewald S, Moulton V, unpublished data).

Implementation
QNet has been implemented in java and is available for download at http://www.cmp.uea.ac.uk/~vlm/qnet. The output of QNet is a weighted circular split system in NEXUS format. SplitsTree4 (Huson and Bryant 2006Go) can be used to produce a split network from the QNet output file.


    Results
 TOP
 Abstract
 Introduction
 Methods
 Results
 Discussion
 Acknowledgements
 References
 
To demonstrate the applicability of QNet, we present an analysis of the Salmonella data set presented in Kotetishvili et al. (2002)Go that was originally analyzed using split decomposition and reanalyzed in Bryant and Moulton (2004)Go using NNet.

We computed quartet weights using default parameters in the Tree-Puzzle program (Strimmer and von Haeseler 1996Go). This computes a posterior likelihood for each quartet using the Hasegawa–Kishino–Yano model of substitution (Hasegawa et al. 1985Go). In figure 4, we present the associated "likelihood mapping"—a visualization of the phylogenetic content of a sequence alignment in which each point represents a weighted quartet (Strimmer and von Haeseler 1997Go)—and the Tree-Puzzle tree.


Figure 4
View larger version (15K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 4.— The likelihood mapping and the Tree-Puzzle tree for the Salmonella data set cited in the text.

 
From the likelihood mapping, we see that the majority of quartets are mapped close to the 3 vertices of the triangle that, at first glance, seems to suggest a tree-like behavior. However, the Tree-Puzzle tree is only moderately resolved indicating that there are many conflicts in the data. In particular, 4 high scoring splits—those contained in over 25 percent of the 10,000 trees used to generate the Tree-Puzzle tree—conflict with the Tree-Puzzle tree, and there are 3 more pairs of conflicting high scoring splits.

As input for QNet, we used 2 different methods to compute quartet weights: 1) Using Tree-Puzzle, we computed the posterior likelihood and the maximum likelihood branch length of the interior edge of every quartet and took the quartet weight to be the product of these 2 numbers. These weights have the attractive property that, under the maximum likelihood model, they converge to the true branch length as the sequence length grows (Bryant D, personal communication). We refer to these weights as the "expected branch lengths." 2) We computed quartet weights by applying the extension of the statistical geometry method described in Nieselt-Struwe and von Haeseler (2001Go, p. 1206). For short, we call these the "geometric weights."

In figure 5, we present an NNet and 2 QNets for the Salmonella data set. The NNet in (a) was computed using the distance matrix described in Bryant and Moulton (2004)Go, the QNet in (b) was computed using expected branch lengths, and the QNet in (c) was derived from geometric weights. All 3 networks exhibit the same major splits. Also, as expected from the analysis performed in Bryant and Moulton (2004)Go, all the networks indicate complex patterns of evolution. Note that, although the split with one part consisting of Sha146, Sha135, UND8, Sjo99, Sha151, and Sty19* occurs in 93% of the trees constructed by Tree-Puzzle, it is only represented in the network depicted in figure 5b and that, moreover, this network represents also a conflicting split of higher weight. This indicates that trying to represent data by a tree can result in high support for splits that are not supported if the restriction to trees is omitted.


Figure 5
View larger version (36K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 5.— An NNet (a), and 2 QNets, (b) and (c), for the Salmonella data set described in the text. The QNet in (b) was computed using expected branch lengths, and the QNet in (c) was computed using geometric weights.

 
The 2 QNets are quite similar to the NNet, although they both exhibit more splits. We found that this was the case for many of the data sets that we have analyzed so far (data not shown). As discussed in the methods section, in contrast to NNet, QNet does not allow the estimation of the length of pendant edges. This could explain why the major splits in NNet are comparatively shorter than the same splits in the QNets. Note also that the QNet computed using expected branch lengths in figure 5b displays less splits than the QNet in (c), which might indicate that the statistical geometry estimation procedure picks up more noise than the expected branch lengths procedure.

All the networks in figure 5 display a considerable number of low weight splits that could be related to the amount of noise in the data. To deal with this phenomenon, we tried filtering out all splits displayed by a split network for which there is a conflicting split represented that has a much higher weight (we took 10 times the weight). This method has the attractive property that if it is applied to a tree, the tree will remain unchanged. We applied this method to the networks in figure 5b and c; the resulting networks are presented in figure 6. We see that the number of splits decreases from 123 to 82 for figure 6a and from 159 to 97 for figure 6b. Even so, the major splits and the main conflicts are still displayed after the filtering.


Figure 6
View larger version (20K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 6.— The QNets in figure 5b and c after a reduction procedure was applied as described in the text.

 

    Discussion
 TOP
 Abstract
 Introduction
 Methods
 Results
 Discussion
 Acknowledgements
 References
 
We have introduced and implemented a new method called QNet for inferring phylogenetic networks. As with NNet, QNet outputs circular split systems, although it infers these from weighted quartet trees as opposed to distances.

To demonstrate QNet's applicability and some of its properties, we applied it to a Salmonella data set originally appearing in Kotetishvili et al. (2002)Go. We found that QNet produced similar results to NNet but that it tended to infer more splits than NNet. As expected, we found that QNet clearly depends on the method used to compute quartet weights although, for the 2 methods we applied (expected branch lengths and geometric weights), the main splits inferred by QNet were the same. We also found that QNet was able to determine incompatibilities in quartet data that were not apparent in the associated likelihood mapping (cf. fig. 4), a fact that might explain why the Tree-Puzzle tree exhibited, in spite of a reasonably well-looking likelihood mapping, an unexpected degree of "unresolvedness." Thus, QNet may be useful for determining whether quartet-based tree-building methods may be appropriate for a given data set.

QNet has various useful features. As with all network methods, QNet does not force the data onto a tree. In addition, QNet is consistent: if quartets corresponding to a circular split system are used as an input into QNet, it will reproduce this split system. In particular, if QNet is applied to the quartet weights corresponding to a tree, then QNet will produce that tree. We therefore expect that QNet could provide a useful alternative to NNet when distance data are not available. Moreover, because reduction of data complexity in terms of quartets is probably more conservative than in terms of distances, QNet could also provide a useful complementary method to NNet if both, quartet and distance data, are available. In regards to this, we note that QNet uses all quartet information, as opposed to some tree-building methods such as Tree-Puzzle and Addquart that make selective (and input order dependent) use of the quartet data they generate.

QNet does have some drawbacks. As with NNet, QNet produces outerplanar split networks which, even though they are easier to visualize, might not be suitable for high dimensional data (Bryant and Moulton 2004Go). In addition, for n taxa, QNet runs in O(n4) in memory and O(n6) in time if nonconstrained least squares is employed. Thus, QNet is probably not appropriate for very large data sets (for which it might anyhow be difficult to obtain a good visualization in terms of networks). Moreover, as with all pure quartet methods, QNet does not allow the estimation of the length of pendant edges and could profit considerably from improved strategies for computing quartet weights (Felsenstein 2004Go, Chap. 12).

In conclusion, QNet provides a flexible and user-friendly method to analyze quartet data. This should provide a useful complementary approach to methods such as Tree-Puzzle and NNet, and—by breaking down collections of phylogenetic trees into quartets—might also be used as the basis for new methods involving the construction of supertrees and supernetworks.


    Acknowledgements
 TOP
 Abstract
 Introduction
 Methods
 Results
 Discussion
 Acknowledgements
 References
 
The authors would like to thank the Linnaeus Centre for Bioinformatics where they began to discuss the QNet approach. The first and last author would also like to thank the Mathematics and Statistics Department, University of Canterbury, and the Allan Willson Centre for Molecular Ecology and Evolution for providing them with the opportunity to work on this paper in New Zealand. V.M. was supported in part by Engineering and Physical Sciences Research Council grant EP/D068800/1.


    Footnotes
 
William Martin, Associate Editor.


    References
 TOP
 Abstract
 Introduction
 Methods
 Results
 Discussion
 Acknowledgements
 References
 

    Bandelt H-J and Dress A. (1986) Reconstructing the shape of a tree from observed similarity data. Adv Appl Math 7:309–343.[CrossRef]

    Bandelt H-J and Dress A. (1992a) Split decomposition: a new and useful approach to phylogenetic analysis of distance data. Mol Phylogenet Evol 1:242–252.[CrossRef][Medline]

    Bandelt H-J and Dress A. (1992b) A canonical decomposition theory for metrics on a finite set. Adv Math 92:47–105.[CrossRef]

    Bandelt H, Forster P, Röhl A. (1999) Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol 16:37–48.[Abstract]

    Bandelt H, Forster P, Sykes B, Richards M. (1995) Mitochondrial portraits of human population using median networks. Genetics 141:743–753.[Abstract]

    Ben-Dor A, Chor B, Graur D, Ophir R, Pelleg D. (1998) Constructing phylogenies from quartets: elucidation of eutherian superordinal relationships. J Comp Biol 5:377–390.

    Berry V and Gascuel O. (2000) Inferring evolutionary trees with strong combinatorial evidence. Theor Comp Sci 240:271–298.[CrossRef]

    Berry V, Jiang T, Kearney P, Li M, Wareham T. (1999) Quartet cleaning: improved algorithms and simulations. In Nesetril J (Ed.). Proceedings of the 7th European Symposium on Algorithm (ESA99), Prague, Czech Republic(Springer, New York (NY)) pp. 313–324.

    Bryant D and Dress A. (2006) Linearly independent split systems. Europ J Comb. 10.1016/j.ejc.2006.04.007.

    Bryant D and Moulton V. (2003) Consistency of the Neighbor-net algorithm for constructing phylogenetic networks(Montreal, McGill University, School of Computer Science).

    Bryant D and Moulton V. (2004) Neighbor-net: an agglomerative method for the construction of phylogenetic networks. Mol Biol Evol 21:255–265.[Abstract/Free Full Text]

    Colonius H and Schulze H-H. (1977) Trees constructed from empirical relations(Braunschweig, Germany: Braunschweiger Berichte aus dem Institut fuer Psychologie 1).

    Dress A and Huson D. (2004) Constructing splits graphs. IEEE/ACM Trans Comput Biol Bioinform 1:109–115.[CrossRef]

    Felsenstein J. (2004) Inferring phylogenies(Sunderland (MA): Sinauer Associates, Inc).

    Fitch W. (1997) Networks and viral evolution. J Mol Evol 44:S65–S75.

    Gusfield D, Eddhu S, Langley C. (2004) Optimal, efficient reconstruction of phylogenetic networks with constrained recombination. J Bioinform Comput. Biol 2:173–213.

    Hasegawa M, Kishino H, Yano T. (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160–174.[CrossRef][Web of Science][Medline]

    Huber K, Langton M, Penny D, Moulton B, Hendy M. (2002) Spectronet: a package for computing spectra and median networks. Appl Bioinformatics 1:159–161.[Medline]

    Huson D. (1998) Splitstree—a program for analyzing and visualizing evolutionary data. Bioinformatics 14:68–73.[Abstract/Free Full Text]

    Huson D and Bryant D. (2006) Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 23:254–267.[Abstract/Free Full Text]

    Kotetishvili M, Stine O, Kreger A, Morris J, Sulakvelidze A. (2002) Multilocus sequence typing for characterization of clinical and environmental salmonella strains. J Clin Microbiol 40:1626–1635.[Abstract/Free Full Text]

    Lawson CL and Hanson RJ. (1974) Solving least square problems. (Prentice Hall, Englewood Cliffs (NJ)).

    Makarenkov V. (2001) T-REX: reconstructing and visualizing phylogenetic trees and reticulation networks. Bioinformatics 17:664–668.[Abstract/Free Full Text]

    Morrison D. (2005) Networks in phylogenetic analysis: new tools for population biology. Int J Parasitol 35:567–582.[CrossRef][Web of Science][Medline]

    Nieselt-Struwe K and von Haeseler A. (2001) Quartet-mapping, a generalization of the likelihood-mapping procedure. Mol Biol Evol 18:1204–1219.[Abstract/Free Full Text]

    Algebraic statistics for computational biology. (2005) (Cambridge University PressIn Pachter L and Sturmfels B (Eds.). , Cambridge (UK)).

    Posada D and Crandall K. (2001) Intraspecific gene genealogies: trees grafting into networks. Trends Ecol Evol 16:37–45.[CrossRef][Medline]

    Semple S and Steel M. (2003) Phylogenetics. (Oxford University Press, Oxford (UK)).

    Song YS and Hein J. (2005) Constructing minimal ancestral recombination graphs. J Comput Biol 12:147–169.[CrossRef][Web of Science][Medline]

    Strimmer K and von Haeseler A. (1996) Quartet puzzling: a quartet maximum likelihood method for reconstructing tree toplogies. Mol Biol Evol 13:964–969.[Web of Science]

    Strimmer K and von Haeseler A. (1997) Likelihood mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc Natl Acad Sci USA 94:6815–6819.[Abstract/Free Full Text]

    Templeton A, Crandall K, Sing C. (1992) A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping and dna sequence data. III. cladogram estimation. Genetics 132:619–633.[Abstract]

    von Haeseler A. (1988) Rekonstruktion phylogenetischer Bäume mit Hilfe von Varianten der Vier-Punkt-Bedingung [PhD thesis](Bielefeld, Germany, Universität Bielefeld).

    Weyer-Menkhoff J, Devauchelle C, Grossmann A, Grünewald S. (2005) Integer linear programming as a tool for constructing trees from quartet data. Comput Biol Chem 29:196–203.[CrossRef][Web of Science][Medline]

    Willson S. (2006) Unique reconstruction of tree-like phylogenetic networks from distances between leaves. Bull Math Biol 68:919–944.[CrossRef][Web of Science][Medline]

Accepted for publication November 15, 2006.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
24/2/532    most recent
msl180v3
msl180v2
msl180v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Grünewald, S.
Right arrow Articles by Moulton, V.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Grünewald, S.
Right arrow Articles by Moulton, V.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?