

MBE Advance Access originally published online on July 20, 2006
Molecular Biology and Evolution 2006 23(10):1976-1983; doi:10.1093/molbev/msl065

© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Article

Test a Clade in Phylogenetic Trees

Xiaofei Shi, Hong Gu and Chris Field

Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada

E-mail: shi{at}mathstat.dal.ca.


    Abstract
 
We develop a new method for testing a portion of a tree (called a clade) based on multiple tests of many 4-taxon trees. This is particularly useful when a phylogenetic tree constructed by other methods has a clade that is difficult to explain from a biological point of view. A statement about the test of the clade can be made through the multiple P values from these individual tests. By controlling the familywise error rate or the false discovery rate (FDR), 4 different tree test methods are evaluated through simulation. The results show that the combination of the approximately unbiased (AU) test and the FDR-controlling procedure provides strong power along with a reasonable type I error rate and a lighter computational burden.

Key Words: test • multiple comparison • constraint • clade • FDR


    Introduction
 
In making inferences about relationships in a phylogenetic tree, it is often of interest to know whether a subset of the species forms a clade or whether a particular species belongs to a specific clade. In the general and simpler case, the null hypothesis would be that two or more specific species form a clade in the tree. However, the null hypothesis H0 specifies neither the relationships among the remaining species outside the clade nor, when more than 2 species are concerned, those within the clade, forcing us to deal with nuisance parameters in carrying out the test. In a slightly more restricted case, the null hypothesis could state that one or more specific species are members of a larger clade. In this case, we are interested in testing the membership of some but not all members of the clade.

We were motivated in this study by the example in Andersson and Roger (2002), where the maximum likelihood (ML) tree of 46 eubacterial and eukaryotic homologs was constructed. The eukaryote species were placed into 2 distinct clades with 5 exceptions. Andersson and Roger (2002) were interested in testing whether any of the 5 exceptional eukaryotes could belong to 1 of the 2 eukaryote clades. In the following, we call a statement that 1 of these 5 eukaryotes is in the same clade as 1 of the 2 eukaryotic groups a "constraint." In this situation, we are not interested in testing the membership of all the members of one of the eukaryote clades but only want to test whether a particular species belongs to this clade. More details on this example are given in Real Data Analysis.

Currently, several hypothesis test methods are commonly used for inferences on trees. Most of the existing methods are designed to test a fully fixed tree structure rather than a partially fixed one. Examples are the Shimodaira–Hasegawa (SH) test (Shimodaira and Hasegawa 1999), the approximately unbiased (AU) test (Shimodaira 2002), the star version of the SOWH (SSOWH) test (Antezana 2003), the single distribution nonparametric (or parametric) bootstrap (SDNB or SDPB) test (Shi et al. 2005), and the generalized least squares (GLS) test (Susko 2003). It is possible to use these tests by fixing the unspecified part of the tree at the ML values obtained under the constraint of the null hypothesis. Andersson and Roger (2002) used this approach and applied the SH and SOWH tests to the ML trees under the constraints about the 5 exceptional eukaryotes. The P values of the SH test varied extensively, depending on which additional candidate trees were included in the group of trees tested in the null hypothesis, and typically exceeded 0.75. Whereas the SH test, the most conservative method, did not reject any tested topology, the SOWH test rejected all tested topologies with very small P values. Even ignoring the inconsistent performance of the test methods, using the constrained ML tree as the tested tree may not be appropriate because the constraint in the original hypothesis is replaced by the ML tree structure under the constraint. A rejection may then be due to the constraint itself or to the other part of the structure of the constrained ML tree. In testing that any of these 5 eukaryotes is in the same clade as 1 of the 2 eukaryotic groups, only a partially fixed structure should be tested, not a concrete tree topology.

The use of bootstrap selection probability to assign a confidence level to constraints was mentioned by Felsenstein (1985). The bootstrap selection probability is taken as the "confidence" of each clade of an observed tree, based on the proportion of bootstrap trees showing the same clade. It was corrected by Efron et al. (1996) to agree better with standard ideas of confidence levels and hypothesis testing, by considering the curvature of the boundary of the trees in order to achieve second-order accuracy. However, the method is not widely used due to its complexity.

In this paper, we develop a simple and valid method to test the partially fixed tree structures. Our approach will be to decompose the original hypothesis H0 into a number of individual hypotheses H0i, i = 1, ..., m, in which the tested trees have fixed structures due to the constraint, where m is the total number of individual hypotheses needed. Some existing test methods can then be adopted for these hypotheses to find P values, which are calculated as the probabilities, under H0i, of observing a value of the test statistic at least as large as that observed. A conclusion about the original hypothesis can be made through dealing with these P values. A constraint can include any combination of taxa in a tree. A true constraint is a true statement about a subset of taxa that constitute a clade of the true tree, and this clade can be isolated from the other part of the structure by breaking only one internal branch. For the 10-taxon tree in figure 1, the constraint that taxa 4 and 5 are in the same clade is true. However, because we must break 2 internal branches to isolate the clade of (4, 5, 6) from the whole structure, the corresponding constraint is false. The test of a constraint about a clade is called a constraint test.


 
FIG. 1.— A 10-taxon tree. The tree structure with branch lengths is (1: 0.016185, 2: 0.019150, (3: 0.032104, ((4: 0.016474, 5: 0.020743): 0.034872, (6: 0.049471, (7: 0.063597, ((8: 0.024445, 9: 0.022501): 0.003587, 10: 0.026746): 0.036609): 0.005): 0.010505): 0.002312): 0.012872).

 
The rest of the paper is organized as follows. First, the relationship between the original hypothesis of a constraint and a number of individual hypotheses is introduced. In each individual hypothesis, a 4-taxon tree is tested, which has a fixed structure because 2 taxa are neighbors under the constraint. Then 4 test methods, namely, AU, GLS, SSOWH, and SDNB, are reviewed for these individual hypotheses on 4-taxon trees. Two methods are used to cope with the multiple P values. One is the widely used false discovery rate (FDR)–controlling procedure (Benjamini and Hochberg 1995); the other is a chi-square approximation to a function of independent P values. Subsequently, all combinations of the tree test methods and the multiple test procedures are compared for some constraint tests on a simulated 10-taxon tree. These combinations are assessed based on type I error rate and power. As the best choice of test method, the AU test is then applied, combined with the multiple test procedures, to the data set of 46 eubacterial and eukaryotic homologs.


    The Rationale of the Test Procedure
 
We begin by considering the case where testing the whole clade is of interest. The modifications for the more restricted case follow the development for this case. Suppose that in an n-taxon tree, a clade with n0 taxa is being tested. Then the remaining n − n0 taxa naturally form another group. The overall null hypothesis can be written as:

H0: The n0 (or the remaining n − n0) taxa are in the same clade. (1)

First consider the simplest meaningful case where n0 = 2, that is, the 2 taxa are neighbors under H0. In other words, the 2 taxa are closer to each other than to any others. Under the null hypothesis, these 2 taxa will be neighbors in any 4-taxon subset that includes them as members. If H0 is false, there is at least one taxon between them in the true tree and thus in some of the 4-taxon trees. Conversely, if 2 taxa are neighbors in all the possible $\binom{n-2}{2}$ 4-taxon trees, they must be neighbors in the n-taxon tree because none of the other taxa can break the alliance between them. A simple example demonstrates this idea. In figure 2, taxa 1 and 2 are neighbors in the 5-taxon tree. This implies that any 4-taxon tree including them should have them as neighbors, as shown in the second row. Note that the topologies in the second row tell us only that taxa 1 and 2 are neighbors in the 5-taxon tree; the topology among taxa 3, 4, and 5 is not of concern in the hypothesis.


 
FIG. 2.— An example for demonstrating the basic idea of the constraint test.

 
The situation is more complicated when n0 > 2 due to the unknown tree structure within this clade. The idea is still that the taxa within this clade are closer to each other than to taxa outside this clade. Thus, under H0, any 4-taxon tree that includes 2 taxa from within the clade and 2 from outside the clade has a fixed structure with the former 2 taxa as neighbors. For simplicity of description, we refer to the 2 taxa chosen from the tested clade in the null hypothesis as taxa 1 and 2 of the 4-taxon tree. They are neighbors under H0, as shown in figure 3. We denote this tree structure as τ1 and the other 2 possible structures of the 4 taxa as τ2 and τ3. There are $m = \binom{n_0}{2}\binom{n - n_0}{2}$ possible 4-taxon combinations to be tested as individual null hypotheses. Testing the original hypothesis (1) is equivalent to testing:


 
FIG. 3.— The structure of the tested tree under H0i. Taxon 1 and 2 are chosen from inside the clade and the other two from outside the clade.

 
H0i: The ith 4-taxon tree has τ1 as its true topology, for all i = 1, ..., m. (2)

The original hypothesis (1) is accepted if all H0i (i = 1, ..., m) are accepted and rejected if at least one is rejected. Thus, testing the original hypothesis becomes a multiple testing problem in which the familywise error rate (FWER) must be controlled. Note that when at least one taxon outside the clade breaks the alliance of the taxa within the clade, there is often more than one individual hypothesis among the H0i that should be rejected. Thus, high power is expected from such a test procedure.
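The equivalence between hypothesis (1) and the family of 4-taxon hypotheses (2) can be checked directly on the model tree of figure 1. The following Python sketch is only an illustration: the nested-tuple encoding `TREE` and all helper names are assumptions introduced here, not part of the original analysis, and the check works on a known topology rather than on sequence data.

```python
from itertools import combinations

# Topology of the 10-taxon tree in figure 1 as nested tuples (branch lengths
# dropped). The top-level trifurcation is just the point where the unrooted
# tree is written down.
TREE = (1, 2, (3, ((4, 5), (6, (7, ((8, 9), 10))))))


def leaf_sets(node, out):
    """Return the leaf set below `node`, recording one set per branch."""
    if isinstance(node, int):
        leaves = frozenset([node])
    else:
        leaves = frozenset().union(*(leaf_sets(child, out) for child in node))
    out.add(leaves)
    return leaves


def splits(tree):
    """Return all taxa and, for every branch, the taxa on one side of it."""
    out = set()
    all_taxa = leaf_sets(tree, out)
    out.discard(all_taxa)  # the top-level node is not a branch
    return all_taxa, out


def is_clade(tree, taxa):
    """True if `taxa` can be isolated by cutting a single branch."""
    all_taxa, sides = splits(tree)
    taxa = frozenset(taxa)
    return taxa in sides or (all_taxa - taxa) in sides


def neighbors_in_quartet(sides, a, b, c, d):
    """True if the 4-taxon tree induced on {a, b, c, d} is ab|cd."""
    return any(
        ({a, b} <= s and not {c, d} & s) or ({c, d} <= s and not {a, b} & s)
        for s in sides
    )


def passes_all_quartets(tree, taxa):
    """True if every pair inside `taxa` is a neighbor pair in every quartet
    formed with 2 taxa from outside `taxa`."""
    all_taxa, sides = splits(tree)
    inside = sorted(taxa)
    outside = sorted(all_taxa - set(taxa))
    return all(
        neighbors_in_quartet(sides, a, b, c, d)
        for a, b in combinations(inside, 2)
        for c, d in combinations(outside, 2)
    )
```

For this tree, `{4, 5}` is a clade and passes every 4-taxon check, whereas `{4, 5, 6}` fails both, in agreement with the true and false constraints described for figure 1.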

For the more restricted case, the main difference lies in the choice of the 4-taxon trees to be tested. For example, if we have a clade of n0 species and we are interested in testing whether one specific species belongs to the clade, we would pair the specific species with each of the remaining n0 − 1 species in the clade so that $m = (n_0 - 1)\binom{n - n_0}{2}$. The formula for m would be Formula if we wanted to test the membership of 2 species in a clade. Because only the value of m is changed in this more restricted case, we continue the development in the simpler case of testing the complete clade.


    Tree Test Methods for Individual Hypotheses
 
Because the individual tests are ordinary tests with a fixed tree structure under the hypothesis, existing test methods can be applied to provide P values. Among these methods, the SH test is too conservative because the least favorable configuration assumed in the hypothesis corresponds to a specific hypothesis that is very difficult to reject; the Swofford–Olsen–Waddell–Hillis (SOWH) test and the single distribution parametric bootstrap (SDPB) test are acceptable only when there is no model misspecification (Shi et al. 2005). Here 4 other methods, namely, AU, GLS, SSOWH, and SDNB, are considered for the individual hypotheses. A brief review of these methods for a 4-taxon tree is given below.

The AU test (Shimodaira 2002) is widely used nowadays. Its hypothesis is stated in terms of the expected log-likelihood of topology τi, denoted by µi, i = 1, 2, 3. The hypothesis for testing τi is

$$H_0: \mu_i \ge \max_{j \ne i} \mu_j.$$

The proportion of times that τi is the ML tree in the bootstrap replicates is called the bootstrap probability (BP), which is computed by a multiscale nonparametric bootstrap in which different sequence lengths are used. Because BP is only first-order accurate as a P value (Efron et al. 1996), Shimodaira used the results of Efron and Tibshirani (1998) to correct BP and obtain second-order accurate P values, which means that the probability of a type I error of the AU test is α + O(N^(−3/2)), where N is the sequence length (see eq. 11 and the discussion therein of Shimodaira 2002). The AU test controls the type I error well but has relatively weak power. As an important feature, the accuracy of the P value in the AU test can be increased by increasing the number of bootstrap replicates at a feasible computational cost.

In the GLS test, the vector Y of ML pairwise distances contains 6 elements, each of which can be expressed as the sum of the branch lengths on the path between the pair of taxa. The design matrix X expressing the tree topology in figure 3 is fixed in all the individual tests. The following equation shows the relationship between Y, X, and T under H0, where T is the vector of branch lengths, estimated by minimizing $(Y - XT)^{\top} V^{-1} (Y - XT)$, and V is the covariance matrix of the pairwise ML distances:

$$E(Y) = XT.$$

Because the ML pairwise distances are asymptotically multivariate normal, under H0 the minimized value of $(Y - XT)^{\top} V^{-1} (Y - XT)$ asymptotically follows a chi-square distribution with 1 degree of freedom, that is, 6 distances minus 5 branch lengths (Susko 2003). By using this asymptotic result and avoiding the bootstrap, the GLS test has a computational advantage over the other methods, which makes it more desirable in settings with a large number of individual hypotheses.
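To make the GLS statistic concrete, here is a minimal Python sketch for a single 4-taxon tree with taxa 1 and 2 as neighbors. It is not the program of Susko (2003). The branch labeling in `X`, the function names, and the covariance matrix `v` supplied by the caller are assumptions for illustration. Instead of minimizing over T explicitly, the sketch uses an equivalent form for an unconstrained fit: because X has 5 columns and Y has 6 elements, the minimized quadratic form equals (c'Y)² / (c'Vc), where c is the single contrast orthogonal to all columns of X.

```python
from math import erfc, sqrt

# Distances are ordered as Y = (d12, d13, d14, d23, d24, d34).
# Columns of X are the branches (t1, t2, t3, t4, t5): t1..t4 lead to taxa
# 1..4 and t5 is the internal branch separating (1, 2) from (3, 4).
# This labeling is an assumption made for the illustration.
X = [
    [1, 1, 0, 0, 0],  # d12 = t1 + t2
    [1, 0, 1, 0, 1],  # d13 = t1 + t5 + t3
    [1, 0, 0, 1, 1],  # d14 = t1 + t5 + t4
    [0, 1, 1, 0, 1],  # d23 = t2 + t5 + t3
    [0, 1, 0, 1, 1],  # d24 = t2 + t5 + t4
    [0, 0, 1, 1, 0],  # d34 = t3 + t4
]

# The only direction orthogonal to every column of X:
# d13 + d24 - d14 - d23, which is zero for distances that fit the tree.
CONTRAST = [0, 1, -1, -1, 1, 0]


def gls_statistic(y, v):
    """Minimum over T of (Y - XT)' V^-1 (Y - XT), via the contrast form."""
    cy = sum(c * yi for c, yi in zip(CONTRAST, y))
    cvc = sum(
        CONTRAST[i] * v[i][j] * CONTRAST[j]
        for i in range(6)
        for j in range(6)
    )
    return cy * cy / cvc


def chi2_1df_p_value(stat):
    """Upper tail probability of a chi-square variable with 1 df."""
    return erfc(sqrt(stat / 2.0))
```

With distances that fit the tree exactly, the contrast is zero and the P value is 1; the larger the violation relative to its variance, the smaller the P value.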

In the SSOWH test, the test statistic is the difference in log-likelihood between the ML tree and τ1, and its distribution is constructed by a parametric bootstrap. The replicate data sets are simulated under a star topology with ML branch lengths estimated from the original observations. The replicate of the test statistic is the log-likelihood difference between the ML tree and the tested tree for the replicate data set. With a large number of taxa, the SSOWH test has been shown not to perform well when the underlying true tree topology is far from the star topology (Shi et al. 2005). Because only 4-taxon trees are considered here, treating the star topology as their boundary is not a problem.

The same test statistic is used in the SDNB test, but its distribution is constructed by a nonparametric bootstrap. According to bootstrap theory, the bootstrap replicate of the test statistic is the log-likelihood difference between the ML tree of the replicate data set and the ML tree of the original data set, with the other free parameters estimated from the replicate data set. The SDNB test may have an inflated type I error if there are short internal branches in the true tree.

The accuracy of the P values from the last 2 methods depends closely on the number of bootstrap replicates. If the number of bootstrap replicates is 100, the P value can be accurate only to 2 decimal places. The main issue with these 2 tests is the heavy computational burden of obtaining highly accurate P values for each hypothesis when the number of individual hypotheses is large.


    Procedure
 
We summarize the procedure of testing all the possible 4-taxon trees under the hypothesis as follows:

  1. Suppose there are n0 taxa within the clade under the constraint. Select 2 of the n0 taxa inside the clade as taxa 1 and 2 and 2 other taxa from outside the clade, and combine their sequences to form a sub–data set.
  2. Implement the chosen test method for the corresponding H0i and record the P value.
  3. Go back to step 1 until all the 4-taxon combinations have been considered. This means that we must test all m combinations, where $m = \binom{n_0}{2}\binom{n - n_0}{2}$.
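The loop over steps 1 to 3 can be sketched in Python as follows. The function names are hypothetical, and `test_quartet` stands in for whichever tree test (AU, GLS, SSOWH, or SDNB) is run on each sub–data set; it is an assumed callable that returns a P value, not an interface of any existing program.

```python
from itertools import combinations
from math import comb


def quartets(clade, all_taxa):
    """Yield (taxon 1, taxon 2, outside taxon, outside taxon) for every
    4-taxon combination with 2 taxa inside and 2 outside the clade."""
    inside = sorted(clade)
    outside = sorted(set(all_taxa) - set(clade))
    for pair_in in combinations(inside, 2):
        for pair_out in combinations(outside, 2):
            yield pair_in + pair_out


def n_tests(n, n0):
    """Number m of individual hypotheses for a clade of n0 out of n taxa."""
    return comb(n0, 2) * comb(n - n0, 2)


def constraint_p_values(clade, all_taxa, test_quartet):
    """Apply a user-supplied test to every quartet and collect the P values.

    `test_quartet` is a hypothetical callable: it receives a 4-tuple of
    taxa and returns the P value of H0i for that 4-taxon tree."""
    return [test_quartet(q) for q in quartets(clade, all_taxa)]
```

For a 10-taxon tree, `n_tests(10, n0)` gives 28, 63, 90, and 100 individual hypotheses for n0 = 2, 3, 4, and 5, respectively.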

There are 3 advantages to the proposed test procedure. First, the tree topology need not be estimated for any of these 4-taxon trees; as a tested topology, this is a major advantage over the ML tree under the constraint. Second, by using 4-taxon trees in the individual hypotheses, the number of individual hypotheses is minimized; that is, Formula for any i, j ≥ 2. This reduces the problems in multiple comparisons and keeps the computation as light as possible. Finally, based on the simulation results in Shi et al. (2005), trees with a large number of taxa need long sequences for the asymptotic results of the AU and GLS tests to be valid and thus involve intensive computation. However, these tests work well for 4-taxon trees with relatively short sequences.


    Multiple P Values
 
After obtaining the P values of the individual hypotheses, methods for making statements on the overall hypothesis are needed. Let H0 = {H01, ..., H0m} be a set of null hypotheses with corresponding test statistics T1, ..., Tm and P values P1, ..., Pm, where Formula. H0 is accepted when none of the H0i, i = 1, ..., m, is rejected. The FWER is the probability of rejecting at least one of them when all m null hypotheses are true. As reviewed in Dudoit et al. (2003), strong control of the FWER refers to control of the false rejection rate under any combination of true and false null hypotheses, whereas weak control of the FWER refers to control of the false rejection rate when all the null hypotheses are true. In our situation, we only need to control the false rejection rate when all the null hypotheses are true, and we need to keep high power when some null hypotheses are false. Thus, weak control of the FWER is what is needed here. We tested several of the procedures reviewed in Dudoit et al. (2003) for controlling the FWER. For the single-step procedures, we used a bootstrap method to estimate the sampling distribution of min P. The simulation results showed that this method is too conservative. Another, more powerful procedure we tested is Hochberg's procedure (Hochberg 1988), which controls the FWER in the same way as the FDR-controlling procedure but is less powerful than the FDR procedure when some null hypotheses are false. We thus focus on the FDR-controlling procedure as the only multiple testing procedure here. Another procedure, the chi-square approximation procedure based on the assumption of independent P values, showed comparable results in the simulation and thus is also included here.

The FDR-Controlling Procedure
Benjamini and Hochberg (1995) proposed controlling the FDR instead of the FWER in multiple tests. Consider the problem of simultaneously testing m null hypotheses, m0 of which are true. Let R be the number of hypotheses rejected and R0 the number of false rejections among them. The FDR can be viewed as the expected proportion of false rejections among the rejected null hypotheses:

$$\mathrm{FDR} = E(Q), \qquad Q = \frac{R_0}{R}.$$

Define Q = 0 when R = 0, as no false rejection can be committed. In the FDR-controlling procedure, let P(1) ≤ ... ≤ P(m) be the ordered P values and H(01), ..., H(0m) the corresponding hypotheses. All H(0i) for i = 1, ..., I are rejected, with I being the largest i for which $P_{(i)} \le \frac{i}{m} q^*$, where q* is the desired level of FDR.

This procedure controls the FDR for both independent and positively dependent test statistics (Benjamini and Liu 1999; Benjamini and Yekutieli 2001). If all of the null hypotheses are true, there are only 2 possible values for Q: Q = 0 when R = 0 and Q = 1 when R ≥ 1. Then the FDR is equivalent to the FWER:

$$\mathrm{FDR} = E(Q) = 0 \cdot \Pr(R = 0) + 1 \cdot \Pr(R \ge 1) = \Pr(R \ge 1) = \mathrm{FWER}.$$

Therefore, when all the null hypotheses are true, controlling the FDR implies controlling the FWER, which is the weak control needed here. When the constraint is not true, the FDR-controlling procedure is more powerful than the general FWER-controlling procedure. The overall hypothesis is rejected if at least one H0i is rejected or, equivalently, if I ≥ 1.
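The step-up rule described above is short enough to write out. The following Python sketch is one plain implementation of the Benjamini and Hochberg (1995) rule as stated in the text; the function names are chosen here for illustration.

```python
def bh_num_rejections(p_values, q=0.05):
    """Return I, the largest i with P(i) <= (i / m) * q, or 0 if none."""
    m = len(p_values)
    largest = 0
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= i * q / m:
            largest = i  # keep going: a later, larger i may also qualify
    return largest


def reject_constraint(p_values, q=0.05):
    """Reject the overall hypothesis if at least one H0i is rejected."""
    return bh_num_rejections(p_values, q) >= 1
```

Because the rule takes the largest qualifying index, the P values `[0.04, 0.045]` lead to I = 2 at q* = 0.05 even though the smaller one exceeds its own threshold of 0.025.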

The Chi-Square Approximation Procedure
Generally, if the distribution of the test statistic is continuous, under the true null hypothesis, the P value is uniformly distributed between 0 and 1. Let Y = –logP, where P denotes the P value with a uniform distribution. Then, the probability density function of Y is

$$f(y) = e^{-y}, \quad y > 0,$$
which is the density function of the gamma distribution Γ(1, 1). From the following 2 properties:

(1) Y is distributed Γ(α, λ) if, and only if, λY is distributed Γ(α, 1).
(2) If Y1 and Y2 are independent random variables with Γ(α1, λ) and Γ(α2, λ) distributions, respectively, then Y = Y1 + Y2 has a Γ(α1 + α2, λ) distribution. It can be easily shown that the statistic $-2\sum_{i=1}^{m} \log P_i$ has a chi-square distribution with 2m degrees of freedom if the P values are independent.
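The intermediate steps can be written out as follows. This is a short derivation under the assumptions already stated (uniform, independent P values), with Γ(α, λ) parameterized by shape α and rate λ and with Y_i = −log P_i.

```latex
\begin{align*}
\Pr(Y \le y) &= \Pr(-\log P \le y) = \Pr(P \ge e^{-y}) = 1 - e^{-y},
  \quad y > 0, \\
f(y) &= \frac{d}{dy}\left(1 - e^{-y}\right) = e^{-y}
  \quad\Longrightarrow\quad Y \sim \Gamma(1, 1), \\
2Y &\sim \Gamma\!\left(1, \tfrac{1}{2}\right) = \chi^2_2
  \quad \text{by property (1) with } \lambda = \tfrac{1}{2}, \\
-2\sum_{i=1}^{m} \log P_i = \sum_{i=1}^{m} 2Y_i
  &\sim \Gamma\!\left(m, \tfrac{1}{2}\right) = \chi^2_{2m}
  \quad \text{by property (2) applied repeatedly.}
\end{align*}
```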

Although the P values resulting from these 4-taxon trees may not be independent, we found that their correlations are typically small in the simulation examples. Except when combined with the GLS test, the chi-square approximation showed type I error and power similar to those of the FDR-controlling procedure. Note that when the bootstrap is used to calculate P values (as in the SSOWH and SDNB tests), the P values are accurate only to an extent that depends on the number of bootstrap replicates. The resulting P value can be zero, which in fact means that it is smaller than the smallest positive P value obtainable from the bootstrap. In such cases, the method cannot be applied directly because the logarithm of zero is infinite. However, in order to compare the results with the FDR-controlling procedure, we replace a zero P value by the midpoint between 0 and the smallest possible P value from the tree test in the following simulations. We reject the overall hypothesis if $-2\sum_{i=1}^{m} \log P_i$ is greater than the (1 – α) critical value of χ²(2m).
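A minimal Python sketch of this chi-square approximation, including the replacement of zero P values, is given below. The function names and the `n_bootstrap` argument are illustrative assumptions. The tail probability uses the closed form that holds for a chi-square distribution with an even number of degrees of freedom, so no statistical library is needed.

```python
from math import exp, log


def chi2_sf_even(x, m):
    """P(chi-square with 2m degrees of freedom > x), closed form."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= half / k  # term is now (x/2)^k / k!
        total += term
    return exp(-half) * total


def combined_p_value(p_values, n_bootstrap=None):
    """Return (statistic, overall P value) for -2 * sum(log P_i).

    A zero P value is replaced by 1 / (2 * n_bootstrap), the midpoint
    between 0 and the smallest positive bootstrap P value 1 / n_bootstrap."""
    fixed = []
    for p in p_values:
        if p == 0.0:
            if n_bootstrap is None:
                raise ValueError("zero P value but n_bootstrap not given")
            p = 0.5 / n_bootstrap
        fixed.append(p)
    stat = -2.0 * sum(log(p) for p in fixed)
    return stat, chi2_sf_even(stat, len(fixed))
```

The overall hypothesis is rejected when the returned P value is below α, which is equivalent to comparing the statistic with the (1 – α) critical value.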


    Simulation
 
We used data simulated from a 10-taxon model tree by "Seq-Gen," version 1.2.5 (Rambaut and Grassly 1997), so that the true tree topology is known.

Models
To make the simulation more realistic, one data set was simulated from a published 66-taxon tree (Murphy et al. 2001) with a sequence length of 3,000. Then 10 taxa were selected from the 66 taxa, and the ML tree under the constraints of (1, 2), (4, 5), and (8, 9) was built by PAUP* 4.0b8 (Swofford 2000) as the model tree. (The tree topology and branch lengths are given in fig. 1.) The internal branch lengths in this tree are less than 0.04, and some of them are close to 0, such as the internal branch between the subtree (8, 9) and taxon 10.

The F84 model (Felsenstein and Churchill 1996) was used for simulation and analysis with base frequencies (πa, πc, πg, πt) = (0.37, 0.24, 0.12, 0.27) and transition/transversion ratio κ = 2.93. These values are taken from Zwickl and Hillis (2002), who estimated them by ML on a tree obtained by a parsimony search for 2 of the genes present in the Murphy et al. (2001) data set (12S rRNA and cnr 1, a protein-coding gene). To avoid confounding the issues, we did not consider rate heterogeneity among sites. The sequence length was fixed at 500 for the simulation.

Methods
All the likelihoods were calculated by PAUP* 4.0b8. After the sitewise log-likelihoods were obtained, the AU test was performed using CONSEL (Shimodaira and Hasegawa 2001). We used the default settings of CONSEL, with the scales in the bootstrap ranging from 0.5 to 1.4 in increments of 0.1 and with 10,000 bootstrap replicates for each scale. The GLS test was performed using the program written by Susko (2003). The distributions of the test statistics in the SSOWH and SDNB tests were built from 100 bootstrap replicates. This means that zero P values were replaced by 0.005 for the SSOWH and SDNB tests when used with the chi-square approximation procedure.

After obtaining the P values, statements were made on the overall hypothesis through the multiple test procedures. For a true constraint, the proportion of rejections estimates the type I error rate; for a false constraint, it estimates the power. The level q* in the FDR-controlling procedure and the significance level α in the chi-square approximation procedure were both set to 0.05.
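How a rejection proportion estimates the type I error can be illustrated with a toy stand-in for the simulation. The Python sketch below does not simulate sequences or run any tree test: it assumes, purely for illustration, that the m individual P values of a true constraint are independent and uniform, and it applies the FDR rule at q* = 0.05. All names are hypothetical.

```python
import random


def bh_rejects(p_values, q):
    """True if the FDR step-up rule rejects at least one hypothesis."""
    m = len(p_values)
    return any(p <= i * q / m for i, p in enumerate(sorted(p_values), start=1))


def rejection_rate(m, n_datasets, q=0.05, seed=1):
    """Proportion of synthetic data sets in which the overall H0 is rejected.

    Each data set is represented only by m independent uniform P values,
    an idealized version of a true constraint."""
    rng = random.Random(seed)
    hits = sum(
        bh_rejects([rng.random() for _ in range(m)], q)
        for _ in range(n_datasets)
    )
    return hits / n_datasets
```

Under these idealized assumptions the rate should be close to q*. With only 100 data sets, as in the simulation reported here, the standard error of an estimated rate near 0.05 is about 0.02, so small differences between methods in the true-constraint rows should be read with that in mind.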

Results
The numbers of rejections of the constraint tests on 100 simulated data sets are listed in table 1. Our goal is to find the combination of tree test method and multiple test procedure that has strong power and controlled type I error. In table 1, the results are given in 2 parts: true constraints in the upper part and false ones in the lower part. The first column shows the constraints being tested. The SDNB, AU, SSOWH, and GLS tests are examined on each constraint. The number of individual hypotheses is 28 for n0 = 2, 63 for n0 = 3, 90 for n0 = 4, and 100 for n0 = 5. For each constraint test, we applied the FDR-controlling procedure and the chi-square approximation method, denoted by FDR and χ², respectively. Because the multiple comparison methods are based on P values, the characteristics of the test methods are crucial in making conclusions on H0.



 
Table 1 Number of Rejections out of 100 Based on the 4 Test Methods and 2 Multiple P Value Procedures. The Upper Part of the Table Corresponds to the True Constraints, Whereas the Lower Part Corresponds to the False Constraints

 
Although it is the most powerful test method, the SDNB test did not control the type I error well in the case of constraint (8,9). It is sensitive to near-star topologies and tends to reject more often than it should. The short internal branch (0.0036) between subtree (8,9) and taxon 10 creates a near-star topology, leading to a large type I error in the SDNB test. Table 2 shows the number of rejections out of 100 under the same constraint with different lengths of the internal branch between subtree (8,9) and taxon 10 under the SDNB + FDR combination. In the range from 0.0036 to 0.036, the longer the internal branch, the smaller the type I error.



 
Table 2 The Simulation Result of the SDNB Test and the FDR-Controlling Procedure with the Internal Branch between Subtree (8,9) and Taxon 10 Increasing

 
The SSOWH test has the highest power among the remaining methods. Although the parametric bootstrap in the SSOWH test rests on the unrealistic assumption that all the possible tree topologies support the observations equally, this did not cause serious problems for 4-taxon trees. However, the price to pay for the low type I error and high power of the SSOWH test is the intensive computation required to construct the distribution of the test statistic, especially when the numbers of individual tests and bootstrap replicates are large.

Although the 2 multiple comparison methods did not perform differently with the SDNB and SSOWH tests, the chi-square approximation method outperformed the FDR procedure in power when combined with the GLS and AU tests. However, the combination of the GLS test and the chi-square approximation procedure did not control the type I error, which was always larger than 0.1 and went up to 0.37 under the true constraints. A possible reason for this failure is the stronger correlation between the P values resulting from the GLS test. Because all the taxa appear more than once in composing these 4-taxon trees, the independence assumption is violated in all tests, but the dependence of the P values from the other tree test methods is much weaker than that from GLS.

When combined with the FDR-controlling procedure, the GLS test controlled the type I error and had power similar to that of the AU test. However, the GLS test provided weaker power for the constraint (1,2,3,6,7), whereas the performance of the AU test was comparable with that of the SSOWH and SDNB tests. There are 5 taxa in this constraint, which means that the ordered P values were compared with thresholds starting at 0.05/100 = 0.0005 and increasing in steps of 0.0005 in the FDR procedure. The chi-square distribution is an approximation in the GLS test, with its accuracy depending on the sequence length. It seems that a sequence length of 500 could not give enough accuracy for the case with 5 taxa in the constraint, and this accuracy is hard to improve.

We also found that the power differs among false constraints with 2 taxa. When more than one taxon must be removed for the 2 taxa in the constraint to become neighbors, H0 is easy to reject. For example, the null hypothesis that taxa 7 and 8 are neighbors is rejected in almost 100 out of 100 data sets because both taxa 9 and 10 must be removed to make taxa 7 and 8 neighbors in the model tree. This partially explains the relatively low power of the constraint tests of (8,10) and (9,10), for which only one taxon must be removed.

In summary, if the number of individual hypotheses is not too large and it is feasible to implement the SSOWH test, this method will provide high power and low type I error. For a large number of individual hypotheses, which is often the case, the AU test is recommended for the constraint test because the number of bootstrap replicates in the AU test can easily be increased to obtain more accurate P values. The FDR-controlling procedure is a safe choice, which works well with the SSOWH and AU tests under all the conditions examined. The chi-square approximation method is powerful, but the zero P value issue must be dealt with. The effect on the overall statement of replacing zero P values by the midpoint between 0 and the smallest possible P value is not clear, and it is difficult to avoid zero P values without dramatically increasing the number of bootstrap replicates.


    Real Data Analysis
 
We applied the constraint test procedure to the data set with 46 taxa and 443 amino acid positions in Andersson and Roger (2002). From their analysis, we know that with the exception of 5 eukaryotic sequences, Dictyostelium discoideum (Dd), Trypanosoma brucei (Tb), Leishmania major (Lm), Trichomonas vaginalis (Tv), and Giardia lamblia (Gl), the eukaryotes are divided into 2 distinct clades in the ML tree: the "plant + protist" (PP) clade with 10 taxa and the "animal + fungal" (AF) clade with 8 taxa. The question is whether these 5 exceptions fall outside the eukaryote groups merely because of error.

Excluding these 5 taxa, 41 taxa remain, which are divided into 3 groups: the PP clade with 10 taxa, the AF clade with 8 taxa, and the residual 46 – 5 – 8 – 10 = 23 taxa. To address the above question, we formed 10 constraint tests, namely that Dd + PP, Dd + AF, Tb + PP, Tb + AF, Lm + PP, Lm + AF, Tv + PP, Tv + AF, Gl + PP, or Gl + AF are in the same clade. Note that these hypotheses are slightly different from the null hypothesis (1) in The Rationale of the Test Procedure. Each of these hypotheses tests whether a single taxon is attached to a group of taxa, which is different from testing whether all of them are in the same clade. We illustrate this point using the first test as an example.

The overall hypothesis of Dd and PP clade is

Formula
In other words, under H0, Dd is closer to the taxa within the PP clade than to the taxa in the residual clade. An assumption we have to make in the above test is that the PP clade is true relative to the residual clade, which means that none of the taxa have been misplaced between these 2 clades. Without this restriction, we cannot draw a conclusion about the overall hypothesis if it is rejected, because the rejection may be caused by the misplacement of Dd itself or by any other misplaced taxa. To decrease the computation time by using a smaller number of individual tests and also to increase the accuracy, the taxa within the AF clade are not used when the tests involve the PP clade. Only when both hypotheses (Dd + PP and Dd + AF) are accepted do we then test which clade (PP or AF) Dd belongs to. With the AF clade included, there are Formula individual 4-taxon tests; this number is decreased to Formula with the AF clade excluded. Because H0 only tests the relation of Dd with the PP clade and the residual clade, it suffices to test the 4-taxon trees that are formed by Dd, 1 taxon from the PP clade, and 2 taxa from the residual clade. This further decreases the total number of individual tests to $10 \times \binom{23}{2} = 2{,}530$. Our gain in accuracy of the test is substantial: the P values should be accurate to 0.05/2,530 ≈ $2 \times 10^{-5}$ in order to be compared with $\alpha/m$ in the multiple comparison methods, which means that we need at least roughly 50,000 replicates for each individual test in the SSOWH and AU tests.
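The counts in this paragraph can be checked with a short sketch. It enumerates the 4-taxon sets formed by one focal taxon, 1 taxon from a clade, and 2 taxa from the residual group. The taxon labels (PP1, AF1, R1, and so on) and the function name are placeholders, not the names used in the data set, and the replicate count is only the rough bound obtained by requiring a resolution of 1/B no coarser than α/m.

```python
from itertools import combinations
from math import comb


def quartets(focal, clade, residual):
    """Yield every 4-taxon set: the focal taxon, 1 clade taxon, 2 residual taxa."""
    for clade_taxon in clade:
        for r1, r2 in combinations(residual, 2):
            yield (focal, clade_taxon, r1, r2)


# Placeholder labels; only the group sizes (10, 8, 23) come from the text.
pp = [f"PP{i}" for i in range(1, 11)]
af = [f"AF{i}" for i in range(1, 9)]
residual = [f"R{i}" for i in range(1, 24)]

n_pp = sum(1 for _ in quartets("Dd", pp, residual))  # tests for Dd + PP
n_af = sum(1 for _ in quartets("Dd", af, residual))  # tests for Dd + AF

alpha = 0.05
resolution = alpha / n_pp               # smallest threshold a P value must resolve
min_replicates = round(n_pp / alpha)    # replicates needed for that resolution
```

The same enumeration applies to the other four focal taxa, because only the focal label changes.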

However, even after the above reduction, the SSOWH test is still not feasible in this case. The computation would be extremely time consuming even for a single test, let alone 2,530. Thus, the AU test is the only choice we have for this example. In order to increase the accuracy of the AU test, instead of using the default setting of CONSEL, we set the bootstrap scales to range from 0.5 to 1.5 in increments of 0.1 and used 100,000 bootstraps for each scale.
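To give a sense of the computational load implied by this setting, the sketch below builds the scale grid and multiplies out the replicate counts. It is plain arithmetic on the numbers stated in the text and does not call CONSEL; the variable names are hypothetical.

```python
# Scale grid described in the text: 0.5 to 1.5 in increments of 0.1.
scales = [round(0.5 + 0.1 * i, 1) for i in range(11)]

boots_per_scale = 100_000                     # bootstraps for each scale
boots_per_test = len(scales) * boots_per_scale

n_tests = 2_530                               # individual tests for one PP constraint
total_boots = boots_per_test * n_tests
```

Each individual test therefore uses over a million bootstrap replicates, which is why a fast test is needed when there are thousands of individual hypotheses.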

Based on the FDR-controlling procedure, the numbers of rejected individual tests for these 10 constraint tests are listed in table 3. The numbers of rejections in the first row of table 3 are out of 2,530 and those in the second row are out of 2,024. Note that we reject the overall hypothesis if at least one individual hypothesis is rejected. Based on the results, we can conclude that Dd is in the AF clade and that Tb and Lm are in the PP clade. For both Tv and Gl, membership in either the PP or the AF clade is rejected, with more rejections for the AF clade. The numbers of rejections for Gl + PP and Tv + PP are not large, only 2 and 18, respectively. Because all the rejected tests have zero P values, we increased the number of bootstraps to 1,000,000 for these 2 cases, but the results did not change. Note that the results for the 10 constraints under the chi-square approximation method are consistent with those of the FDR-controlling procedure after replacing the zero P values by 1/(2 × 1,000,000) = $5 \times 10^{-7}$. If these 2 hypotheses are falsely rejected, there are 2 possible reasons. One is that the P values output by CONSEL have only 3 decimals, so P values less than 0.0005 may be treated as 0. The other is that even though the method controls the FWER at significance level 0.05, there is still some chance of false rejections. Note that the number of bootstraps in the AU test may lead to serious problems if it is not large enough: with the default setting of CONSEL, the constraints Tb + PP and Lm + PP are rejected because of a single rejected test with a P value of zero.
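The rounding concern and the midpoint replacement described above can be sketched as follows. The sketch assumes that a P value printed with 3 decimals is simply rounded, and it uses an invented true P value of 0.0004; the function names are hypothetical and nothing here calls CONSEL.

```python
def reported_p(p, decimals=3):
    """P value as it would appear when only 3 decimals are printed (assumed rounding)."""
    return round(p, decimals)


def replace_zero(p, n_boot):
    """Replace a zero P value by the midpoint between 0 and 1/n_boot."""
    return 1.0 / (2 * n_boot) if p == 0 else p


alpha, m = 0.05, 2530
smallest_threshold = alpha / m        # about 2e-5 for the Dd + PP constraint

true_p = 0.0004                       # invented value for illustration
shown = reported_p(true_p)            # printed as zero with 3 decimals

# The printed value passes the smallest threshold although the true one does not.
falsely_rejected = shown <= smallest_threshold < true_p

# Midpoint replacement with 1,000,000 bootstraps, as in the text.
midpoint = replace_zero(shown, 1_000_000)
```

In this illustration a P value that is 20 times larger than the smallest threshold is still read as zero, which is the first of the 2 possible reasons for a false rejection given above.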


 
Table 3 The Number of Rejections of the Individual Hypotheses for the 10 Constraints Based on Combination of the AU Test and the FDR-Controlling Procedure

 
If Gl + PP is a false constraint, the 2 rejected topologies, ((Gl, Porphyra yezoensis 1), (Shewanella putrefaciens, Chlamydia pneumoniae)) and ((Gl, Porphyra yezoensis 1), (Shewanella putrefaciens, Synechococcus sp. WH8102)), can give us some information about the true structure. The details of the rejected trees in the Tv + PP test are not given here. We believe that even if these 2 constraints are not true, these 2 species must still be very close to the PP clade.


    Discussion
 
We developed the constraint test based on the ordinary tree tests. It has 2 features. The first is that only 4-taxon trees are tested, for which the assumption of the star tree as the boundary is no longer invalid and the requirement of long sequences is not critical. The second is the large number of individual hypotheses, which requires fast computation and accurate P values. Although the AU test is a conservative choice as an ordinary test, it fits well in these 2 respects and provides strong power with a reasonable type I error rate. By applying our method to the example data set in Andersson and Roger (2002), we reached a clear picture of the misplacement of taxa in the ML tree.

Through simulation studies, we found that the AU and SSOWH tree test methods perform well when combined with either the FDR or the chi-square approximation method. The computational load of the AU test is much smaller than that of the SSOWH test. To avoid dealing with the independence assumption on the P values and the zero P value issue in the chi-square approximation method, we prefer the FDR method, even though the type I error and power of these 2 multiple test methods are comparable. Given the high power and low type I error of the methods developed in this paper, it is almost straightforward to further develop a general diagnostic tool for the local structure of a phylogenetic topology when a large topology has been constructed by any other method. It is well known that a search over large tree topologies almost always results in a local optimum. Thus, such a local diagnostic method will be very useful in complementing the tree search method. Furthermore, this method can also be used in a heuristic bottom-up search of the tree topology, such that at each step we only combine 2 groups of taxa if the test is not rejected. These ideas will serve as our future research topics.


    Acknowledgements
 
We were supported by Genome Atlantic and the Natural Sciences and Engineering Research Council of Canada. The authors are grateful to A. Roger for providing the data used as the real data analysis example and for all the discussions. We also acknowledge the helpful comments of the referees.


    Footnotes
 
Arndt von Haeseler, Associate Editor


    References
 

    Andersson J, Roger A. 2002. A cyanobacterial gene in nonphotosynthetic protists—an early chloroplast acquisition in eukaryotes? Curr Biol 12:115–9.

    Antenaza M. 2003. When being ‘most likely’ is not enough: examining the performance of three uses of the parametric bootstrap in phylogenetics. J Mol Evol 56:198–222.

    Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc 57:289–300.

    Benjamini Y, Liu W. 1999. A step-down multiple hypotheses testing procedure that controls the false discovery rate under independence. J Stat Plan Inference 82:163–70.

    Benjamini Y, Yekutieli D. 2001. The control of the false discovery rate in multiple testing under dependency. Ann Stat 29:1165–88.

    Dudoit S, Shaffer JP, Boldrick JC. 2003. Multiple hypothesis testing in microarray experiments. Stat Sci 18:71–103.

    Efron B, Halloran E, Holmes S. 1996. Bootstrap confidence levels for phylogenetic trees. Proc Natl Acad Sci USA 93:13429–34.

    Efron B, Tibshirani R. 1998. The problem of regions. Ann Stat 26:1687–1718.

    Felsenstein J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–91.

    Felsenstein J, Churchill G. 1996. A hidden Markov model approach to variation among sites in rate of evolution. Mol Biol Evol 13:93–104.

    Hochberg Y. 1988. A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75:800–3.

    Murphy W, Eizirik E, Johnson W, Zhang Y, Ryder O, O'Brien S. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–8.

    Rambaut A, Grassly N. 1997. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput Appl Biosci 13:235–8.

    Shi X, Gu H, Susko E, Field C. 2005. The comparison of the confidence regions in phylogeny. Mol Biol Evol 22:2285–96.

    Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol 51:492–508.

    Shimodaira H, Hasegawa M. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol 16:1114–6.

    Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–7.

    Susko E. 2003. Confidence regions and hypothesis tests for topologies using generalized least squares. Mol Biol Evol 20:862–8.

    Swofford D. 2000. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4.0b4a. Sunderland, MA: Sinauer Associates.

    Zwickl D, Hillis D. 2002. Increased taxon sampling greatly reduces phylogenetic error. Syst Biol 51:588–98.

Accepted for publication July 18, 2006.

