MBE Advance Access originally published online on July 20, 2006
Molecular Biology and Evolution 2006 23(10):1976-1983; doi:10.1093/molbev/msl065
Research Article |
Test a Clade in Phylogenetic Trees
Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
E-mail: shi{at}mathstat.dal.ca.
| Abstract |
|---|
We develop a new method for testing a portion of a tree (called a clade) based on multiple tests of many 4-taxon trees. This is particularly useful when a phylogenetic tree constructed by other methods has a clade that is difficult to explain from a biological point of view. A statement about the test of the clade can be made through the multiple P values from these individual tests. By controlling the familywise error rate or the false discovery rate (FDR), 4 different tree test methods are evaluated through simulation. The simulations show that the combination of the approximately unbiased (AU) test and the FDR-controlling procedure provides strong power along with a reasonable type I error rate and a lighter computational burden.
Key Words: test, multiple comparison, constraint, clade, FDR
| Introduction |
|---|
In making inferences about relationships in a phylogenetic tree, it is often of interest to know whether a subset of the species forms a clade or whether a particular species belongs to a specific clade. In the general and simpler case, the null hypothesis would be that two or more specific species form a clade in the tree. However, the null hypothesis H0 specifies neither the relationship among the remaining species outside the clade nor, when more than 2 species are concerned, the relationship within the clade, forcing us to deal with the issue of nuisance parameters in carrying out the test. In a slightly more restricted case, the null hypothesis could state that one or more specific species are members of a larger clade. In this case, we are interested in testing the membership of some but not all members of the clade.
We were motivated in this study by the example in Andersson and Roger (2002), where the maximum likelihood (ML) tree of 46 eubacterial and eukaryotic homologs was constructed. The eukaryote species were placed into 2 distinct clades with 5 exceptions. Andersson and Roger (2002) were interested in testing whether any of the 5 exceptional eukaryotes could belong to 1 of the 2 eukaryote clades. In the following, we call a statement that 1 of these 5 eukaryotes is in the same clade as 1 of the 2 eukaryotic groups a "constraint." In this situation, we are not interested in testing the membership of all the members of one of the eukaryote clades but only want to test whether a particular species belongs to this clade. More details on this example are given in Real Data Analysis.
Currently, several hypothesis test methods are commonly used for inferences on trees. Most of the existing test methods are designed to test a fully fixed tree structure instead of a partially fixed structure, such as the Shimodaira-Hasegawa (SH) test (Shimodaira and Hasegawa 1999), the approximately unbiased (AU) test (Shimodaira 2002), the star version of the SOWH (SSOWH) test (Antenaza 2003), the single distribution nonparametric (or parametric) bootstrap (SDNB or SDPB) test (Shi et al. 2005), and the generalized least squares (GLS) test (Susko 2003). It is possible to use these tests by fixing the unspecified part of the tree at the ML values obtained under the constraint of the null hypothesis. Andersson and Roger (2002) used this approach and applied the SH and SOWH tests to test the ML trees under the constraint about the 5 exceptional eukaryotes. The P values varied extensively in the SH test, depending on which additional candidate trees were included in the group of trees tested in the null hypothesis, and typically exceeded 0.75. Whereas the SH test, the most conservative method, did not reject any tested topology, the SOWH test rejected all tested topologies with very small P values. Even ignoring the inconsistent performance of the test methods, using the constrained ML tree as the tested tree may not be appropriate because the constraint in the original hypothesis is replaced by the ML tree structure under the constraint. A rejection of the hypothesis may be due to the constraint itself or to the other part of the structure of the constrained ML tree. In testing that any of these 5 eukaryotes is in the same clade as 1 of the 2 eukaryotic groups, only a partially fixed structure should be tested, not the concrete tree topology.
The use of bootstrap selection probability to assign the confidence level for constraints was mentioned by Felsenstein (1985). The bootstrap selection probability is assessed as the "confidence" of each clade of an observed tree, based on the proportion of bootstrap trees showing the same clade. It was corrected to better agree with standard ideas of confidence levels and hypothesis testing by Efron et al. (1996) by considering the curvature of the boundary of the trees in order to have second-order accuracy. However, the method is not widely used due to its complexity.
In this paper, we develop a simple and valid method to test the partially fixed tree structures. Our approach will be to decompose the original hypothesis H0 into a number of individual hypotheses H0i, i = 1, ..., m, in which the tested trees have fixed structures due to the constraint, where m is the total number of individual hypotheses needed. Some existing test methods can then be adopted for these hypotheses to find P values, which are calculated as the probabilities, under H0i, of observing a value of the test statistic at least as large as that observed. A conclusion about the original hypothesis can be made through dealing with these P values. A constraint can include any combination of taxa in a tree. A true constraint is a true statement about a subset of taxa that constitute a clade of the true tree, and this clade can be isolated from the other part of the structure by breaking only one internal branch. For the 10-taxon tree in figure 1, the constraint that taxa 4 and 5 are in the same clade is true. However, because we must break 2 internal branches to isolate the clade of (4, 5, 6) from the whole structure, the corresponding constraint is false. The test of a constraint about a clade is called a constraint test.
The rest of the paper is organized as follows. First, the relationship between the original hypothesis of a constraint and a number of individual hypotheses is introduced. In each individual hypothesis, a 4-taxon tree is tested, which has a fixed structure because 2 taxa are neighbors under the constraint. Then 4 test methods, namely, AU, GLS, SSOWH, and SDNB, are reviewed based on 4-taxon trees for these individual hypotheses. Two methods are used to cope with the multiple P values. One is the widely used false discovery rate (FDR)-controlling procedure (Benjamini and Hochberg 1995
| The Rationale of the Test Procedure |
|---|
We begin by considering the case where testing the whole clade is of interest. The modifications for the more restricted case follow the development for this case. Suppose in an n-taxon tree, a clade with n0 taxa is being tested. Then the remaining n − n0 taxa will form another group naturally. The overall null hypothesis can be written as:
H0: The n0 (or the remaining n − n0) taxa are in the same clade. (1)
First consider the simplest meaningful case where n0 = 2, that is, the 2 taxa are neighbors under H0. In other words, the 2 taxa are closer to each other than to any others. Under the null hypothesis, these 2 taxa will be neighbors in any 4-taxon subset that includes them as members. If H0 is false, at least one taxon lies between them in the true tree and thus in some of the 4-taxon trees. Conversely, if 2 taxa are neighbors in all the possible 4-taxon trees, they must be neighbors in the n-taxon tree because none of the other taxa can break the alliance between them. A simple example demonstrates this idea. In figure 2, taxa 1 and 2 are neighbors in the 5-taxon tree. This implies that any 4-taxon tree including them should have them as neighbors, as shown in the second row. However, the topologies in the second row can only tell us that taxa 1 and 2 are neighbors in a 5-taxon tree. The topology among taxa 3, 4, and 5 is not of concern in the hypothesis.
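The equivalence between being neighbors in the full tree and being neighbors in every 4-taxon subtree can be checked mechanically. The following is a minimal sketch, not the authors' code: it assumes a hypothetical 5-taxon tree written as nested tuples (the exact topology of figure 2 is not reproduced here), and the helper names `leaves`, `clades`, and `are_neighbors` are illustrative only.

```python
from itertools import combinations

def leaves(node):
    """Return the set of leaf labels under a nested-tuple node."""
    if isinstance(node, tuple):
        out = set()
        for child in node:
            out |= leaves(child)
        return out
    return {node}

def clades(node, acc=None):
    """Collect the leaf set below every internal node; each defines a split."""
    if acc is None:
        acc = []
    if isinstance(node, tuple):
        acc.append(frozenset(leaves(node)))
        for child in node:
            clades(child, acc)
    return acc

def are_neighbors(tree, a, b, c, d):
    """True if the 4-taxon tree induced on {a, b, c, d} groups a with b.

    This holds exactly when some branch separates {a, b} from {c, d}.
    """
    for side in clades(tree):
        if a in side and b in side and c not in side and d not in side:
            return True
        if c in side and d in side and a not in side and b not in side:
            return True
    return False

# Hypothetical 5-taxon tree in which taxa 1 and 2 are neighbors.
tree = ((1, 2), 3, (4, 5))
all_quartets_agree = all(
    are_neighbors(tree, 1, 2, c, d) for c, d in combinations([3, 4, 5], 2)
)
```

In this sketch every 4-taxon subset containing taxa 1 and 2 keeps them as neighbors, whereas a pair such as (1, 3) fails the check for the subset {1, 2, 3, 4}.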
The situation is more complicated when n0 > 2 due to the unknown tree structure within this clade. The idea is still that the taxa within this clade are closer to each other than to taxa outside this clade. Thus, under H0, any 4-taxon tree that includes 2 taxa within the clade and the other 2 from outside the clade has a fixed structure with the former 2 taxa as neighbors. For simplicity of description, we call the 2 taxa chosen from the tested clade in the null hypothesis taxa 1 and 2 in a 4-taxon tree. They are neighbors under H0, as shown in figure 3. We denote this tree structure as τ1 and the other 2 possible structures of the 4 taxa as τ2 and τ3. There are $m = \binom{n_0}{2}\binom{n - n_0}{2}$ possible 4-taxon combinations to be tested as individual null hypotheses. Testing the original hypothesis (1) is equivalent to testing:
H0i: The ith 4-taxon tree has τ1 as its true topology, for all i = 1, ..., m. (2)
The original hypothesis (1) is accepted if all H0i (i = 1, ..., m) are accepted and rejected if at least one is rejected. Thus, testing the original hypothesis becomes a familywise error rate (FWER)-controlling multiple testing problem. Note that when at least one taxon outside the clade breaks the alliance of the taxa within the clade, there is often more than one individual hypothesis among the H0i that should be rejected. Thus, high power is expected of such a test procedure.
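As a quick check on the number of individual hypotheses, the sketch below counts the 4-taxon combinations as the number of pairs inside the clade times the number of pairs outside it. This counting rule is an assumption here, but it reproduces the counts of 28, 63, 90, and 100 reported for the 10-taxon simulations in Results; the function name `num_quartets` is illustrative.

```python
from math import comb

def num_quartets(n, n0):
    """Pairs chosen inside the clade times pairs chosen outside it."""
    return comb(n0, 2) * comb(n - n0, 2)

# 10-taxon model tree with tested clades of 2 to 5 taxa.
counts = {n0: num_quartets(10, n0) for n0 in (2, 3, 4, 5)}
```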
For the more restricted case, the main difference lies in the choice of the 4-taxon trees to be tested. For example, if we have a clade of n0 species and we are interested in testing whether one specific species belongs to the clade, we would pair the specific species with each of the remaining n0 − 1 species in the clade so that $m = (n_0 - 1)\binom{n - n_0}{2}$. A different formula for m would apply if we wanted to test the membership of 2 species in a clade. Because only the value of m is changed in this more restricted case, we continue the development in the simpler case of testing the complete clade.
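The restricted case can be sketched as an enumeration in which the species of interest is always one of the two neighbors. The group sizes below (10 taxa in the PP clade, 8 in the AF clade, 23 residual taxa) are taken from Real Data Analysis; the taxon labels and the function name `membership_quartets` are placeholders rather than names from the data set.

```python
from itertools import combinations

def membership_quartets(target, clade_members, outside):
    """Yield (target, partner, c, d) for each clade partner and outside pair."""
    for partner in clade_members:
        for c, d in combinations(outside, 2):
            yield (target, partner, c, d)

# Placeholder labels; only the group sizes come from the text.
pp = [f"PP{i}" for i in range(1, 11)]
af = [f"AF{i}" for i in range(1, 9)]
residual = [f"R{i}" for i in range(1, 24)]

n_pp = sum(1 for _ in membership_quartets("Dd", pp, residual))
n_af = sum(1 for _ in membership_quartets("Dd", af, residual))
```

Under these assumptions the counts agree with the 2,530 and 2,024 individual tests quoted for the PP and AF clades.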
| Tree Test Methods for Individual Hypotheses |
|---|
Because the individual tests are the ordinary tests with a fixed tree structure under the hypothesis, the existing test methods can be applied to provide P values. Among these methods, the SH test is too conservative because the least favorable configuration assumed in the hypothesis corresponds to a specific hypothesis very difficult to reject; the Swofford-Olsen-Waddell-Hillis (SOWH) test and the single distribution parametric bootstrap (SDPB) test are acceptable only when there is no model misspecification (Shi et al. 2005
The AU test (Shimodaira 2002) is widely used nowadays. Its hypothesis is stated in terms of the expected log-likelihood of topology τi, denoted by µi, i = 1, 2, 3. The hypothesis for testing τi is
$$H_0: \mu_i \ge \max_{j \ne i} \mu_j \quad \text{versus} \quad H_A: \mu_i < \max_{j \ne i} \mu_j.$$
The proportion of times that τi is the ML tree in the bootstrap replicates is called the bootstrap probability (BP), which is computed by a multiscale nonparametric bootstrap in which different sequence lengths are used. Because BP gives only first-order accuracy for the P value (Efron et al. 1996), Shimodaira used the results from Efron and Tibshirani (1998) to correct BP to give second-order accurate P values, which means that the probability of the type I error of the AU test is α + O(N^(−3/2)), where N is the sequence length (see eq. 11 and the discussion therein of Shimodaira 2002). The AU test controls the type I error strongly with relatively weak power. As an important feature, the accuracy of the P value in the AU test can be increased by increasing the number of bootstraps at a feasible computational cost.
In the GLS test, the vector Y of ML pairwise distances contains 6 elements, each of which can be expressed as the sum of the branch lengths on the path between a pair of taxa. The design matrix X expressing the tree topology in figure 3 is fixed in all the individual tests. The following equation shows the relationship between Y, X, and T under H0, where T is a vector of branch lengths estimated to minimize $(Y - XT)^{\top} V^{-1} (Y - XT)$, where V is the covariance matrix of the pairwise ML distances:
$$E(Y) = XT.$$
Because the ML pairwise distances are asymptotically multivariate normally distributed, under H0, $(Y - XT)^{\top} V^{-1} (Y - XT)$ asymptotically follows a chi-square distribution with 1 degree of freedom (Susko 2003). By using this asymptotic result to avoid the bootstrap, the GLS test has a computational advantage over the other methods, which makes it more desirable in settings with a large number of individual hypotheses.
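To make the structure of the GLS test concrete, the relationship between distances and branch lengths can be written out for the topology with taxa 1 and 2 as neighbors. The labeling below is an assumption made for illustration: $t_1, \ldots, t_4$ are the terminal branches leading to taxa 1 to 4, $t_5$ is the internal branch, and $d_{jk}$ is the expected distance between taxa j and k.

$$
\begin{pmatrix} d_{12} \\ d_{13} \\ d_{14} \\ d_{23} \\ d_{24} \\ d_{34} \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 1 \\
0 & 1 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 1 & 0
\end{pmatrix}
\begin{pmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \\ t_5 \end{pmatrix}
$$

With 6 distances and 5 branch lengths, one degree of freedom remains, which is consistent with the chi-square distribution with 1 degree of freedom used for the test statistic.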
In the SSOWH test, the test statistic is the difference in log-likelihood between the ML tree and τ1, whose distribution is constructed by a parametric bootstrap. The replicate data sets are simulated under a star topology with ML branch lengths estimated from the original observations. The replicate of the test statistic is the log-likelihood difference between the ML tree and the tested tree of the replicate data set. It has been shown that with a large number of taxa, the SSOWH test does not perform well when the underlying true tree topology is far away from the star topology (Shi et al. 2005). Because only 4-taxon trees are considered here, it is not a problem to treat the star topology as their boundary.
The same test statistic is used in the SDNB test, but its distribution is constructed by a nonparametric bootstrap. According to bootstrap theory, the bootstrap replicate of the test statistic is the log-likelihood difference between the ML tree of the replicate data set and the ML tree of the original data set with other free parameters estimated in the replicate data set. The SDNB test may have an inflated type I error if there are short internal branches in the true tree.
The accuracy of the P values from the last 2 methods is closely related to the number of bootstraps. If the number of bootstraps is 100, the P value can only be accurate to 2 decimal places. The important issue with these 2 tests is the heavy computational burden required to obtain highly accurate P values for each hypothesis, compounded by the large number of individual hypotheses.
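The link between the number of bootstraps and the resolution of the P value can be illustrated with a minimal sketch. It assumes the P value is the proportion of replicate statistics at least as large as the observed one, as described in the Introduction; the replicate values are made up for illustration, and `bootstrap_p_value` is a hypothetical helper.

```python
def bootstrap_p_value(observed, replicates):
    """Proportion of replicate statistics at least as large as the observed one."""
    hits = sum(1 for r in replicates if r >= observed)
    return hits / len(replicates)

# 100 made-up replicate statistics: 0.0, 0.1, ..., 9.9.
replicates = [i / 10 for i in range(100)]

p_small = bootstrap_p_value(9.85, replicates)  # only the largest replicate exceeds it
p_zero = bootstrap_p_value(12.0, replicates)   # no replicate reaches the observed value
```

With 100 replicates the P value moves in steps of 0.01, so any observed statistic beyond the largest replicate is reported as exactly zero.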
| Procedure |
|---|
We summarize the procedure of testing all the possible 4-taxon trees under the hypothesis as follows:
- (1) Suppose there are n0 taxa within the clade under the constraint. Select 2 taxa from the n0 taxa inside the clade as taxa 1 and 2 and another 2 taxa from outside the clade, and combine their sequences to form a subdata set.
- (2) Implement the chosen test method for the corresponding H0i and record the P value.
- (3) Go back to step 1 until all the 4-taxon combinations are considered. This means that we must test all m combinations, where $m = \binom{n_0}{2}\binom{n - n_0}{2}$.
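The steps above can be sketched as a loop over 4-taxon combinations with a pluggable test. This is a minimal sketch under stated assumptions: `quartet_test` stands in for whichever tree test is chosen (AU, GLS, SSOWH, or SDNB) and is not an interface of any real program, and `dummy_test` simply returns a fixed value.

```python
from itertools import combinations

def constraint_test_p_values(clade, outside, quartet_test):
    """Apply quartet_test to every (a, b | c, d) combination and collect P values."""
    p_values = {}
    for a, b in combinations(sorted(clade), 2):          # taxa 1 and 2, inside the clade
        for c, d in combinations(sorted(outside), 2):    # the other 2, outside the clade
            p_values[(a, b, c, d)] = quartet_test(a, b, c, d)
    return p_values

def dummy_test(a, b, c, d):
    """Placeholder: a real run would build the subdata set and test it here."""
    return 1.0

# Constraint (4, 5) in a 10-taxon tree.
p = constraint_test_p_values({4, 5}, {1, 2, 3, 6, 7, 8, 9, 10}, dummy_test)
```

For the constraint (4, 5) in a 10-taxon tree, the loop visits 28 combinations, matching the count given for n0 = 2 in Results.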
There are 3 advantages in this proposed test procedure. First, the tree topology need not be estimated for any of these 4-taxon trees. As a tested topology, it has major advantage over the ML tree under the constraint. Secondly, by using 4-taxon trees in individual hypotheses, the number of individual hypotheses is minimized; that is,
for any i, j ≥ 2. This reduces the problems in multiple comparisons and keeps the computation as light as possible. Finally, based on the simulation results in Shi et al. (2005), for the asymptotic results to be valid, trees with a large number of taxa need long sequences in the AU and GLS tests and thus involve intensive computation. However, these tests work well for 4-taxon trees with relatively short sequence lengths.
| Multiple P Values |
|---|
After obtaining P values of individual hypotheses, methods for making statements on overall hypothesis are needed. Let H0 = {H01, ..., H0m} be a set of null hypotheses with corresponding test statistics T1, ..., Tm and P values P1, ..., Pm, where
H0 is accepted when none of H0i, i = 1, ..., m is rejected. The FWER is the probability of rejecting at least one of them when all m null hypotheses are true. As reviewed in Dudoit et al. (2003)
The FDR-Controlling Procedure
Benjamini and Hochberg (1995) proposed controlling the FDR instead of the FWER in multiple tests. Consider the problem of testing simultaneously m null hypotheses with m0 of them being true. R is the number of hypotheses rejected and R0 is the number of false rejections among them. The FDR can be viewed as the expectation of the proportion of false rejections of the null hypotheses:
$$\mathrm{FDR} = E(Q), \qquad Q = R_0 / R.$$
Define Q = 0 when R = 0, as no false rejection can be committed. In the FDR-controlling procedure, P(1), ..., P(m) are the increasingly ordered P values for H0 = {H(01), ..., H(0m)}. All H(0i) for i = 1, ..., I are rejected with I being the largest i for which
$$P_{(i)} \le \frac{i}{m}\, q^{*},$$
where q* is the desired level of FDR.
This procedure controls the FDR for both independent and positively dependent test statistics (Benjamini and Liu 1999; Benjamini and Yekutieli 2001). If all of the null hypotheses are true, there are only 2 possible values for Q: Q = 0 when R = 0 and Q = 1 when R ≥ 1. Then the FDR is equivalent to the FWER:
$$\mathrm{FDR} = E(Q) = P(R \ge 1) = \mathrm{FWER}.$$
Therefore, when all null hypotheses are true, controlling the FDR implies controlling the FWER. When the constraint is not true, the FDR-controlling procedure is more powerful than the general FWER-controlling procedure. The overall hypothesis is rejected if at least one H0i is rejected or, equivalently, if I ≥ 1.
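The step-up rule of Benjamini and Hochberg (1995) is short enough to sketch directly. The function names below are illustrative, and the example P values are made up; the rule itself is the standard one, rejecting the I smallest P values where I is the largest i with P(i) ≤ i q*/m.

```python
def bh_rejections(p_values, q=0.05):
    """Benjamini-Hochberg step-up: return I, the number of rejected hypotheses."""
    m = len(p_values)
    largest = 0
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= i * q / m:
            largest = i  # keep the largest index that satisfies the bound
    return largest

def reject_constraint(p_values, q=0.05):
    """The overall hypothesis is rejected when at least one H0i is rejected."""
    return bh_rejections(p_values, q) >= 1

# Made-up P values for illustration.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
num_rejected = bh_rejections(example)
```

Because the overall decision only asks whether I ≥ 1, the constraint is rejected as soon as any ordered P value falls under its threshold.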
The Chi-Square Approximation Procedure
Generally, if the distribution of the test statistic is continuous, under the true null hypothesis, the P value is uniformly distributed between 0 and 1. Let Y = −log P, where P denotes the P value with a uniform distribution. Then, the probability density function of Y is
$$f(y) = e^{-y}, \qquad y > 0,$$
which is the density of the gamma distribution Γ(1, 1). We use the following 2 properties:
- (1) Y is distributed Γ(α, β) if, and only if, βY is distributed Γ(α, 1).
- (2) If Y1 and Y2 are independent random variables with Γ(α1, β) and Γ(α2, β) distributions, respectively, then Y = Y1 + Y2 has a Γ(α1 + α2, β) distribution.
From these properties, it can be easily shown that the statistic
$$-2 \sum_{i=1}^{m} \log P_i$$
has a chi-square distribution with 2m degrees of freedom if the P values are independent.
Although the P values resulting from these 4-taxon trees may not be independent, we found that their correlations are typically small in the simulation examples. Except when combined with the GLS tree test method, the chi-square approximation procedure showed type I error and power similar to those of the FDR-controlling procedure. Note that when the bootstrap is used in calculating P values in the tree test methods (SSOWH and SDNB), the P values are accurate only to a resolution determined by the number of bootstraps. It is possible that the resultant P value is zero, which in fact means that the P value is smaller than the smallest nonzero P value obtainable from the bootstrap. In such cases, the method cannot strictly be applied because the logarithm of zero is infinite. However, in order to compare the result with the FDR-controlling procedure, we replace a zero P value by the midpoint between 0 and the smallest possible P value from the tree tests in the following simulations. We reject the overall hypothesis if $-2 \sum_{i=1}^{m} \log P_i$ is greater than the (1 − α) critical value of χ²(2m).
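The chi-square approximation, including the replacement of zero P values by the midpoint 1/(2B) for B bootstrap replicates, can be sketched as follows. The function names are illustrative. The tail probability uses the closed form available for a chi-square distribution with an even number of degrees of freedom, which avoids any dependence on a statistics library.

```python
import math

def combined_statistic(p_values, num_bootstraps=None):
    """Return -2 * sum(log P_i), replacing zero P values by 1 / (2 * num_bootstraps)."""
    floor = None if num_bootstraps is None else 1.0 / (2 * num_bootstraps)
    total = 0.0
    for p in p_values:
        if p == 0.0:
            if floor is None:
                raise ValueError("a zero P value needs a replacement value")
            p = floor
        total += math.log(p)
    return -2.0 * total

def chi2_sf_even_df(x, m):
    """P(chi-square with 2m degrees of freedom > x), closed form for even df."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= half / k      # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total

def reject_overall(p_values, alpha=0.05, num_bootstraps=None):
    """Reject when the upper-tail probability of the statistic is below alpha."""
    stat = combined_statistic(p_values, num_bootstraps)
    return chi2_sf_even_df(stat, len(p_values)) < alpha
```

With 100 bootstrap replicates, a zero P value is replaced by 0.005 in this sketch, matching the value used in the simulations.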
| Simulation |
|---|
We used data simulated from a 10-taxon model tree by "Seq-Gen," version 1.2.5 (Rambaut and Grassly 1997
Models
To make the simulation more realistic, one data set was simulated from a published 66-taxon tree (Murphy et al. 2001) with a sequence length of 3,000. Then 10 taxa were selected from the 66 taxa, and the ML tree under the constraints of (1, 2), (4, 5), and (8, 9) was built by PAUP*, 4.0b8 (Swofford 2000) as the model tree. (The tree topology and branch lengths are given in fig. 1.) The internal branch lengths in this tree are less than 0.04 and some of them are close to 0, such as the internal branch between the subtree (8, 9) and taxon 10.
The F84 model (Felsenstein and Churchill 1996) was used for simulation and analysis with base frequencies (πa, πc, πg, πt) = (0.37, 0.24, 0.12, 0.27) and a transition/transversion ratio (κ) of 2.93. These values are taken from Zwickl and Hillis (2002), where they were estimated by ML on a tree obtained by a parsimony search for 2 of the genes present in the Murphy et al. (2001) data set (12S rRNA and cnr 1, a protein-coding gene). To avoid confounding the issues, we did not consider heterogeneous site rates. The sequence length is fixed at 500 for simulation.
Methods
All the likelihoods were calculated by PAUP*, 4.0b8. After the sitewise log-likelihoods were obtained, the AU test was performed using CONSEL (Shimodaira and Hasegawa 2001). We used the default settings of CONSEL with the scales in the bootstrap ranging from 0.5 to 1.4, in increments of 0.1, and with 10,000 bootstraps for each scale. The GLS test was performed using the program written by Susko (2003). The distributions of the test statistics in the SSOWH and SDNB tests were built based on 100 bootstrap replicates. This means that zero P values were replaced by 0.005 for the SSOWH and SDNB tests when used with the chi-square approximation procedure.
After obtaining the P values, statements were made on the overall hypothesis through the multiple test procedures. For a true constraint, the proportion of rejections can be viewed as the probability of type I error; otherwise it indicates the power. The level q* in the FDR-controlling procedure and the significance level α in the chi-square approximation procedure were both chosen as 0.05.
Results
The numbers of rejections in the constraint tests on 100 simulated data sets are listed in table 1. Our goal is to find the combination of tree test method and multiple test procedure that has strong power and a controlled type I error. In table 1, the results are given in 2 parts: true constraints in the upper part and false ones in the lower part. The first column shows the constraints being tested. The SDNB, AU, SSOWH, and GLS tests are examined on each constraint. The number of individual hypotheses is 28 for n0 = 2, 63 for n0 = 3, 90 for n0 = 4, and 100 for n0 = 5. For each constraint test, we applied the FDR-controlling procedure and the chi-square approximation method, denoted respectively by FDR and χ². Because the multiple comparison methods are based on P values, the characteristics of the test methods are crucial in making conclusions on H0.
As the most powerful test method, the SDNB did not control the type I error well in the case of constraint (8,9). It is sensitive to the star topology and tends to reject more than it should. The short internal branch (0.0036) between subtree (8,9) and taxon 10 creates an analogous star topology, leading to large type I error in the SDNB test. Table 2 shows the number of rejections out of 100 under the same constraint with different internal branch lengths between the subtree (8,9) and taxon 10 under the SDNB + FDR combination. Obviously, in the range of 0.0036 and 0.036, the longer the internal branch is, the smaller the type I error is.
The SSOWH test has the highest power among the remaining methods. Although the SSOWH test uses the unrealistic underlying assumption in the parametric bootstrap, which is that all the possible tree topologies equally support the observations, it did not cause serious problems for 4-taxon trees. However, with its low type I error and high power, the price to pay for the SSOWH test is the intensive computations for constructing the distribution of the test statistic, especially so when the number of individual tests and the bootstraps are large.
Although the 2 multiple comparison methods did not perform differently with the SDNB and SSOWH tests, the chi-square approximation method outperformed the FDR procedure on power when combined with the GLS and AU tests. However, the combination of the GLS test and the chi-square approximation procedure did not control the type I error, which was always larger than 0.1 and went up to 0.37 under the true constraints. A possible reason for this failure is the stronger correlations between the P values resulting from the GLS test. Because all the taxa appear more than once in composing these 4-taxon trees, the independence assumption is violated in all tests. However, the dependence of P values from the other tree test methods is much weaker than that resulting from GLS.
When combined with the FDR-controlling procedure, the GLS test controlled the type I error and had power similar to that of the AU test. However, the GLS test provided weaker power for the constraint (1,2,3,6,7), whereas the performance of the AU test was comparable with that of the SSOWH and SDNB tests. There are 5 taxa in this constraint, which means that the P values were compared with thresholds as small as 0.05/100 = 0.0005, increasing in steps of 0.0005, in the FDR procedure. The chi-square distribution is an approximation in the GLS test, with its accuracy depending on the sequence length. It seems that a sequence length of 500 could not give enough accuracy for the 5-taxon constraint, and this accuracy is hard to improve.
We also found that the power differs among false constraints with 2 taxa. When more than one taxon needs to be removed for the 2 taxa in the constraint to become neighbors, it is easy to reject H0. For example, the null hypothesis that taxa 7 and 8 are neighbors is rejected almost 100 out of 100 times because taxa 9 and 10 need to be taken out to make taxa 7 and 8 neighbors in the model tree. This is partially the reason for the relatively low power of the constraint tests of (8,10) and (9,10), for which fewer taxa separate the pair.
In summary, if the number of individual hypotheses is not too large and it is possible to implement the SSOWH test, this method will provide high power and low type I error. For a large number of individual hypotheses, which is often the case, the AU test is recommended for the constraint test because we can easily increase the number of bootstraps in the AU test to obtain more accurate P values. The FDR-controlling procedure is a safe choice, which worked well with the SSOWH and AU tests in all the conditions examined. The chi-square approximation method is powerful, but the zero P value issue must be dealt with. The effect on the overall statement of using the midpoints between 0 and the smallest possible P values to replace zero P values is not clear, and it is difficult to avoid zero P values without dramatically increasing the number of bootstrap replicates.
| Real Data Analysis |
|---|
We implemented the procedure of the constraint test on the data set with 46 taxa and 443 amino acid positions in Andersson and Roger (2002)
Except for these 5 taxa, there are 41 taxa remaining, which are divided into 3 groups: PP clade with 10 taxa, AF clade with 8 taxa, and the residual 46 − 5 − 8 − 10 = 23 taxa. To address the above question, 10 constraint tests are formed, which are that Dd + PP, Dd + AF, Tb + PP, Tb + AF, Lm + PP, Lm + AF, Tv + PP, Tv + AF, Gl + PP, or Gl + AF are in the same clade. Note that these hypotheses are slightly different from the null hypothesis (1) in The Rationale of the Test Procedure. Each of these hypotheses tests that only 1 taxon is attached to a group of taxa, which is different from testing whether all of them are in the same clade. We illustrate this point using the first test as an example.
The overall hypothesis of Dd and PP clade is
![]() |
individual 4-taxa tests; this number is decreased to
with AF clade excluded. Because H0 only tests the relation of Dd with the PP clade and the residual clade, it suffices to test the 4-taxon trees that are formed by Dd, 1 taxon from the PP clade, and 2 taxa from the residual clade. This further decreases the total number of individual tests to $10 \times \binom{23}{2} = 2{,}530$.
Our gain in accuracy of the test is substantial: the P values should be accurate to 0.05/2,530 ≈ $2 \times 10^{-5}$ in order to compare with α/m in the multiple comparison methods, which means that we need at least roughly 50,000 replicates for each individual test in the SSOWH and AU tests. However, even after the above treatment, the SSOWH test is still not feasible in this case. The process would be extremely time consuming even for only one test instead of 2,530. Thus, the AU test is the only choice we have for this example. In order to increase the accuracy of the AU test, instead of using the default setting of CONSEL, we used the setting with the scales in the bootstrap from 0.5 to 1.5 in increments of 0.1 and 100,000 bootstraps for each scale.
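The replicate count quoted above follows from a short calculation. As a rough sketch, assume that a bootstrap with B replicates resolves P values in steps of 1/B, so that B must be at least the reciprocal of the smallest threshold:

$$
\frac{\alpha}{m} = \frac{0.05}{2530} \approx 1.98 \times 10^{-5},
\qquad
B \gtrsim \frac{m}{\alpha} = \frac{2530}{0.05} = 50{,}600 \approx 5 \times 10^{4}.
$$

The setting of 100,000 bootstraps per scale comfortably exceeds this bound.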
Based on the FDR-controlling procedure, the numbers of rejections of the individual tests for these 10 constraint tests are listed in table 3. The number of rejections in the first row of table 3 is out of 2,530 and that in the second row is out of 2,024. Note that we reject the overall hypothesis if at least one individual hypothesis is rejected. Based on the results, we can conclude that Dd is in the AF clade and that Tb and Lm are in the PP clade. Both Tv and Gl are rejected as members of either the PP or the AF clade, with more rejections for the AF clade. The numbers of rejections for Gl + PP and Tv + PP are not large, only 2 and 18, respectively. Because all the rejected tests have zero P values, we increased the number of bootstraps to 1,000,000 for these 2 cases, but the results did not change. Note that the results of the 10 constraints in the chi-square approximation method are consistent with the FDR-controlling procedure after replacing the zero P values by 1/(2 × 1,000,000) = $5 \times 10^{-7}$. There are 2 possible reasons if the 2 hypotheses are falsely rejected. One is that the output P values in CONSEL only have 3 decimals, so P values less than 0.0005 may be treated as 0. The other is that even though the method controls the FWER at significance level 0.05, there is still some chance of false rejections. Note that the number of bootstraps in the AU test may lead to serious problems if it is not large enough. With the default setting of CONSEL, the constraints Tb + PP and Lm + PP are rejected because of a single rejected test with a P value of zero.
If Gl + PP is a false constraint, the 2 rejected topologies ((Gl, Porphyra yezoensis 1), (Shewanella putrefaciens, Chlamydia pneumoniae)) and ((Gl, Porphyra yezoensis 1), (Shewanella putrefaciens, Synechococcus sp. WH8102)) can give us some information about the true structure. The details of the rejected trees in the Tv + PP test will not be given here. We believe that even if these 2 constraints are not true, these 2 species must still be very close to the PP clade.
| Discussion |
|---|
We developed the constraint test based on the ordinary tree tests. It has 2 features. The first is that only 4-taxon trees are tested, for which the assumption of the star tree as the boundary is no longer problematic and the requirement of long sequences is not critical. The second is the large number of individual hypotheses, which requires fast computation and accurate P values. Although the AU test is a conservative choice as an ordinary test, it fits well in these 2 aspects and provides strong power with a reasonable type I error rate. By applying our method to the data set of the example in Andersson and Roger (2002), a clear picture of the misplacement of taxa in the ML tree was reached.
Through simulation studies, we found that the AU and SSOWH tree test methods perform well when combined with either the FDR or the chi-square approximation procedure. The computational load of the AU test is much smaller than that of the SSOWH test. To avoid dealing with the independence assumption on the P values and the zero P value issue in the chi-square approximation method, we prefer the FDR method to the chi-square approximation method even though the type I error and power of these 2 multiple test methods are comparable. Given the high power and low type I error of the methods developed in this paper, it should be fairly straightforward to develop a general diagnostic tool for the local structure of a phylogenetic topology when a large topology is constructed by any other method. It is well known that the search for a large tree topology almost always results in a local optimum. Thus, such a local diagnostic method will be very useful in complementing the tree search method. Furthermore, this method can also be used in a heuristic bottom-up search of the tree topology such that at each step we only combine 2 groups of taxa if the test is not rejected. These ideas will serve as our future research topics.
| Acknowledgements |
|---|
|
|
|---|
This work was supported by Genome Atlantic and the Natural Sciences and Engineering Research Council of Canada. The authors are grateful to A. Roger for providing the data used as the real data analysis example and for all the discussions. We also acknowledge the helpful comments of the referees.
| Footnotes |
|---|
Arndt von Haeseler, Associate Editor
| References |
|---|
|
|
|---|
Andersson J, Roger A. 2002. A cyanobacterial gene in nonphotosynthetic protists: an early chloroplast acquisition in eukaryotes? Curr Biol 12:115-9.
Antezana M. 2003. When being most likely is not enough: examining the performance of three uses of the parametric bootstrap in phylogenetics. J Mol Evol 56:198-222.
Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc 57:289-300.
Benjamini Y, Liu W. 1999. A step-down multiple hypotheses testing procedure that controls the false discovery rate under independence. J Stat Plan Inference 82:163-70.
Benjamini Y, Yekutieli D. 2001. The control of the false discovery rate in multiple testing under dependency. Ann Stat 29:1165-88.
Dudoit S, Shaffer JP, Boldrick JC. 2003. Multiple hypothesis testing in microarray experiments. Stat Sci 18:71-103.
Efron B, Halloran E, Holmes S. 1996. Bootstrap confidence levels for phylogenetic trees. Proc Natl Acad Sci USA 93:13429-34.
Efron B, Tibshirani R. 1998. The problem of regions. Ann Stat 26:1687-1718.
Felsenstein J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.
Felsenstein J, Churchill G. 1996. A hidden Markov model approach to variation among sites in rate of evolution. Mol Biol Evol 13:93-104.
Hochberg Y. 1988. A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75:800-3.
Murphy W, Eizirik E, Johnson W, Zhang Y, Ryder O, O'Brien S. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614-8.
Rambaut A, Grassly N. 1997. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput Appl Biosci 13:235-8.
Shi X, Gu H, Susko E, Field C. 2005. The comparison of the confidence regions in phylogeny. Mol Biol Evol 22:2285-96.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol 51:492-508.
Shimodaira H, Hasegawa M. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol 16:1114-6.
Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246-7.
Susko E. 2003. Confidence regions and hypothesis tests for topologies using generalized least squares. Mol Biol Evol 20:862-8.
Swofford D. 2000. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4.0b4a. Sunderland, MA: Sinauer Associates.
Zwickl D, Hillis D. 2002. Increased taxon sampling greatly reduces phylogenetic error. Syst Biol 51:588-98.