MBE Advance Access originally published online on May 3, 2006
Molecular Biology and Evolution 2006 23(7):1397-1405; doi:10.1093/molbev/msl006
Research Article |
An Improved Statistical Method for Detecting Heterotachy in Nucleotide Sequences

,1
* Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium; and
Department of Plant Systems Biology, Ghent University, Ghent, Belgium
E-mail: yves.vandepeer{at}psb.ugent.be.
| Abstract |
|---|
The principle of heterotachy states that the substitution rate of sites in a gene can change through time. In this article, we propose a powerful statistical test to detect sites that evolve according to the process of heterotachy. We apply this test to an alignment of 1289 eukaryotic rRNA molecules to 1) determine how widespread the phenomenon of heterotachy is in ribosomal RNA, 2) test whether these heterotachous sites are nonrandomly distributed, that is, linked to secondary structure features of ribosomal RNA, and 3) determine the impact of heterotachous sites on the bootstrap support of monophyletic groupings. Our study revealed that with 21 monophyletic taxa, approximately two-thirds of the sites in the considered set of sequences are heterotachous. Although the detected heterotachous sites do not appear bound to specific structural features of the small subunit rRNA, their presence is shown to have a large beneficial influence on the bootstrap support of monophyletic groups. Using extensive testing, we show that this may not be due to heterotachy itself but merely due to the increased substitution rate at the detected heterotachous sites.
Key Words: heterotachy; covarion; false discovery rate; bootstrap support; ribosomal RNA; eukaryotes
| Introduction |
|---|
It has been extensively shown that the introduction of rates across sites (RAS) models can offer vast improvements in reconstructing phylogenies (Olsen 1987
Recently introduced tests have convincingly confirmed, using real data, that the substitution rate of a site is not always constant through time (Lockhart et al. 1998; Gu 1999; Misof et al. 2002) but did not validate the covarion model as a sufficient explanation of sequence evolution. This is partly because a constant percentage of covarions in the covarion hypothesis may be overly restrictive (Steel et al. 2000). A related process, called heterotachy, that enables greater generality allows for site-specific rate variation (SSRV) regardless of the possible presence of covarions. Under such a process, the evolutionary rate at a site may be different in different parts of the tree (Philippe and Lopez 2001). Specifically, evolutionary rates may change over time for each site separately. Thus, heterotachy is a site property that allows the ratio of substitution rates on different branches of the tree to vary across sites (Lockhart and Steel 2005; Lockhart et al. 2006). One specific form of heterotachy, which assumes that the rate of change between substitution rates is constant over sites, can be modeled by superimposing a continuous rate switching process (Galtier 2001) on Yang's RAS model (Yang 1996) to allow the rate at a given site to vary over time. The model of Galtier (2001) allows SSRV at independent sites.
Evolutionary models that describe the covarion or heterotachy hypothesis may provide a better description of the data than models that do not allow constraints to change over time (Fitch and Markowitz 1970; Fitch 1971; Tuffley and Steel 1998; Huelsenbeck 2002). Indeed, Huelsenbeck (2002) used likelihood ratio tests to show that the covariotide model of Tuffley and Steel (1998) provides a better explanation of evolution at several genes than a model that does not allow rates of substitution to change over time (Huelsenbeck 2002). Further, Lockhart et al. (1996) showed that inference of evolutionary trees under models that do not allow SSRV can be biased in the presence of covarion patterns of change. These results are suggestive of the importance of detecting heterotachous sites in an alignment because the presence of such positions might influence the choice of an evolutionary model (with/without heterotachy). Furthermore, knowing which sites evolve under time-varying rates could provide important insights into evolutionary processes.
Although tests for unveiling heterotachous sites have been proposed in the past (Lopez et al. 1999), we argue in this article that existing tests are restrictive because 1) they may incorrectly detect many nonheterotachous sites as a result of multiple testing errors and 2) results may be highly sensitive to the number of sites in the sequence and to the number of sequences. To address these problems, we have developed a new statistical test to detect heterotachy, which corrects for multiple testing by controlling the false discovery rate (FDR) (Benjamini and Hochberg 1995; Storey and Tibshirani 2003). We have applied this test to a large number of eukaryotic small subunit ribosomal RNA (SSU rRNA) sequences and estimated that heterotachy is present in 66% of all sites of our alignment. Identifying which sites are heterotachous is more demanding. Controlling the FDR at 5%, we could identify 29% of all sites as being heterotachous with good confidence. We have used the results to investigate whether the presence of SSRV is related to specific monophyletic groups or to secondary structure features of the SSU rRNA. Further, we have examined the impact of heterotachous sites on the bootstrap support of certain monophyletic groups. As in the study of Lockhart et al. (1998) on covariotide substitution, we observe that the removal of heterotachous sites decreases bootstrap support under evolutionary models that do not acknowledge SSRV. We clarify the possible causes for such a decrease through extensive testing.
| Materials and Methods |
|---|
Data Collection
SSU rRNA sequences were extracted from the European ribosomal RNA database (Wuyts et al. 2004
Test of Heterotachy
To detect heterotachy at a given site, the number of substitutions for each site in each monophyletic group was predicted using a combination of neighbor-joining and maximum likelihood. First, a neighbor-joining tree was calculated using PAUP* (Swofford 2001). Based on this tree, the parameters for the general time-reversible (GTR) evolutionary model with among-site rate variation were estimated using maximum likelihood. Finally, trees were computed for each of the monophyletic groups separately using neighbor joining, using the parameters estimated by maximum likelihood. The substitution rates for each site were calculated with PAML (Yang 1997) and used to predict substitution numbers. As such, a 21 (corresponding to 21 monophyletic groups) by 968 (length of the sequence alignment) matrix of substitution numbers was created. We use Oij, i = 1, ..., n = 21, j = 1, ..., 968, to refer to the predicted number of substitutions at site j in group i.
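We define a site to be homotachous (i.e., not heterotachous) when the expected number of substitutions at that particular site in each lineage i is proportional to the overall substitution rate $\lambda_i$ of that lineage, as measured for instance by the tree length of that lineage. To test whether a given position j is heterotachous, we propose the following chi-square test:
$$\chi^2_j = \sum_{i=1}^{n} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \qquad (1)$$
$$E_{ij} = \frac{\lambda_i}{\sum_{k=1}^{n} \lambda_k} \sum_{k=1}^{n} O_{kj} \qquad (2)$$
Here, $E_{ij}$ is the number of substitutions expected at site j in group i under homotachy, given the overall substitution rate $\lambda_i$ in that group. The Test statistic 1 builds on the chi-square test of Lopez et al. (1999)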
It can be shown that the modified Test statistic 1 follows a chi-square distribution with n degrees of freedom (df) in large samples. Because the asymptotic distribution of the chi-square test is unreliable with low cell values (i.e., substitution numbers), we have chosen to use permutation tests. In each of 100 000 permutations, the substitutions for each site were redistributed over the monophyletic groups in the following way: we kept the tree lengths for the different groups fixed and chose the probabilities of assigning substitutions to a monophyletic group proportional to the average rate (or the tree length) of that group (see Appendix C). The latter assures that the data are generated under homotachy. By comparing the chi-square statistics of the original data set to the chi-square statistics of each of the permutations, a P value is assigned to each site (Roff and Bentzen 1989), indicating the degree of evidence against the presence of homotachy.
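As a minimal sketch of the per-site test (not the authors' implementation), the Python below computes a chi-square statistic for one site from its predicted substitution numbers and the group rates, taking the expected counts proportional to the rates as in the definition of homotachy above, and converts a set of null statistics into a permutation P value. All function names are hypothetical. The null generator assumes that each substitution is assigned independently with probability proportional to the group rate, which is a simplification of the tree-length-preserving scheme of Appendix C.

```python
import random


def expected_counts(observed, rates):
    """Split the site total over groups in proportion to the group rates."""
    total = sum(observed)
    rate_sum = sum(rates)
    return [total * r / rate_sum for r in rates]


def chi_square(observed, rates):
    """Chi-square statistic for one site (one count per monophyletic group)."""
    expected = expected_counts(observed, rates)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def null_statistics(observed, rates, n_perm, rng):
    """Simplified null: redistribute the site total with weights = rates."""
    groups = list(range(len(rates)))
    total = sum(observed)
    stats = []
    for _ in range(n_perm):
        counts = [0] * len(rates)
        for g in rng.choices(groups, weights=rates, k=total):
            counts[g] += 1
        stats.append(chi_square(counts, rates))
    return stats


def permutation_p_value(observed_stat, null_stats):
    """Fraction of null statistics at least as large as the observed one."""
    return sum(s >= observed_stat for s in null_stats) / len(null_stats)


# Hypothetical toy site: 3 groups, 10 substitutions in total.
rng = random.Random(1)
site = [6, 3, 1]
rates = [3.0, 3.0, 4.0]
stat = chi_square(site, rates)
p_value = permutation_p_value(stat, null_statistics(site, rates, 1000, rng))
```

With only 1000 permutations the smallest attainable nonzero P value is 0.001, which is one reason a much larger number of permutations is needed before correcting for multiple testing.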
Multiple Testing Problem
Because the chi-square test 1 is used for each of the 968 sites in our alignment separately, the overall risk of false detections is high. Although Bonferroni correction can be used to control the risk of at least one false detection over all sites, it aims to control errors in the unrealistic situation where there is no heterotachy at any site. Furthermore, Bonferroni correction tends to be conservative and hence underpowered. We have therefore chosen to control the FDR (Benjamini and Hochberg 1995), which is defined in our setting as the proportion of false results among the sites that were detected to be heterotachous.
The FDR can be controlled below 5% by rejecting the null hypothesis at all sites with q value (Storey 2003; Storey and Tibshirani 2003) less than 5%. The latter is the smallest FDR at which the test would still reject and, similar to the P value, expresses the amount of evidence against the null hypothesis (smaller indicating more evidence against the null hypothesis).
Assumption of Uniform P Values
The calculation of q values (Storey and Tibshirani 2003; see Appendix B) requires knowing the proportion $\pi_0$ of truly homotachous positions (Storey 2003). Storey and Tibshirani (2003) propose to estimate this by calculating, for a range of $\lambda$ values in ]0, 1[, the observed number of P values greater than $\lambda$ divided by the expected number of P values greater than $\lambda$ under the null hypothesis. Assuming that P values are uniformly distributed under the null hypothesis, this is:
$$\hat{\pi}_0(\lambda) = \frac{\#\{p_j > \lambda\}}{m(1 - \lambda)} \qquad (3)$$
where $p_j$ is the P value of site j and m is the number of sites tested.
Because cell values (i.e., substitution numbers) are low, the assumption that P values are uniformly distributed under the null hypothesis is invalid, and hence, Equation 3 is not applicable. The exact distribution of P values under the null hypothesis therefore needs to be determined. To this end, 2000 permutations of the original data set were calculated using a similar process as explained above. Next, the mean number of P values greater than $\lambda$ was determined for each site, with $\lambda$ ranging from 0 to 1 and subsequently substituted in the denominator of Equation 3.
Figure 1 illustrates the effect this has on the estimation of $\pi_0$.
It can be concluded from figure 1 that not adjusting for the nonuniform P values would make our method too conservative and hence underpowered. Figure 1 additionally shows that the proportion of truly heterotachous (homotachous) sites $1 - \pi_0$ ($\pi_0$) is estimated to be 66.3% (33.7%).
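The adjustment described above can be sketched as follows. This is an illustrative reading of the text rather than the authors' code: the uniform version divides the observed count of P values above a threshold by its expectation under uniformity, whereas the empirical version divides by the mean count obtained from P values of permuted (null) data sets. The function names, the threshold name `lam`, and the toy numbers are assumptions made for the example.

```python
def pi0_uniform(pvalues, lam):
    """Estimate assuming uniform null P values: count / (m * (1 - lam))."""
    m = len(pvalues)
    observed = sum(p > lam for p in pvalues)
    return observed / (m * (1.0 - lam))


def pi0_empirical(pvalues, null_pvalue_sets, lam):
    """Estimate using the mean null count of P values above lam as denominator."""
    observed = sum(p > lam for p in pvalues)
    null_counts = [sum(p > lam for p in null_set) for null_set in null_pvalue_sets]
    expected = sum(null_counts) / len(null_counts)
    return observed / expected


# Hypothetical toy data: discrete tests tend to pile null P values near 1,
# so the null expectation above lam exceeds the uniform value m * (1 - lam).
pvals = [0.01, 0.02, 0.2, 0.6, 0.8]
null_sets = [
    [0.3, 0.6, 0.7, 0.9, 1.0],
    [0.2, 0.55, 0.8, 0.95, 1.0],
]
uniform_estimate = pi0_uniform(pvals, 0.5)
empirical_estimate = pi0_empirical(pvals, null_sets, 0.5)
```

In this toy case the empirical denominator is larger than the uniform one, so the estimated proportion of homotachous sites is smaller, which mirrors the statement that ignoring nonuniformity makes the method conservative.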
| Results |
|---|
Test of Heterotachy
On a sequence alignment of 1289 eukaryotic SSU rRNA sequences of 968 sites, our method identified 283 (or 29.2%) heterotachous sites (i.e., sites at which the null hypothesis of homotachy is rejected and thus have a q value below 5%) while controlling the FDR at 5%. These sites were subdivided into 5 categories, from strong evidence in favor of heterotachy (q value below 2.5%) to moderate evidence (q value between 2.5% and 5%) (see table 1). The remaining sites, that is, the sites that are not rejected and thus have a q value above 5% will be labeled as homotachous sites below.
As reported above, the test of Lopez et al. (1999)
Phylogenetic Groups
After the identification of heterotachous sites in our data set, we determined which phylogenetic groups were primarily responsible for heterotachy at a given site. Therefore, the contribution of each monophyletic group to the chi-square statistic 1 of a site was calculated. To conclude that a given monophyletic group is responsible for heterotachy at a given site, such a contribution must be significantly elevated compared with its expectation under the null hypothesis. We therefore estimated the 95% percentile of each contribution under the null hypothesis using 10 000 permutations. Contributions exceeding this percentile were taken as evidence that heterotachy was caused by the evolutionary rate being unexpectedly high in this group (when Oij > Eij), in which case we labeled them "positive," or being unexpectedly low (when Oij < Eij), in which case we labeled them "negative." As seen in figure 2, Euglenida, Ciliophora, Platyhelminthes, and Annelida are the monophyletic groups that contribute the most to the presence of heterotachy, oftentimes due to sites evolving faster than expected.
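The labeling rule in this paragraph can be sketched in a few lines of Python. This is a hedged illustration only: the function names are hypothetical, and the per-group thresholds are assumed to be supplied by the caller (in the text they are the 95% percentiles of each contribution under the null hypothesis, obtained from permutations).

```python
def group_contributions(observed, expected):
    """Per-group terms (O - E)^2 / E of the chi-square statistic for one site."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


def label_groups(observed, expected, thresholds):
    """Label each group 'positive', 'negative', or 'none' for one site."""
    labels = []
    contributions = group_contributions(observed, expected)
    for o, e, c, t in zip(observed, expected, contributions, thresholds):
        if c <= t:
            labels.append("none")      # contribution not unusually large
        elif o > e:
            labels.append("positive")  # faster than expected in this group
        else:
            labels.append("negative")  # slower than expected in this group
    return labels


# Hypothetical site with 3 groups and a common threshold of 2.0.
labels = label_groups([6, 3, 1], [3.0, 3.0, 4.0], [2.0, 2.0, 2.0])
```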
Function–Structure Analysis
In Figure 3, the heterotachous sites were mapped on the secondary structure of the yeast Saccharomyces cerevisiae. We tested the distribution of the heterotachous sites over the different regions of the secondary structure such as stems, hairpin loops, internal loops, branching loops, and single-stranded regions (bulge loops and pseudoknots were not considered because they rarely occur). Percentages of heterotachy were 27.9% in stem regions, 30.5% in hairpin loops, 34.6% in the internal loops, 29.5% in branching loops, and 27.6% in the single-stranded regions. In line with previous studies (Philippe and Lopez 2001
In the past, substitution rates of eukaryotic and bacterial SSU rRNAs have been superimposed on the secondary structure of S. cerevisiae (Wuyts et al. 2001
To investigate whether sites at opposing sides of a stem evolve according to the same principle (heterotachy or homotachy), 220 bp (i.e., all the nucleotide pairs within the stem regions) in the secondary structure were evaluated. One hundred and sixty-five (or 75%) evolved according to the same principle (both heterotachy or both homotachy), whereas 55 (or 25%) evolved according to a different principle. To assess the significance of this result, one must acknowledge that a significant proportion $\pi_0$ of pairs may evolve according to the same principle just by chance (i.e., even when heterotachous sites are randomly distributed over the alignment). With $\pi$ being the probability of observing a heterotachous site within the 220 pairs (i.e., the percentage of heterotachous sites among the 440 sites that make up the 220 bp), it can be shown that this proportion equals $\pi_0 = \pi^2 + (1 - \pi)^2$. For our data, we estimated $\pi$ = 27%, suggesting that paired evolution happens by chance in 61% of all pairs. To assess whether the observed chance $\theta$ of paired evolution is significantly elevated, we used the delta method to acknowledge that $\pi_0$ is itself estimated (see Appendix A) and found a P value of $1.3 \times 10^{-6}$, suggesting highly significant evidence for paired evolution.
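The arithmetic behind this comparison can be checked with a short Python sketch. It assumes a one-sided normal test of the difference between the observed proportion of concordant pairs (165 of 220) and the chance proportion, and it takes the null variance of 9.47 × 10⁻⁴ reported in Appendix A as given rather than deriving it. The function names are hypothetical, and the rounded input of 27% means the result only approximates the reported P value.

```python
import math


def chance_agreement(pi):
    """Probability that two independent sites share the same state."""
    return pi ** 2 + (1.0 - pi) ** 2


def upper_tail_p(difference, variance):
    """One-sided normal P value for a difference with known null variance."""
    z = difference / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


pi_hat = 0.27                         # rounded estimate quoted in the text
pi0_hat = chance_agreement(pi_hat)    # about 0.61
observed = 165 / 220                  # 0.75
p_value = upper_tail_p(observed - pi0_hat, 9.47e-4)
```

With these rounded inputs the P value comes out on the order of 10⁻⁶, in line with the value quoted above.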
Degree of Support
Although in the absence of a molecular clock, the GTR evolutionary model used for fitting the data does not prohibit sites from having different evolutionary rates in different lineages, the combination of heterotachous and homotachous sites in an alignment may generate inaccuracies in phylogenetic inference due to the heterogeneity of the sites (Moreira and Philippe 2000). One might therefore expect that the bootstrap support will tend to increase after removing heterotachous sites, creating a more homogeneous alignment and reducing model violations (Philippe and Germot 2000). Due to computational constraints, a random subset of 10 sequences of each monophyletic group was used to calculate the bootstrap supports with and without the removal of heterotachous sites. Comparing these bootstrap supports for the clustering of specific monophyletic groups revealed a surprising and unexpected systematic decrease for most monophyletic groups when heterotachous sites were removed, as is shown in table 2. A similar result was observed in a covarion study by Lockhart et al. (1998). Likewise, using simulation studies, Penny et al. (2001) have shown that the chance of recovering a correct tree topology increases when sites that are unchangeable for part of the time are present. Their results are therefore also suggestive of decreased bootstrap supports when removing "covarion" sites.
To acknowledge that this decrease in bootstrap support could be merely due to the decrease in number of available sites, regardless of the presence of heterotachy, control experiments are necessary to correctly determine the impact of the removal of heterotachous sites (Inagaki et al. 2004
...
M(N). To acknowledge simulation error due to using a finite number N of trees, we calculated an approximate 95% confidence interval (CI) for the 5% percentile as [M(L), M(U)] (Nettleton and Doerge 2000), where
$$L = \left\lceil 0.05N - 1.96\sqrt{0.05 \times 0.95 \times N}\,\right\rceil$$
$$U = \left\lceil 0.05N + 1.96\sqrt{0.05 \times 0.95 \times N}\,\right\rceil$$
and $\lceil x \rceil$ denotes the smallest integer greater than or equal to x. We subsequently verified whether the number N of simulated alignments was sufficient to produce unambiguous results (Nettleton and Doerge 2000).
In some cases (i.e., for some removals of 283 random sites), some monophyletic groups could not be recovered, and hence, their bootstrap value was not obtainable from standard software. To cope with this missing information, we constructed 2 CIs as in a worst-case/best-case analysis. In the former, we imputed 0 for the missing bootstrap support. In the latter, we chose the minimal bootstrap support encountered for the given monophyletic group across all available trees (a low bootstrap value is realistic because the group could not be reconstructed, that is, bootstrap support was very low). Table 2 shows that the bootstrap supports for the different groups, after removal of heterotachy, were systematically lower than M(L). We therefore conclude that the decrease in support after removal of heterotachy is not just due to the removal of random sites. It follows that, in addition to the removal of sites, another process must be responsible for the decrease in bootstrap support.
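One common way to obtain an interval of the form [M(L), M(U)] for a percentile is the normal-approximation order-statistic interval, sketched below in Python. This is an assumption about the general technique, not a transcription of the formulas used in the study: the ranks are taken as the ceiling of Np minus or plus 1.96 standard deviations of a binomial count, and the function names are hypothetical.

```python
import math


def percentile_ci_ranks(n, p=0.05, z=1.96):
    """Ranks (1-based) of the order statistics bounding the p-th percentile."""
    half_width = z * math.sqrt(n * p * (1.0 - p))
    lower = max(1, math.ceil(n * p - half_width))
    upper = min(n, math.ceil(n * p + half_width))
    return lower, upper


def percentile_ci(values, p=0.05, z=1.96):
    """Approximate CI for the p-th percentile from simulated values."""
    ordered = sorted(values)
    lower, upper = percentile_ci_ranks(len(ordered), p, z)
    return ordered[lower - 1], ordered[upper - 1]


# Hypothetical example with N = 300 simulated bootstrap supports.
ranks = percentile_ci_ranks(300)
interval = percentile_ci(list(range(1, 301)))
```

An observed bootstrap support below the lower bound is then smaller than the 5% percentile of the control distribution even after allowing for simulation error.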
To investigate whether the decrease in bootstrap support is the result of a different variability at heterotachous sites than others, we subsequently constructed N = 300 alignments from an alignment from which random sites with the same variability were removed from the data set. To randomly select sites with the same variability, we first subdivided sites into different classes containing at least 10 different sites of similar variability, as measured by the number of substitutions at the considered site. Next, for each heterotachous site in each variability class, we randomly drew a (homotachous or heterotachous) site from the same variability class. From the results in table 2, we conclude that the decrease in bootstrap support may well be caused by the increased variability at detected heterotachous sites (which may itself be due to the increased power of our test at sites with high variability). This is seen because the bootstrap supports are systematically higher than M(U). Note that for some groups (e.g., Platyhelminthes), there was no decrease in bootstrap support when removing the heterotachous sites. Other groups (e.g., Mollusca and Apicomplexa) may require further testing because their initial bootstrap support (when constructing the tree using all sites) was considered insufficient (i.e., <70%).
| Discussion |
|---|
Since the introduction of models that aim to incorporate the covarion hypothesis (Tuffley and Steel 1998
It is well known that the changes at the 2 sides of a stem region in RNA are correlated with each other due to the constraint of maintaining the secondary structure (Higgs 2000). Our method to detect heterotachous sites significantly confirms such a correlation, which is suggestive of its adequate performance. We expect this similarity to be even more pronounced in reality. This is because the maximum likelihood approach used for inferring the evolutionary rates treats sites within base pairs as independent. Although this approach is known for its robustness when the independence of sites is violated, it remains to be investigated whether results would change if rates were inferred using base-pair models such as a 16-state Markov model (Schöniger and von Haeseler 1994), a 7-state (Tillier and Collins 1998), or 6-state model (Tillier and Collins 1995). Note also that there may always be base pairs containing one heterotachous site on one side and one homotachous site on the other side of the stem. This may happen in a situation with 2 Watson-Crick pairings (A-U and G-C) and one non-Watson-Crick interaction (G-U) (Gutell et al. 1992), as in the model of Tillier and Collins (1995). Indeed, it is possible that in a given monophyletic group, the transition from G-C to G-U has occurred several times but that the base pair mutates back to G-C instead of selecting the compensatory mutation to A-U. This could imply that sites at opposing sides of the stem show differences in variability, which could result (accumulated over different groups) in the detection of heterotachy instead of homotachy (or vice versa) at opposing sites of a stem region (fig. 3).
In our analysis of the secondary structure of the SSU rRNA, we found no immediate correlation of heterotachy and structure–function for SSU rRNA. Although it has been shown that modeling of heterotachy may provide higher likelihoods in the reconstruction of phylogenetic trees (Galtier 2001), the role of the heterotachy process in structural and functional research still remains unclear.
Future Work
Similar to other approaches for detecting heterotachy, our method detects heterotachy between different evolutionary lineages (Lopez et al. 2002; Kolaczkowski and Thornton 2004). However, the definition of heterotachy allows evolutionary rates to change more generally through time, that is, across the phylogeny. This means that a rate switch can occur anywhere in the phylogenetic tree and not only at the internal nodes between kingdoms or major phylogenetic clades. Our heterotachy test cannot be used to detect rate switches at a chosen branch of the tree. Adaptations that allow more flexibility will likely lead to increased power for the heterotachy test. Further, preliminary analyses have shown that the tree length of each monophyletic group is an adequate measure for the overall evolutionary rate (i.e., an adequate choice of $\lambda_i$) for our chi-square test. Nonetheless, other measures might prove useful for certain data sets and possibly lead to greater power for the heterotachy test. Regardless of how one measures the overall evolutionary rate $\lambda_i$, it will typically be based on estimated substitution numbers. It remains to be investigated how the resulting uncertainty about $\lambda_i$ may impact our test results. As in other research studies concerning evolutionary rates, however, computational constraints currently make this prohibitive.
Recent studies have focused on modeling covarion and SSRV (i.e., heterotachy), but current evolutionary models for these processes are limiting. The covarion hypothesis is usually modeled by superimposing 2 stochastic processes: a 2-state Markov process that acts as a switch, turning sites "on" (variable) and "off" (invariable), and a standard substitution process for sites in the "on" state, corresponding to an evolutionary model of choice (Tuffley and Steel 1998; Huelsenbeck 2002). Likewise, heterotachy has been modeled by superimposing a continuous rate-changing process onto the among-site rate variation nucleotide process (Galtier 2001). Both models were found to provide equally good or better fits to the data in most cases, as compared with nucleotide models that do not allow SSRV. However, these models assume site-independent evolution, an assumption that contradicts the covarion hypothesis as formulated by Fitch and Markowitz (1970). It thus remains to be investigated how one can model the typical behavior of covarions, that is, when a mutation gets fixed at a certain position, another position becomes variable.
| Conclusion |
|---|
In this study, we have proposed a statistical method to uncover heterotachy in an alignment involving a priori identified monophyletic groups. In a data set of 1289 aligned eukaryotic SSU rRNA sequences, we estimated that heterotachy is present at 66.3% of the sites. In addition, our method identified 29.2% of the sites to be heterotachous when controlling the FDR at 5%. No evidence was found that these sites were heterogeneously distributed along the SSU rRNA; that is, there is no evidence that the secondary structure directly affects the probability for a position to be heterotachous. We showed that sites at opposing sides in the stem regions evolve similarly, as expected for stem regions within RNA. We extensively investigated the effect of heterotachous sites on the support for certain branchings within trees reconstructed using evolutionary models without SSRV. We observed that the removal of heterotachous sites leads to decreased bootstrap supports and showed that this may be explained by the increased variability at heterotachous sites.
| Appendix A |
|---|
Let Yi equal 1 in the case of heterotachy at position i (i.e., position i has a q value below 5%) and 0 in the case of homotachy (i.e., position i has a q value above 5%). Let Xi equal 1 when there is similar evolution at base pair i (i.e., if both paired sites have either a q value below 5% or they both have a q value above 5%) and 0 otherwise. Using Y and X, the following estimators for $\pi$ and $\theta$ were constructed:
$$\hat{\pi} = \frac{1}{440}\sum_{i=1}^{440} Y_i, \qquad \hat{\theta} = \frac{1}{220}\sum_{i=1}^{220} X_i$$
The estimator of $\pi_0 = \pi^2 + (1 - \pi)^2$ is defined as
$$\hat{\pi}_0 = \hat{\pi}^2 + (1 - \hat{\pi})^2$$
and
has the following approximate distribution:
![]() |
![]() |
![]() |
![]() |
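$\theta = \pi_0$.
For our data, we find that $\hat{\theta}$ = 75% and $\hat{\pi}_0$ = 61%, and that $\hat{\theta} - \hat{\pi}_0 \sim N(0;\ 9.47 \times 10^{-4})$ under the null hypothesis. This gives a P value of $1.3 \times 10^{-6}$, indicating significant evidence for paired evolution.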
| Appendix B |
|---|
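In this appendix, we provide the general algorithm for estimating q values from a list of P values, as it appeared in Storey and Tibshirani (2003):
- Let $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$ be the ordered P values. This also denotes the ordering of the features in terms of their evidence against the null hypothesis.
- For a range of $\lambda$, say $\lambda = 0, 0.01, 0.02, \dots, 0.95$, calculate $\hat{\pi}_0(\lambda) = \#\{p_j > \lambda\} / [m(1 - \lambda)]$. When the P values are not uniformly distributed under the null hypothesis, one should instead use our approach from the section Assumption of Uniform P Values.
- Let $\hat{f}$ be the natural cubic spline with 3 df of $\hat{\pi}_0(\lambda)$ on $\lambda$.
- Let the estimate of $\pi_0$ be $\hat{\pi}_0 = \hat{f}(1)$.
- Calculate $\hat{q}(p_{(m)}) = \min_{t \ge p_{(m)}} \dfrac{\hat{\pi}_0\, m\, t}{\#\{p_j \le t\}} = \hat{\pi}_0\, p_{(m)}$.
- For $i = m - 1, m - 2, \dots, 1$, calculate the estimated q value for the ith most significant feature to be $\hat{q}(p_{(i)}) = \min\left(\dfrac{\hat{\pi}_0\, m\, p_{(i)}}{i},\ \hat{q}(p_{(i+1)})\right)$.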
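The steps above can be sketched in plain Python as follows. This is an illustrative implementation under stated simplifications, not the authors' code: the proportion of true null hypotheses is estimated at a single threshold (`lam`, an assumed name) instead of with the natural cubic spline, and the function names are hypothetical. The q values are computed by walking from the largest P value down to the smallest while keeping a running minimum.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Single-threshold estimate of the null proportion (no spline smoothing)."""
    m = len(pvalues)
    estimate = sum(p > lam for p in pvalues) / (m * (1.0 - lam))
    return min(1.0, estimate)


def qvalues(pvalues, pi0):
    """Estimated q value for each P value, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])  # indices by ascending P
    q = [0.0] * m
    running_min = 1.0
    # Walk from the least significant (rank m) to the most significant (rank 1).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[idx] / rank)
        q[idx] = running_min
    return q


# Hypothetical toy P values; with pi0 = 1 this reduces to the
# Benjamini-Hochberg adjusted P values.
pvals = [0.01, 0.02, 0.03, 0.04, 0.5, 0.9]
q_bh = qvalues(pvals, 1.0)
q_storey = qvalues(pvals, estimate_pi0(pvals))
detected = [i for i, q in enumerate(q_storey) if q < 0.05]
```

Because the estimated null proportion is at most 1, the q values are never larger than the corresponding values obtained with the proportion fixed at 1, which is why estimating it can increase the number of sites detected at a given FDR.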
| Appendix C |
|---|
Here, we provide an artificial example of how the substitutions can be redistributed over the different monophyletic groups and sites during a permutation. Table 3 contains substitutions for 4 sites (1 through 4) within 3 groups, as well as their totals.
Table 3 Original Substitutions
To redistribute the substitutions, we proceed from left to right starting with 8 substitutions to redistribute over the 3 different groups. Because we wish to keep the tree length fixed, the first substitution has a chance of 17/29 = 59% to be assigned to group 1, 10/29 = 34% to be assigned to group 2, and 2/29 = 7% to be assigned to group 3. Generating a random number between 1 and 29 indicates to which group a first substitution should be assigned: if the number is between 1 and 17 the substitution is assigned to the first group, if it is between 18 and 27 the substitution is assigned to the second group, and if the number is 28 or 29 the substitution is assigned to the third group. After each assignment of a substitution to a group, the tree length of that group is decremented. For example, should the first substitution be assigned to the first group, the tree length of that group would be decremented to 16. Next, we proceed to the following substitution. This way, both tree lengths and the total amount of substitutions per site will remain fixed, resulting in a possible permutation as illustrated in Table 4 below.
Table 4 Permutation
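The redistribution procedure described in this appendix can be sketched in Python as follows. The group tree lengths (17, 10, and 2) and the 8 substitutions at the first site come from the example above; the totals for the remaining three sites are hypothetical values chosen so that they sum to 29, and the function name is an assumption. Each substitution is assigned to a group with probability proportional to the remaining tree length of that group, which is then decremented.

```python
import random


def redistribute(site_totals, tree_lengths, rng):
    """Return a groups-by-sites table with fixed row and column totals."""
    if sum(site_totals) != sum(tree_lengths):
        raise ValueError("site totals and tree lengths must have the same sum")
    remaining = list(tree_lengths)
    table = [[0] * len(site_totals) for _ in tree_lengths]
    for site, total in enumerate(site_totals):      # proceed from left to right
        for _ in range(total):
            draw = rng.randint(1, sum(remaining))   # e.g. 1..29 for the first draw
            for group, left in enumerate(remaining):
                if draw <= left:
                    break
                draw -= left
            table[group][site] += 1
            remaining[group] -= 1                   # decrement that tree length
    return table


rng = random.Random(0)
site_totals = [8, 7, 9, 5]      # only the first value is taken from the text
tree_lengths = [17, 10, 2]
permuted = redistribute(site_totals, tree_lengths, rng)
```

Every table produced this way has the same tree length per group and the same number of substitutions per site as the original data, so only the allocation of substitutions within sites varies between permutations.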
| Acknowledgements |
|---|
We would like to thank Joseph P. Bielawski, Ziheng Yang, Joseph W. Thornton, and Jan Wuyts for helpful comments. We would also like to acknowledge the Associate Editor and 3 anonymous reviewers for helpful comments and valuable suggestions which improved our manuscript.
| Footnotes |
|---|
1 Present address: Computational and Structural Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, Heidelberg, Germany
Peter Lockhart, Associate Editor
| References |
|---|
Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statist Soc B 57:289300.
Fitch WM. 1971. Rate of change of concomitantly variable codons. J Mol Evol 1:8496.[CrossRef][Medline]
Fitch WM, Markowitz E. 1970. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem Genet 4:57993.[CrossRef][Web of Science][Medline]
Galtier N. 2001. Maximum-likelihood phylogenetic analysis under a covarion-like model. Mol Biol Evol 18:86673.
Gu X. 1999. Statistical methods for testing functional divergence after gene duplication. Mol Biol Evol 16:166474.[Abstract]
Gutell RR, Power A, Hertz GZ, Putz EJ, Stormo GD. 1992. Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucleic Acids Res 20:578595.
Higgs PG. 2000. RNA secondary structure: physical and computational aspects. Q Rev Biophys 33:199253.[CrossRef][Web of Science][Medline]
Huelsenbeck JP. 2002. Testing a covariotide model of DNA substitution. Mol Biol Evol 19:698707.
Inagaki Y, Susko E, Fast NM, Roger AJ. 2004. Covarion shifts cause a long-branch attraction artifact that unites Microsporidia and archaebacteria in EF-1
phylogenies. Mol Biol Evol 21:13409.
Kolaczkowski B, Thornton JW. 2004. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431:9804.[CrossRef][Medline]
Lockhart P, Novis P, Milligan BG, Riden J, Rambaut A, Larkum T. 2006. Heterotachy and tree building: a case study with plastids and eubacteria. Mol Biol Evol 23:405.
Lockhart P, Steel M. 2005. A tale of two processes. Syst Biol 54:94851.
Lockhart PJ, Larkum AWD, Steel MA, Waddell PJ, Penny D. 1996. Evolution of chlorophyll and bacteriochlorophyll: the problem of invariant sites in sequence analysis. Proc Natl Acad Sci USA 93:19304.
Lockhart PJ, Steel MA, Barbrook AC, Huson DH, Charleston MA, Howe CJ. 1998. A covariotide model explains apparent phylogenetic structure of oxygenic photosynthetic lineages. Mol Biol Evol 15:11838.[Abstract]
Lopez P, Casane D, Philippe H. 2002. Heterotachy, an important process in protein evolution. Mol Biol Evol 19:17.
Lopez P, Forterre P, Philippe H. 1999. The root of the tree of life in the light of the covarion model. J Mol Evol 49:496508.[CrossRef][Web of Science][Medline]
Misof B, Anderson CL, Buckley TR, Erpenbeck D, Rickert A, Misof K. 2002. An empirical analysis of mt 16S rRNA covarion-like evolution in insects: site-specific rate variation is clustered and frequently detected. J Mol Evol 55:460–9.
Moreira D, Philippe H. 2000. Molecular phylogeny: pitfalls and progress. Int Microbiol 3:9–16.
Nettleton D, Doerge RW. 2000. Accounting for variability in the use of permutation testing to detect quantitative trait loci. Biometrics 56:52–8.
Olsen GJ. 1987. Earliest phylogenetic branchings: comparing rRNA-based evolutionary trees inferred with various techniques. Cold Spring Harbor Symp Quant Biol 52:825–37.
Penny D, McComish BJ, Charleston MA, Hendy MD. 2001. Mathematical elegance with biochemical realism: the covarion model of molecular evolution. J Mol Evol 53:711–23.
Philippe H, Germot A. 2000. Phylogeny of eukaryotes based on ribosomal RNA: long-branch attraction and models of sequence evolution. Mol Biol Evol 17:830–4.
Philippe H, Lopez P. 2001. On the conservation of protein sequences in evolution. Trends Biochem Sci 26:414–6.
Roff DA, Bentzen P. 1989. The statistical analysis of mitochondrial DNA polymorphisms: χ² and the problem of small samples. Mol Biol Evol 6:539–45.
Savill NJ, Hoyle DC, Higgs PG. 2001. RNA sequence evolution with secondary structure constraints: comparison of substitution rate models using maximum-likelihood methods. Genetics 157:399–411.
Schöniger M, von Haeseler A. 1994. A stochastic model for the evolution of autocorrelated DNA sequences. Mol Phylogenet Evol 3:240–7.
Spencer M, Susko E, Roger AJ. 2005. Likelihood, parsimony and heterogeneous evolution. Mol Biol Evol 22:1161–4.
Steel M. 2005. Should phylogenetic models be trying to fit an elephant? Trends Genet 21:307–9.
Steel M, Huson D, Lockhart PJ. 2000. Invariable sites models and their use in phylogeny reconstruction. Syst Biol 49:225–32.
Storey JD. 2003. The positive false discovery rate: a Bayesian interpretation and the q-value. Ann Stat 31:2013–35.
Storey JD, Tibshirani R. 2003. Statistical significance for genomewide studies. Proc Natl Acad Sci USA 100:9440–5.
Swofford DL. 2001. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4.0b10. Sunderland, MA: Sinauer Associates.
Tillier ERM, Collins RA. 1995. Neighbour joining and maximum likelihood with RNA sequences: addressing the interdependence of sites. Mol Biol Evol 12:7–15.
Tillier ERM, Collins RA. 1998. High apparent rate of simultaneous compensatory base-pair substitutions in ribosomal RNA. Genetics 148:1993–2002.
Tuffley C, Steel MA. 1998. Modelling the covarion hypothesis of nucleotide substitution. Math Biosci 147:63–91.
Van de Peer Y, Ben Ali A, Meyer A. 2000. Microsporidia: accumulating molecular evidence that a group of amitochondriate and suspectedly primitive eukaryotes are just curious fungi. Gene 246:1–8.
Van de Peer Y, Chapelle S, De Wachter R. 1996. A quantitative map of nucleotide substitution rates in bacterial rRNA. Nucleic Acids Res 24:3381–91.
Van de Peer Y, Rensing SA, Maier U-G, De Wachter R. 1996. Substitution rate calibration of small subunit ribosomal RNA identifies chlorarachniophyte endosymbionts as remnants of green algae. Proc Natl Acad Sci USA 93:7732–6.
Wasserman L. 2004. All of statistics: a concise course in statistical inference. New York: Springer.
Wuyts J, Perrière G, Van de Peer Y. 2004. The European ribosomal RNA database. Nucleic Acids Res 32:D101–3.
Wuyts J, Van de Peer Y, De Wachter R. 2001. Distribution of substitution rates and location of insertion sites in the tertiary structure of ribosomal RNA. Nucleic Acids Res 29:5017–28.
Yang Z. 1996. Among-site rate variation and its impact on phylogenetic analysis. Trends Ecol Evol 11:367–70.
Yang Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13:555–6. Available from: http://abacus.gene.ucl.ac.uk/software/paml.html.















