Molecular Biology and Evolution 18:1024-1033 (2001)
© 2001 Society for Molecular Biology and Evolution
Effects of Nucleotide Composition Bias on the Success of the Parsimony Criterion in Phylogenetic Inference
Department of Biology, University of New Mexico
Department of Ecology and Evolutionary Biology, University of Connecticut
| Abstract |
|---|
Convergence in nucleotide composition (CNC) in unrelated lineages is a factor potentially affecting the performance of most phylogeny reconstruction methods. Such convergence has deleterious effects because unrelated lineages show similarities due to similar nucleotide compositions and not shared histories. While some methods (such as the LogDet/paralinear distance measure) avoid this pitfall, the amount of convergence in nucleotide composition necessary to deceive other phylogenetic methods has never been quantified. We examined analytically the relationship between convergence in nucleotide composition and the consistency of parsimony as a phylogenetic estimator for four taxa. Our results show that rather extreme amounts of convergence are necessary before parsimony begins to prefer the incorrect tree. Ancillary observations are that (for unweighted Fitch parsimony) transition/transversion bias contributes to the impact of CNC and, for a given amount of CNC and fixed branch lengths, data sets exhibiting substantial site-to-site rate heterogeneity present fewer difficulties than data sets in which rates are homogeneous. We conclude by reexamining a data set originally used to illustrate the problems caused by CNC. Using simulations, we show that in this case the convergence in nucleotide composition alone is insufficient to cause any commonly used methods to fail, and accounting for other evolutionary factors (such as site-to-site rate heterogeneity) can give a correct inference without accounting for CNC.
| Introduction |
|---|
Since phylogenetic relationships cannot be observed, it is impossible to directly verify the accuracy of phylogeny reconstructions. Because of this difficulty, it is of interest to discover conditions in data that can be demonstrated to cause phylogeny reconstruction methods to fail. One approach has been to specify a model phylogeny and a substitution model incorporating the factor of interest and then show that data generated from that phylogeny result in incorrectly inferred relationships. This demonstration can be done analytically for simple cases and some phylogeny reconstruction methods (e.g., Felsenstein 1978
For DNA sequence data, several evolutionary factors have been discovered that can potentially mislead phylogeny estimation methods. Examples of such factors include transition/transversion bias (Kimura 1980; Wakeley 1993), heterogeneity in substitution rates among lineages (Felsenstein 1978), heterogeneity in substitution rates among sites within a nucleotide sequence (Navidi, Churchill, and von Haeseler 1991; Reeves 1992; Sidow and Steel 1992; Yang 1993), nonindependence of sites within a gene (Goldman and Yang 1994; Muse 1995, 1996; Schöniger and von Haeseler 1995), and nonstationarity of nucleotide frequencies across lineages (Loomis and Smith 1990; Burggraf, Stetter, and Woese 1992; Hasegawa and Hashimoto 1993; Lockhart et al. 1994; Galtier and Gouy 1995, 1998).
Lockhart et al. (1994) presented three compelling examples in which they postulated that convergence in nucleotide composition (CNC) in independent lineages led parsimony, as well as methods based on traditional substitution models, to prefer an incorrect tree, namely the tree placing taxa with similar nucleotide compositions together. LogDet (Lake 1994; Steel 1994) was the only transformation of those tested that resulted in a correct phylogenetic inference. Relatively few other cases have been found in which CNC has been identified as a problematic factor, although Foster and Hickey (1999) suggest that it may be the cause of misleading inferences for animal phylogenies when using all mitochondrial protein-coding sequences. There are at least two plausible explanations for this paucity of examples. First, if changed nucleotide composition is inherited (fig. 1A) rather than acquired by convergence (fig. 1B), one might expect phylogeny methods such as parsimony to prefer the correct tree more strongly than they should. Thus, whether nonstationarity in nucleotide composition is a problem would depend on the relative frequency in nature of inherited versus convergent similarity in nucleotide composition. This explanation is rather difficult to investigate, as it requires ascertaining relative frequencies of inherited composition versus CNC in nature. Second, even if convergent similarity in nucleotide composition is common, whether it is a problem for phylogeny methods depends on the strength of the convergence and how CNC interacts with other evolutionary factors. In this paper, we concentrate on this second explanation, using analyses of four-taxon phylogenies to gauge the amount of CNC required to mislead phylogeny methods, especially parsimony. We also present a reexamination of one of the Lockhart et al. (1994) examples using computer simulation to show that other factors are at work in these data, and CNC alone does not provide a satisfactory explanation for the failure of the phylogeny methods examined.
| Convergence in Nucleotide Composition in Four-Taxon Trees |
|---|
The term "nucleotide composition" can have at least two distinct meanings. It can refer to the nucleotide pool available for substitution or to the observed proportions of nucleotides in a particular sequence or genome. Both have been termed "equilibrium frequencies," since all commonly used substitution models (with the exception of the model underlying the LogDet/paralinear distance measure) assume that the nucleotide composition is stationary (i.e., does not change from lineage to lineage across the tree). We use the term "base frequencies" to refer to the substitution pool relative frequencies, but we allow them to change from lineage to lineage following Yang and Roberts (1995)
In this section, we examine the question of how much CNC is required to mislead parsimony in the four-taxon case by using the probabilities of parsimony-informative patterns to define the region of statistical inconsistency for parsimony (i.e., the region in which parsimony would converge on an incorrect tree given an infinite amount of data). The model tree is that in figure 1B, consisting of two "biased" branches and three "unbiased" branches (the central branch comprises both segments attached to the root node). Because short internal branches in four-taxon trees present the greatest difficulties for phylogeny reconstruction, the length of the central branch was varied independently of the four peripheral branches. Branch lengths are given in terms of the expected number of substitutions per site (d) unless otherwise indicated. The K2P model (Kimura 1980) was used for unbiased branches, and the model employed for biased branches was the T92 model (Tamura 1992; Galtier and Gouy 1998). The bias introduced along the two biased branches involved increasing the frequency of both G and C by a fixed amount, the G+C bias (i.e., the frequencies of G and C were each 0.25 plus the bias, and the frequencies of A and T were each 0.25 minus the bias). The probability of observing any of the four bases at the root node was assumed to be 0.25, in accordance with the K2P model employed for the central branch containing the root.
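The substitution process on a biased branch can be sketched as a rate matrix. The following is a minimal illustration rather than the authors' program: the names `base_frequencies`, `t92_rate_matrix`, `bias`, and `kappa` are hypothetical, `kappa` is assumed to be the ratio of the transition rate parameter to the transversion rate parameter, and the matrix is assumed to be scaled to one expected substitution per site per unit branch length at its own equilibrium. Setting `bias` to zero gives the K2P case.

```python
BASES = "ACGT"
TRANSITIONS = {"AG", "GA", "CT", "TC"}  # purine<->purine and pyrimidine<->pyrimidine

def base_frequencies(bias):
    """Pool frequencies in A, C, G, T order for a given G+C bias (assumed < 0.25)."""
    return [0.25 - bias, 0.25 + bias, 0.25 + bias, 0.25 - bias]

def t92_rate_matrix(bias=0.0, kappa=2.0):
    """T92-style rate matrix; reduces to K2P when bias is zero."""
    pi = base_frequencies(bias)
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                # Rate of change to base y is proportional to its pool frequency,
                # multiplied by kappa when the change is a transition.
                q[i][j] = pi[j] * (kappa if x + y in TRANSITIONS else 1.0)
        q[i][i] = -sum(q[i])  # each row sums to zero
    # Rescale so the expected substitution rate at equilibrium is one.
    rate = -sum(pi[i] * q[i][i] for i in range(4))
    return [[value / rate for value in row] for row in q]
```

For instance, `t92_rate_matrix(bias=0.2)` describes a branch whose pool is 90% G+C, whereas `t92_rate_matrix(bias=0.0)` describes an unbiased branch.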
With a tree and a substitution model thus specified, it is possible to compute the probability of all 256 data patterns for any given combination of G+C bias, transition/transversion rate ratio, and branch length (d). We need be concerned with only 36 of the 256 possible patterns, 12 of which support each of the three possible unrooted trees. Let P0 be the sum of the probabilities of the 12 patterns supporting the true tree and let P1 and P2 be the sum of the probabilities of the 12 patterns supporting each of the two incorrect trees. If either P1 or P2 exceeds P0, then parsimony will tend to choose incorrectly even with an infinite number of nucleotide sites (i.e., parsimony is statistically inconsistent).
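The pattern counts quoted above can be checked by direct enumeration. This short sketch (the function name `supported_tree` is ours, not the authors') classifies each of the 256 four-taxon site patterns by the unrooted tree it supports under parsimony, using the tree numbering adopted in this section (tree 0 joins sequences 1 and 2, tree 1 joins sequences 1 and 3, tree 2 joins sequences 1 and 4).

```python
from itertools import product

def supported_tree(pattern):
    """Return 0, 1, or 2 for a parsimony-informative pattern, else None."""
    a, b, c, d = pattern  # states observed in sequences 1, 2, 3, and 4
    if a == b and c == d and a != c:
        return 0  # xxyy: sequences 1 and 2 together (tree 0)
    if a == c and b == d and a != b:
        return 1  # xyxy: sequences 1 and 3 together (tree 1)
    if a == d and b == c and a != b:
        return 2  # xyyx: sequences 1 and 4 together (tree 2)
    return None  # constant, singleton, or otherwise uninformative pattern

patterns = list(product("ACGT", repeat=4))
counts = {tree: 0 for tree in (0, 1, 2)}
for pattern in patterns:
    tree = supported_tree(pattern)
    if tree is not None:
        counts[tree] += 1
```

Each informative class contains 4 × 3 = 12 patterns (four choices for the first state, three for the second), giving the 36 patterns referred to in the text.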
As expected, for many combinations of branch lengths and transition/transversion rate ratio, increasing G+C bias caused parsimony to become statistically inconsistent (fig. 2). Since the model tree specified the biased branches to be those leading to sequences 1 and 3, the tree that placed sequences 1 and 3 together (tree 1) was increasingly supported as the level of bias increased. Tree 0 (the true tree, placing sequences 1 and 2 together) and tree 1 thus provided the comparison of interest; tree 2 (placing sequences 1 and 4 together) will be ignored hereinafter. The plots in figure 2 depict the difference between P0 and P1. The region of inconsistency (shaded) is entered when the surface representing P0 - P1 dips below 0; it is in this area that parsimony is expected to prefer tree 1 over the true tree.
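The quantity P0 - P1 can be computed numerically for any one parameter combination. The sketch below is an illustration under stated assumptions, not the authors' analytical derivation: all function and parameter names are hypothetical, `kappa` is assumed to be the ratio of transition to transversion rate parameters, each rate matrix is assumed to be scaled to one expected substitution per site at its own equilibrium, and the matrix exponential is approximated by a truncated Taylor series with repeated squaring. Sequences 1 and 3 sit on the biased branches, the root has equal base frequencies, and the central branch is split evenly on either side of the root.

```python
from itertools import product

def t92_rate_matrix(bias, kappa):
    """Normalized T92 rate matrix, bases ordered A, C, G, T (K2P when bias == 0)."""
    pi = [0.25 - bias, 0.25 + bias, 0.25 + bias, 0.25 - bias]
    transitions = {(0, 2), (2, 0), (1, 3), (3, 1)}  # A<->G and C<->T
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                q[i][j] = pi[j] * (kappa if (i, j) in transitions else 1.0)
        q[i][i] = -sum(q[i])
    rate = -sum(pi[i] * q[i][i] for i in range(4))
    return [[value / rate for value in row] for row in q]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transition_probs(q, t, squarings=12, terms=8):
    """Approximate exp(Q t) by a short Taylor series followed by repeated squaring."""
    scale = t / 2 ** squarings
    step = [[value * scale for value in row] for row in q]
    identity = [[float(i == j) for j in range(4)] for i in range(4)]
    result, term = identity, identity
    for n in range(1, terms + 1):
        term = [[value / n for value in row] for row in matmul(term, step)]
        result = [[result[i][j] + term[i][j] for j in range(4)] for i in range(4)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def tree_support(bias, kappa, peripheral, central):
    """Return ([P0, P1, P2], total probability of all 256 patterns)."""
    unbiased_q = t92_rate_matrix(0.0, kappa)
    half = transition_probs(unbiased_q, central / 2)   # root to each internal node
    plain = transition_probs(unbiased_q, peripheral)   # branches to sequences 2 and 4
    biased = transition_probs(t92_rate_matrix(bias, kappa), peripheral)  # sequences 1 and 3
    # joint[u][v]: probability that the two internal nodes hold states u and v
    joint = [[sum(0.25 * half[r][u] * half[r][v] for r in range(4))
              for v in range(4)] for u in range(4)]
    support, total = [0.0, 0.0, 0.0], 0.0
    for a, b, c, d in product(range(4), repeat=4):
        prob = sum(joint[u][v] * biased[u][a] * plain[u][b] * biased[v][c] * plain[v][d]
                   for u in range(4) for v in range(4))
        total += prob
        if a == b and c == d and a != c:
            support[0] += prob
        elif a == c and b == d and a != b:
            support[1] += prob
        elif a == d and b == c and a != b:
            support[2] += prob
    return support, total
```

Calling `tree_support(bias, kappa, peripheral, central)` over a grid of values and plotting `support[0] - support[1]` would give a surface of the same kind as the one described for figure 2, although the exact values depend on the normalization assumed here.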
It has been suggested by Lockhart et al. (1992)
Figure 2 shows that, in general, branch lengths must be large (>0.5 substitutions per site) for CNC to cause serious problems for parsimony, even when the G+C bias is nearly at its maximum possible value (bias = 0.24). The effect of CNC is exacerbated by small internal branch lengths and especially by transition/transversion bias.
Figure 3 repeats the analysis of figure 2, this time including the discrete gamma distribution of sitewise relative rates. In this case, we see that the addition of rate heterogeneity actually decreases the size of the zone of inconsistency, especially in regions where all branches are long. One might predict that site-to-site rate heterogeneity would make matters worse for parsimony (and any method that does not take it into account), since high rate heterogeneity implies that change is concentrated at fewer sites. This means that variable sites have a better chance of experiencing multiple hits than in the rate homogeneity case, leading to greater difficulty in distinguishing true phylogenetic signal from false signal due to convergence. This would be especially true if the total amount of accumulated nucleotide composition bias were held constant. In figure 2, this is not the case: it is the number of substitutions (branch lengths) that is held constant, and the greater success of parsimony can thus be attributed to the fact that change has been concentrated at a few variable sites, and the realized nucleotide composition bias is not as great as that for the rate homogeneity case (where more sites have undergone at least one change).
| Simulation Study |
|---|
The rigidity of the model tree in the analytical study makes it difficult to apply the results to real data sets. In particular, few real data sets follow the assumed perfect molecular clock, and fewer still have interior nodes so evenly spaced in time. We therefore used computer simulation to study the effects of CNC on the ability of parsimony and other methods to reconstruct the true tree using the chlorop.phy data set obtained from http://imbs.massey.ac.nz/Research/MolEvol/Farside/programs.htm and described in Lockhart et al. (1994).
Using PAUP*, version 4.0d64 (Swofford 1998), we were able to reproduce the results of Lockhart et al. (1994) on the entire data matrix of eight sequences, but we reduced the data set to just the sequences from Anacystis, Olithodiscus, Euglena, and Chlamydomonas for simplicity. As table 1 shows, reducing the taxon sampling did not affect the general conclusions reached by Lockhart et al. (1994). All methods examined except LogDet favored the unrooted tree topology grouping Euglena and Olithodiscus and separating them from Chlorella and Anacystis, which have higher G+C contents (table 2). The model described by Galtier and Gouy (1998), hereinafter called the GG98 model, was used to simulate data according to the tree presumed to be correct, namely, (Anacystis, Olithodiscus, (Euglena, Chlamydomonas)). In essence, the hypothesis tested was that the process underlying the evolution of the observed sequences did not differ from the model of evolution used in the simulations. The results of the previous section suggest that the degree of bias present in the Lockhart et al. (1994) data set is not large enough to mislead parsimony (or, presumably, other methods) unless other factors exacerbate its effects. We therefore predicted that all methods would usually pick the correct tree in the simulated data sets.
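The idea of simulating sequences along a tree whose branches differ in G+C content can be illustrated with a deliberately simplified sketch. It does not implement the GG98 model: it substitutes an F81-style process (no transition/transversion bias) in which each branch has its own pool G+C content, and every name and parameter value below is hypothetical rather than an estimate from the chlorop.phy data.

```python
import math
import random

def evolve(seq, gc, t, rng):
    """Evolve seq along a branch of length t with pool G+C content gc (F81-style)."""
    bases = "ACGT"
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]
    # Scale so that t is the expected number of substitutions per site at equilibrium.
    mu = 1.0 / (1.0 - sum(w * w for w in weights))
    p_event = 1.0 - math.exp(-mu * t)  # probability a site is redrawn from the pool
    return "".join(
        rng.choices(bases, weights)[0] if rng.random() < p_event else base
        for base in seq
    )

def gc_content(seq):
    return sum(base in "GC" for base in seq) / len(seq)

def simulate_four_taxa(n_sites, gc_biased, peripheral, central, seed=1):
    """Simulate four sequences; taxon1 and taxon3 lie on the G+C-biased branches."""
    rng = random.Random(seed)
    root = "".join(rng.choices("ACGT", k=n_sites))  # equal base frequencies at the root
    u = evolve(root, 0.5, central / 2, rng)  # ancestor of taxon1 and taxon2
    v = evolve(root, 0.5, central / 2, rng)  # ancestor of taxon3 and taxon4
    return {
        "taxon1": evolve(u, gc_biased, peripheral, rng),
        "taxon2": evolve(u, 0.5, peripheral, rng),
        "taxon3": evolve(v, gc_biased, peripheral, rng),
        "taxon4": evolve(v, 0.5, peripheral, rng),
    }
```

Replicate data sets generated in this way could each be analyzed with the methods under comparison, and the proportion of replicates recovering the generating tree recorded for each method.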
The parameter values used in the simulations were maximum-likelihood estimates obtained using two independently written computer programs, each using the GG98 model. The program EVAL_NH, written by Galtier and Gouy, was used to check the results from a program (GG98) written separately by one of us (P.O.L.). It is important to note that the incorporation of CNC makes the model non-time-reversible. In such models, the maximum likelihood changes with different rootings, so table 3 presents likelihood scores for all 15 possible rooted topologies for four taxa. The maximum-likelihood tree under the GG98 model is the "true" tree (table 3). This result demonstrates that using a model allowing nucleotide composition to vary across the tree improves the quality of the estimated tree. The two programs were in agreement with respect to the parameter estimates for the maximum-likelihood tree (fig. 4). We each wrote independent computer programs to simulate data sets based on these parameter estimates and used PAUP*, version 4.0d64 (Swofford 1998
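The figure of 15 rooted topologies for four taxa can be confirmed by enumeration. The following sketch (the function name `rooted_topologies` is ours) lists every rooted binary tree as a nested tuple; to avoid counting mirror images twice, the first taxon in each subset is always placed in the left-hand subtree.

```python
def rooted_topologies(taxa):
    """Yield every rooted binary tree on `taxa` as a nested tuple."""
    taxa = list(taxa)
    if len(taxa) == 1:
        yield taxa[0]
        return
    first, rest = taxa[0], taxa[1:]
    # Each mask chooses which of the remaining taxa join `first` on the left;
    # the all-ones mask is excluded so the right-hand subtree is never empty.
    for mask in range(2 ** len(rest) - 1):
        left = [first] + [t for i, t in enumerate(rest) if mask >> i & 1]
        right = [t for i, t in enumerate(rest) if not mask >> i & 1]
        for left_tree in rooted_topologies(left):
            for right_tree in rooted_topologies(right):
                yield (left_tree, right_tree)

trees = list(rooted_topologies(["Anacystis", "Olithodiscus", "Euglena", "Chlamydomonas"]))
```

Under a non-time-reversible model, each of these rooted trees would need its own likelihood evaluation, because moving the root changes the likelihood.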
We repeated the simulations, this time incorporating discrete gamma rate heterogeneity into the data. The model used is termed the GG98-Γ model, as it is identical to the GG98 model except for the addition of a gamma shape parameter. Four rate categories were used, with the mean of each category serving as the relative rate used in the likelihood calculations. Again, when the likelihood of each of the 15 possible rooted trees was computed using the GG98-Γ model, the maximum-likelihood tree was identical to the tree topology assumed to be true by Lockhart et al. (1994)
model (fig. 5) were used as the basis of the simulations; however, this time only the GG98 program could be used to estimate parameters because EVAL_NH does not include the gamma version of the GG98 model. In this case, some of the simulated data sets resulted in incorrect estimates of phylogeny regardless of the method used. Nevertheless, all of the methods recovered the correct tree a high percentage of the time, and LogDet did not outperform the other methods (table 4) when presented with the true amount of rate heterogeneity (the maximum-likelihood estimate of the gamma shape parameter from the original data set, 0.308, was the assumed level of rate heterogeneity in the simulated data).
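The relative rates for a discrete gamma model with equal-probability categories, each represented by its mean, can be approximated as follows. Exact implementations obtain the category boundaries from gamma quantiles; this sketch instead substitutes a Monte Carlo estimate so that it needs only the standard library, and the function name, sample size, and seed are arbitrary choices of ours.

```python
import random

def discrete_gamma_rates(alpha, categories=4, samples=200_000, seed=1):
    """Estimate the mean rate of each equal-probability category of a mean-one gamma."""
    rng = random.Random(seed)
    # Shape alpha with scale 1/alpha gives a gamma distribution whose mean is one.
    draws = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(samples))
    size = samples // categories
    return [sum(draws[k * size:(k + 1) * size]) / size for k in range(categories)]

# Shape parameter taken from the text; the resulting rates are approximate.
rates = discrete_gamma_rates(0.308)
```

With a shape parameter this small, most of the change is assigned to the fastest category, which is the sense in which strong rate heterogeneity concentrates substitutions at few sites.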
| Discussion |
|---|
Of the many evolutionary factors affecting the accuracy of phylogenetic inference, CNC is a relative newcomer, being recognized formally as a problem with the papers by Lake (1994)
Few clear cases have been reported in which CNC has been thought to derail the phylogenetic inference process. Of the three cases presented by Lockhart et al. (1994), two involve 18S rDNA from vertebrates and COII mtDNA from honeybees. In these two data sets, we could not find any way to obtain the putative "correct" tree except by using LogDet/paralinear distances, as reported by Lockhart et al. (1994). It is notable, however, that it is necessary to exclude all constant and autapomorphic sites (analyzing only parsimony-informative sites) to accurately estimate the phylogeny for these data sets. This suggests site-to-site rate heterogeneity as the likely culprit; however, taking account of site-to-site rate heterogeneity using the standard methods fails to produce a correct estimate. Therefore other, as yet unidentified, factors must be at work in these data sets.
The simulation study reported here represents a test of the hypothesis that CNC alone, or CNC in combination with site-to-site rate heterogeneity, is sufficient to explain the failure of many phylogenetic methods for the third case presented by Lockhart et al. (1994) (represented by the chlorop.phy data set). We used a parametric bootstrap approach in which parameters were estimated from the data using maximum likelihood and simulations performed using these parameter estimates. The results show that CNC, either alone or in combination with site-to-site rate heterogeneity, is insufficient to account for difficulties found in the original data set. None of the simulated data sets presented problems for parsimony or any of the other methods tested (all of which failed on the original data set).
It is clear that the GG98 model used for the simulations did not capture some factor important in the evolution of the actual sequences. One possibility is that the GG98 model does not allow enough variation in nucleotide composition across the tree. This model places some constraints on changes in nucleotide composition, forcing the frequency of G to equal the frequency of C and allowing only changes in G+C composition at the nodes of the tree. It seems unlikely that these two model constraints can account for the differences seen between the simulation results and the results from the original data. First, allowing the composition of G to differ from the composition of C should not increase the chances of an artifactual joining of Euglena to Olithodiscus, since it is the low G+C content in these lineages that is postulated to have caused problems in the original data set. Second, allowing nucleotide composition to vary within lineages should also not increase the chance of Euglena pairing with Olithodiscus, since all of the phylogenetic methods that failed on the original data set view branches as the smallest units making up a phylogenetic tree: that is, they cannot, like LogDet, take account of changes in composition that occur within branches.
When simulations incorporated both CNC and rate heterogeneity, a small fraction of the simulated data sets proved difficult for all methods. This falls short of the result that would be expected if rate heterogeneity were the all-important missing factor. Also, we would expect LogDet to perform well (as it did on the original data set) compared with the other methods examined. In fact, LogDet behaves similarly to the other methods, failing on a small fraction of the simulated data sets (table 4). These observations indicate the presence of as-yet-unknown evolutionary factors at work in the evolution of the actual sequences that are not being modeled by the simulations.
The phylogenetic methods in common use today each have their own "Achilles' heel," and it behooves researchers to learn as much as possible about the factors at work in their data prior to deciding on a method to use in the final analysis. For example, parsimony's primary Achilles' heel has long been identified as long-branch attraction (Felsenstein 1978). Maximum likelihood can correct for problems that are identified and incorporated into substitution models but can be deceived by factors not represented in the model used (e.g., rate heterogeneity; Gaut and Lewis 1995). This paper has addressed a potential Achilles' heel applicable to most methods of phylogenetic inference and found that it is perhaps not as great a threat as it was initially perceived to be. This is not to say that CNC can be ignored altogether. Figure 3 illustrates that CNC in combination with site-to-site rate heterogeneity and transition/transversion bias can cause problems even at biologically realistic substitution rates and levels of rate heterogeneity. For example, in figure 3, one point at which parsimony is inconsistent is characterized by the following parameter values: peripheral branch lengths = 0.8, central branch length = 0.1, gamma shape = 0.2, and transition/transversion rate ratio = 1.0, with a G+C difference of 0.12 between biased and unbiased lineages. These branch lengths and the G+C bias are at the edge of what is normally observed in actual data sets, but none are out of the realm of possibility, and the transition/transversion bias and degree of rate heterogeneity are not at all extreme. LogDet/paralinear distances provide a practical means for diagnosing CNC should it be present in a dosage sufficient to cause problems. A tree estimated using LogDet that differs from trees estimated using other methods should prompt an examination of the data for evidence that other methods are incorrectly joining taxa with similar nucleotide compositions.
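As an illustration of the diagnostic distance mentioned above, the sketch below computes a paralinear distance between two aligned sequences in the commonly cited form, -ln[det J / sqrt(det Dx det Dy)], where J is the matrix of joint base frequencies and Dx and Dy are diagonal matrices of the base frequencies in each sequence. The function names are ours, the sequences are assumed to contain only A, C, G, and T with every base present in both, and some programs rescale this quantity (for example, dividing by the number of states), so absolute values may differ from those reported by a given package.

```python
import math

BASES = "ACGT"

def determinant(m):
    """Determinant of a small square matrix by cofactor expansion."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col, value in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * value * determinant(minor)
    return total

def paralinear_distance(seq_x, seq_y):
    """Paralinear distance between two aligned sequences of equal length."""
    n = len(seq_x)
    joint = [[0.0] * 4 for _ in range(4)]
    for a, b in zip(seq_x, seq_y):
        joint[BASES.index(a)][BASES.index(b)] += 1.0 / n
    freq_x = [sum(row) for row in joint]                               # base frequencies in seq_x
    freq_y = [sum(joint[i][k] for i in range(4)) for k in range(4)]   # base frequencies in seq_y
    det_x = math.prod(freq_x)  # determinant of a diagonal matrix
    det_y = math.prod(freq_y)
    return -math.log(determinant(joint) / math.sqrt(det_x * det_y))
```

A matrix of such pairwise distances could then be passed to any distance-based tree-building method and the resulting tree compared with trees from methods that assume stationary base frequencies.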
While it is unlikely that any data set can be found that shows the influence of one and only one evolutionary factor, it is nevertheless beneficial to thoroughly analyze sequence data sets in the search for good examples of the effects of evolutionary factors representing potential pitfalls for phylogeny methods. Equally important is the search for new evolutionary factors. It is only when such evolutionary factors as site-to-site rate heterogeneity, transition/transversion bias, evolutionary dependence among sites, and CNC are discovered that work can begin on creating evolutionary models that avoid the problems they create.
| Acknowledgements |
|---|
The authors would like to thank Peter Lockhart, Michael Steel, Michael Hendy, and David Penny for making the data sets used in their 1994 paper freely available to other researchers over the World Wide Web. Permission from David L. Swofford to use a prerelease test version of his software PAUP*, version 4.0, is also gratefully acknowledged. Finally, we thank the Biology Department of the University of New Mexico for providing support for the computing facilities needed to carry out this research. P.O.L. gratefully acknowledges funding from the Alfred P. Sloan Foundation/National Science Foundation (grant 98-4-5 ME). This paper is the culmination of the research performed for a Senior Honors Thesis by G.C.C.
| Footnotes |
|---|
Masami Hasegawa, Reviewing Editor
1 Keywords: nucleotide composition, phylogeny, LogDet, G+C bias, maximum parsimony
2 Address for correspondence and reprints: Gavin C. Conant, Department of Biology, 167 Castetter Hall, University of New Mexico, Albuquerque, New Mexico 87131-1091. gconant{at}unm.edu.
| Literature Cited |
|---|
Burggraf, S. G., K. O. Stetter, C. R. Woese. 1992. A phylogenetic analysis of Aquifex pyrophilus. Syst. Appl. Microbiol. 15:352–356.
Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27:401–410.
Felsenstein, J. 1993. PHYLIP (phylogeny inference package). Version 3.5. Distributed by the author, Department of Genetics, University of Washington, Seattle, Washington.
Foster, P. G., D. A. Hickey. 1999. Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J. Mol. Evol. 48:284–290.
Galtier, N., M. Gouy. 1995. Inferring phylogenies from DNA sequences of unequal base compositions. Proc. Natl. Acad. Sci. USA 92:11317–11321.
Galtier, N., M. Gouy. 1998. Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Mol. Biol. Evol. 15:871–879.
Gaut, B., P. O. Lewis. 1995. Success of maximum likelihood phylogeny inference in the four-taxon case. Mol. Biol. Evol. 12:152–162.
Goldman, N., Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725–736.
Hasegawa, M., T. Hashimoto. 1993. Ribosomal RNA trees misleading? Nature 361:23.
Huelsenbeck, J. P. 1995. Performance of phylogenetic methods in simulation. Syst. Biol. 44:17–48.
Jukes, T. H., C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.
Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
Kuhner, M. K., J. Felsenstein. 1994. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol. 11:459–468.
Lake, J. A. 1994. Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc. Natl. Acad. Sci. USA 91:1455–1459.
Lockhart, P. J., D. Penny, M. D. Hendy, C. J. Howe, T. J. Beanland, A. W. D. Larkum. 1992. Controversy on chloroplast origins. FEBS Lett. 301:127–131.
Lockhart, P. J., M. A. Steel, M. D. Hendy, D. Penny. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11:605–612.
Loomis, W. F., D. W. Smith. 1990. Molecular phylogeny of Dictyostelium discoideum by protein sequence comparison. Proc. Natl. Acad. Sci. USA 87:9093–9097.
Muse, S. V. 1995. Evolutionary analyses of DNA sequences subject to constraints on secondary structure. Genetics 139:1429–1439.
Muse, S. V. 1996. Estimating synonymous and nonsynonymous substitution rates. Mol. Biol. Evol. 13:105–114.
Navidi, W. C., G. A. Churchill, A. von Haeseler. 1991. Methods for inferring phylogenies from nucleic acid sequence data by using maximum likelihood and linear invariants. Mol. Biol. Evol. 8:128–143.
Nei, M. 1991. Relative efficiencies of different treemaking methods for molecular data. Pp. 90–128 in M. M. Miyamoto and J. Cracraft, eds. Phylogenetic analysis of DNA sequences. Oxford University Press, New York.
Reeves, J. H. 1992. Heterogeneity in the substitution process of amino acid sites of proteins coded for by mitochondrial DNA. J. Mol. Evol. 35:17–31.
Schöniger, M., A. von Haeseler. 1995. Performance of the maximum likelihood, neighbor joining, and maximum parsimony methods when sequence sites are not independent. Syst. Biol. 44:533–547.
Sidow, A., T. P. Steel. 1992. Estimating the fraction of invariable codons with a capture-recapture method. J. Mol. Evol. 35:253–260.
Steel, M. A. 1994. Recovering a tree from the leaf colourations it generates under a Markov model. Appl. Math. Lett. 7:19–23.
Swofford, D. L. 1998. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4.0 (prerelease test version). Sinauer, Sunderland, Mass.
Swofford, D. L., P. J. Waddell, J. P. Huelsenbeck, P. G. Foster, P. O. Lewis, J. S. Rogers. 2001. Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Syst. Biol. (in press).
Tamura, K. 1992. Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G+C-content biases. Mol. Biol. Evol. 9:678–687.
Waddell, P., M. Steel. 1997. General time-reversible distances with unequal rates across sites: mixing Γ and inverse Gaussian distributions with invariant sites. Mol. Phylogenet. Evol. 8:398–414.
Wakeley, J. 1993. Substitution-rate variation among sites and the estimation of transition bias. Mol. Biol. Evol. 11:426–442.
Yang, Z. 1993. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10:1396–1401.
Yang, Z., D. Roberts. 1995. On the use of nucleic acid sequences to infer early branchings in the tree of life. Mol. Biol. Evol. 12:451–458.



(gamma shape parameter) = 0.2. A, 


