MBE Advance Access originally published online on August 16, 2007
Molecular Biology and Evolution 2007 24(10):2277-2285; doi:10.1093/molbev/msm160
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Research Articles |
Statistics of the Log-Det Estimator
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
E-mail: tim.massingham{at}ebi.ac.uk.
| Abstract |
|---|
|
|
|---|
The log-det estimator is a measure of divergence (evolutionary distance) between sequences of biological characters, DNA or amino acids, for example, and has been shown to be robust to biases in composition that can cause problems for other estimators. We provide a statistical framework to construct high-accuracy confidence intervals for log-det estimates and compare the efficiency of the estimator to that of maximum likelihood using time-reversible Markov models. The log-det estimator is found to have good statistical properties under such general models.
Key Words: log-det distance pairwise distances statistical properties
| Introduction |
|---|
|
|
|---|
Many forms of the "log-det" divergence estimator have been given in the literature, including the log-det distances (Lockhart et al. 1994
It is important to assess the accuracy of any estimate of divergence, and previous papers have derived approximations of the variance of the log-det estimator, implicitly suggesting that confidence in log-det estimates could be assessed using approximations based on the normal distribution. For example, the log-det–based test for molecular clock under nonstationarity described by Gu and Li (1996
, p. 1378) is of this form: "When the sequence is long, [the statistic] follows approximately the standard normal distribution". Lockhart et al. (1994)
derived an approximation to the variance of the log-det estimator using the delta method (see also Barry and Hartigan 1987
). Gu and Li (1996)
analyzed the bias and variance of the paralinear distance, proving that log-det estimation has a small-sample bias approximately equal to twice its variance.
This paper applies the empirical likelihood framework (Owen 2001
) to the analysis of the log-det divergence estimator and shows how to construct more accurate confidence intervals (CIs), even in cases when the sampling distribution is highly nonsymmetric, and so normal-based intervals are poor approximations. The CIs constructed are accurate in the sense that their coverage error, the difference between their nominal and actual size, decreases rapidly as the number of sites increases. (Note that throughout this paper, the size of a CI is taken to mean its statistical size, i.e., the probability that the true value of a parameter lies within it, rather than its width.) Both the small- and large-sample performance of the log-det estimator are assessed, the small-sample performance by simulation and the large-sample performance by comparison of mathematical limits of the variance as the number of sites increases.
| Methods |
|---|
|
|
|---|
For the purposes of this paper, the log-det estimator analyzed is the paralinear distance (Lake 1994
![]() | (1) |
It is also useful to define the joint probability matrix as that which specifies the probability of pair (i, j) being observed at a given site. If the true joint probability matrix is J0, then the observed divergence matrix is a multinomial sample of size n with probabilities J0. The true underlying divergence, t0, is defined to be LD(J0).
Parametric and Empirical Likelihood
In a parametric likelihood context, the data are assumed to come from a parametric model and the joint probability of observing the pair (i, j) is fij(t,
), where t is the divergence and
represents additional parameters describing the substitution process. The log-likelihood for a given t and
is
|
| (2) |
If estimating
is not of interest (nuisance parameters), then they can be "removed" from the problem by treating its maximum likelihood (ML) estimate as a function of t:
|
| (3) |
being estimated from the data rather than fixed at a known value.
Rather than the parametric approach of restricting estimation to a single, albeit potentially general, family of distributions, profiling can be taken to the extreme, and we can treat the entire joint probability matrix as a nuisance parameter. Profiling in this manner is the basis of empirical likelihood (Owen 2001
). Analogously to parametric likelihood, we can define the empirical log-likelihood of any joint probability matrix J, given the observed data as
|
|
Just like a parametric likelihood, the empirical likelihood measures how probable the data are, given a specific joint probability matrix (specified implicitly by t and
in the case of the parametric likelihood). For each choice of J, there is a corresponding divergence given by LD(J) (eq. 1) and the method of empirical likelihood seeks to find the joint probability distribution for which the data are most probable, given that the underlying divergence is t. For each possible value of t, the value of the empirical likelihood function (ELF; analogous to the parametric likelihood function of eqs. 2 and 3) can be expressed as the following constrained optimization problem:
![]() | (4) |
The solution to equation 4, as a function of t, is the ELF for t. Owen (1988)
shows that this has many favorable properties similar to those of parametric likelihood. In particular, there is an empirical likelihood analog of Wilk's theorem, that is, twice the difference in empirical log-likelihood between the maximum of the ELF (attained when J equals the empirical joint probability matrix
= D/n, and so t =
) and the value when t = t0 is approximately distributed as a 
distribution. The coverage error of this approximation is proportional to n–1 (DiCiccio et al. 1991
), that is, inversely proportional to the number of aligned sites considered. Unlike parametric likelihood, this distribution holds even when the model is incorrect, although in such circumstances the meaning of estimates and CIs is open to interpretation. Other properties of empirical likelihood include good statistical power, equaling that of parametric likelihood for large-sample sizes and sometimes exceeding it for small sample sizes (Lazar and Mykland 1998
).
A Least Favorable Family Approximation
Directly solving equation 4 to determine the empirical likelihood for a given divergence requires optimization over N2 – 1 parameters subject to the stated constraints, and the structure of the log-det estimator means that usual techniques for simplifying the optimization of the empirical likelihood (Owen 2001
) are not applicable to this problem. Exact calculation of the ELF for the log-det estimator is computationally prohibitive, but DiCiccio and Romano (1990)
considered several approaches to constructing nonparametric CIs and Lee and Young (1999)
studied the properties of a particularly amenable approximation to empirical likelihood based on "Stein's Least Favorable Family" (LFF), which shares many of empirical likelihood's favorable properties. Rather than considering all possible joint probability matrices, only those that belong to a carefully chosen family are considered—this family is least favorable in the sense that estimation of parameters under the LFF is as hard (same variance) as the original problem. The empirical likelihood curves described by the LFF and by the ELF both have maxima at the empirical joint probability matrix, and their curvatures are equal about this point; hence, the 2 curves closely agree in the region important for assessing confidence in estimates.
The LFF is a family of multinomial distributions described by a single parameter,
. For a given value of
, the joint probability matrix has entries
|
| (5) |
= 0. The uij are the influence function (Hampel et al. 1986|
| (6) |
Although finding the empirical likelihood for divergence t (or its LFF approximation) involves solving t = LD(J(
)) for
, plotting the empirical likelihood curve for the least favorable family, the LFF curve, requires no optimization or root finding as it is expressed as an implicit function of
. The coordinates of the curve are given by
![]() | (7) |
Large-Sample Variance
In addition to its use in constructing the LFF, the influence function has close connections to other statistical quantities of interest (Hampel et al. 1986
). In particular, the asymptotic variance of the log-det estimator is
|
| (8) |
The Cramér-Rao Lower Bound (CRLB) places a limit on how small the variance of any unbiased estimator can be, relating it to the Fisher information for the parameters of a specified family of distributions (e.g., Pawitan 2001
). This bound applies to all estimators, irrespective of whether they estimate all parameters or just a single parameter of interest (e.g., the log-det estimator only estimates the divergence). Estimators that achieve this lower bound are said to be efficient. The ratio between the minimum possible variance and that of the estimator is its efficiency, a measure of the how well the estimator uses the observed data: an estimator with an efficiency of x requires sequences that are 1/x times longer to achieve the same variance as an efficient estimator; that is, twice as long when the efficiency is 0.5, 11% longer when the efficiency is 0.9, and so on. The large-sample performance of different estimators can be compared by considering their respective asymptotic efficiencies, which is equivalent to comparing their asymptotic variances to the CRLB.
The obvious comparison for the log-det estimator is to study its performance relative to ML using stationary Markov models of evolution, the most general of which is the "General Time-Reversible" (GTR) model (Tavaré 1986
), as time-reversible Markov models are often used for pairwise distance estimation for tree building (Gascuel 1997
; Bruno et al. 2000
) and large-data sets tend to support more complex models (Negrisolo et al. 2004
; Goremykin and Hellwig 2005
; Nikolaev et al. 2007
). The asymptotic variance of time-reversible Markov models under ML estimation is equal to the inverse Fisher information divided by the number of observations, so they have an asymptotic efficiency of 1 (Pawitan 2001
). The Fisher information for time-reversible Markov models can be calculated following Goldman (1998)
, although in that paper the Fisher information for parameters other than divergences was approximated numerically. Here it is calculated analytically using the formulas given by Schadt and Lange (2002)
.
Markov processes of sequence evolution have an arbitrary scaling factor, reflecting that rate and time are confounded: increasing the rate of substitution by a factor µ and decreasing the divergence by the same factor gives a joint probability distribution identical to that of the original process. For ML estimation under time-reversible models, it is usual to set µ so that one substitution is expected in unit time (when the process is stationary); that is, µ
i
iqi= 1, where
i is the stationary frequency of character i and qi is the rate at which substitutions occur, given the character is currently i (qi = –qii, where qii is the (i, i) entry of the instantaneous rate matrix for the Markov process). The log-det estimator does not assume that the underlying processes is stationary (Steel 1994
) and uses a different scaling that does not explicitly involve the stationary distribution: µ
iqi/N = 1. For all calculations in this paper, the scaling factor for the rate matrix of GTR was chosen so that divergences estimated by log-det and GTR are directly comparable. Estimation of divergence by (parameteric) ML using the GTR model will be referred to as the ML-GTR method.
Unlike the case of large samples, there are no formulas available that describe the small-sample performance of these estimators and it does not necessarily hold that an estimator with good large-sample properties is also good for small samples. The actual performance for small samples can be investigated by simulation: many sets of data are randomly generated according to a known model of evolution and the divergence calculated for each one. The mean squared error (MSE) of these divergences is an estimate of the expected squared error of each estimator (the variance, if the estimator is unbiased). The small-sample bias of each estimator can also be investigated using this approach.
| Results |
|---|
|
|
|---|
We present the results in 3 subsections. We consider the performance of the LLF approximation to empirical likelihood and show how to construct CIs using this framework. We then analyze the large-sample performance, and finally the small-sample performance, of the log-det estimator.
Performance of the LFF Approximation
The LFF-approximated empirical likelihood curve for the divergence between human and orangutan mitochondrial sequences (Brown et al. 1982
) is shown in figure 1, the curve being defined by equations 5, 6, and 7. As it must, the maximum empirical log-likelihood occurs when the divergence is equal to the log-det estimate of the divergence between the 2 sequences.
|
Curves calculated using equation 7, as illustrated in figure 1, are only an approximation to the curves for the ELF (i.e., those curves defined by eq. 4), necessarily lying below the true curve. There may be joint probability matrices neighboring those in the LFF that have the same log-det divergence but greater empirical likelihood. To assess the accuracy of the LFF approximation, random joint probability matrices were simulated from a Dirichlet distribution with mean equal to a member of the LFF and with various different scaling factors (5,000; 10,000; 20,000; ...; 50,000) to control how near the sampling was to the mean. If the approximation is inaccurate, then many of these points will lie above the LFF curve; the farther above, the worse the approximation. The empirical log-likelihoods and log-det divergences of the data under these random families for the human–orangutan sequence comparison are shown as points in figure 1. Few points lie above the LFF curve, and then only by a very small amount; it is clear that the LFF is an extremely good approximation to the ELF for these data. In addition, a further 1,000,000 points were generated uniformly over the space of all joint probability matrices. The maximum empirical likelihood for these matrices was –1767, far short of the values appearing in figure 1 and so far from the ELF.
CIs for the log-det estimator can be constructed in the usual manner using the LFF curve. Because twice the difference between the maximum of the ELF and its value when it is evaluated at the underlying divergence t0 is approximately 
distributed (Owen 1988
), an appropriate interval can be determined by placing thresholds on the LLF curve. The values of
corresponding to the end points of a CI of size x, (
L,
U), are the solution of
where q(x) is quantile x of a 
distribution (e.g., x = 0.95, q(x) = 3.84 for a 95% CI; x = 0.99, q(x) = 6.63 for a 99% interval). The corresponding divergences LD(J(
L)) and LD(J(
U)) provide the lower and upper bounds of the CI. Such points can be found using standard root-finding techniques (e.g., Press et al. 1992
), the task made slightly easier because the roots must lie on either side of
= 0.
CIs of size 95%, 99%, and 99.9% for the human–orangutan divergence are shown in figure 1. Empirical likelihood-derived 95% and 99% CIs for all pairwise comparisons in the Brown et al. (1982)
mitochondrial data are reported in table 1; these agree closely with intervals derived analogously using ML-GTR (not shown). The CIs shown for the normal approximation are generally smaller than those derived using empirical likelihood. Because the coverage error for the empirical likelihood intervals, even after taking the LFF approximation into account, is an order smaller than that for normal intervals, this result suggests that the normal approximation may be systematically returning intervals with an actual size smaller than the nominal size. A possible explanation for this would be the failure to take the skew of the distribution into account.
|
As an alternative to empirical likelihood, confidence in log-det estimates of divergence can also be assessed using bootstrap techniques. CIs for the human–orangutan divergence produced using Efron's nonparametric and Rubin's Bayesian bootstraps (Efron 1979
|
Large-Sample Performance
Pairwise sequence divergence is often measured by assuming sequence evolution proceeded according to a more restricted model than GTR. For example, the Kimura 2-parameter model (Kimura 1980
|
Small-Sample Performance
When an estimator is unbiased, the variance and the MSE derived from simulations agree, so we estimate efficiency as the ratio of the minimum possible variance and the estimated MSE. For the results presented in this paper, all estimated MSE's were based on 100,000 simulated pairs of sequences. Estimating the MSE in this manner has a minor problem: there is some probability that the estimated divergence is not finite. Such unusual results are due to large deviations from the mean, so their probability of occurrence decreases exponentially as the number of sites in the sequences increases. Reflecting likely use in practice, simulated pairs of sequence with infinite divergence were discarded. The model used for simulation was GTR with parameters equal to the ML estimates for the human–orangutan sequences.
Figure 4 shows that there is little difference between the log-det and ML-GTR divergence estimators when the divergence is equal to that of the human–orangutan sequences. Both their efficiencies and biases are almost indistinguishable compared with the stochastic error of the simulation, the log-det estimator marginally outperforming ML-GTR. As the sample size increases, the efficiency tends to 1 and the bias tends to 0, suggesting that both estimators are asymptotically unbiased and efficient. Very few of the simulated sequences were discarded due to having infinite estimated divergence, with 1.5% of generated sequences being discarded when the length is 50 characters and none discarded for sequences longer than 200 characters.
|
The efficiency of estimators will also vary with the underlying divergence being estimated. Figure 5 illustrates how the efficiency of the log-det estimator varies as both the number of sites in the sequences and the underlying divergence changes. For low divergence and many sites, the efficiency of the log-det estimator rapidly approaches 1, although the number of sites needed for the estimator to become efficient increases quite dramatically as the underlying divergence approaches and exceeds 1. For high divergence and few sites, many generated sequences are discarded due to having infinite estimated divergence, and this results in artifactual high efficiencies that eventually exceed 1. All points where more than 5% of sequences were discarded have been excluded from the surface in figure 5.
|
| Discussion and Conclusions |
|---|
|
|
|---|
We have placed the log-det estimator of divergence inside a powerful statistical framework. This framework has many benefits, including more accurate assessment of performance and, due the close connection with CIs, a hypothesis testing framework with good asymptotic properties. In order to study the log-det estimator using empirical likelihood, we have made a LFF approximation that is very successful and has good statistical properties in its own right (Lee and Young 1999
Empirical likelihood CIs using the LFF approximation are computationally easier than bootstrap techniques, relying on a root-finding algorithm rather than simulating many replicates. Neither technique is particularly burdensome unless CIs of a particularly large size or approximate P values of extreme points are required; in these cases, empirical likelihood has computational benefits. Another advantage of empirical likelihood over bootstrap methods is that it can readily be extended to the problem of finding simultaneous CIs on the divergences between all pairs in an alignment of sequences, allowing confidence in trees to be assessed and permitting individual topologies to be rejected as incompatible with observed pairwise divergences.
We have shown that the LFF can be a good approximation to empirical likelihood and can be used to construct CIs whose shape is determined by the distribution of the data, rather than an assumption of normality. These CIs agree closely with those produced using bootstrap methods. The small-sample variance and bias of the log-det estimator are comparable with those of ML using the GTR model and, for low to moderate divergences, it is close to efficient. Its performance rapidly declines for large divergences, however, especially when the data could have been explained using a simpler model of evolution.
Instead of using the method outlined here, CIs can also be calculated using a normal distribution approximation about the estimated divergence with variance equal to the asymptotic variance of the estimator (estimated by evaluating at the empirical joint probability matrix). These CIs are symmetric and typically have coverage error (the difference between the actual and nominal probability that the true divergence lies in the interval) that decreases as n–1/2 as the number of observations (aligned sites, n) increases. In contrast, coverage error decreases as n–1 for intervals based on empirical likelihood (Hall and Scala 1990
). Because the shape of empirical likelihood CIs is driven by the data, and is not restricted to being symmetric, the intervals tend to be narrower and more accurate than those based on the normal approximation for an equivalent statistical size. Unlike those derived from approximation by the normal distribution, empirical likelihood CIs cannot include impossible negative divergences.
For simplicity, CIs of a known size have been derived from the empirical likelihood surface by assuming that the asymptotic
2 calibration is correct. Although the error using this method decreases rapidly (doubling the number of observations halves the expected error), there are other techniques that may prove more effective: a Bartlett correction (DiCiccio et al. 1991
) of the statistic increases the speed at which the error decreases, so doubling the number of observations quarters the expected error, but Owen (2001)
recommends calibrating the likelihood surface using the nonparametric bootstrap. Although this procedure may seem redundant—if we have to generate bootstraps, why not use them directly?—it tends to produce useful estimates with few replicates because the contribution of each replicate is not restricted to the left-hand or right-hand tail of the distribution. As noted above, the empirical likelihood approach can also be generalized to several sequences, and bootstrap calibration can also be used in this context even though bootstrap confidence regions cannot be easily defined.
Our results suggest that both the log-det and ML-GTR estimators are asymptotically efficient. This is not surprising for the ML-GTR divergence estimator because ML estimators are known to be asymptotically efficient and unbiased. The apparent asymptotic efficiency of the log-det estimator is surprising as it produces unbiased divergence estimates under more general conditions than ML-GTR, but we provide a proof in Appendix B that this result holds for almost all members of the GTR family of models. As shown in the comparison of asymptotic efficiency for divergence estimation when the model is fixed (fig. 3), being efficient for all time-reversible models does not mean that the estimator is necessarily good when the assumed model is too general and information is wasted fitting unnecessary parameters. This consideration will be especially important for calculation of divergence between amino acid sequences, where it is usual to assume a fixed empirical model like JTT (Jones et al. 1992
) or WAG (Whelan and Goldman 2001
) and the appropriate GTR model has 189 additional parameters, resulting in a higher variance for estimates. The use of a compressed alphabet (Wang J and Wang W 1999
; Coghlan et al. 2001
; Kosiol et al. 2004
) might help both by reducing the divergence to be estimated and the number of additional parameters in the corresponding GTR model.
Our results give guidance as to situations in which the log-det divergence estimator will be efficient or inefficient. The methods introduced mean that the efficiency of the estimator can be calculated in particular cases, and the accuracy of individual estimates can be assessed. This second point is extremely important if the estimates are to be compared or used in phylogenetic reconstructions.
| Appendix A Influence Function of the Log-Det Estimator |
|---|
|
|
|---|
The quantity uij defined in equation 6 is the influence function (Hampel et al. 1986
) for the distribution formed by contaminating a distribution X with a fraction
of i
j substitutions, so Aij(
) = (1–
)X +
Eij, where Eij is the N x N matrix whose (i, j) element is 1 and all other elements are 0. Formally, the influence function of the log-det estimator is defined as the limit
![]() |
By the linearity of the log-det formula (eq. 1), this derivative can be broken into 2 parts: the derivative of the summation of the logarithms of the column and row sums and the derivative of the logarithm of the determinant of Aij(
). The derivatives of the logarithms of the column and row sums are:
|
| (9) |
|
| (10) |
The corresponding derivative for the logarithm of the determinant of Aij(
) follows from Jacobi's formula for the derivative of a determinant:
|
|
|
|
The derivative of Aij(
) with respect to
is –X + Eij, giving the following results:
|
| (11) |
|
| (12) |
= 0 gives the influence function for the log-det estimator: |
|
| Appendix B Asymptotic Efficiency of Log-Det Estimation |
|---|
|
|
|---|
It has long been appreciated that there is a connection between the log-det estimator and the GTR model of evolution (Rodríguez et al. 1990
|
| (13) |
is the diagonal matrix whose elements form the stationary distribution of Q and
For a given set of data, there is an absolute upper bound for the log-likelihood of any distribution Jij, attained when Jij =
. If there exist
and Q that satisfy equation 13, then the upper bound is achieved and
and Q are ML estimates. The existence of
and Q can easily be checked using standard properties of time-reversible Markov chains and the matrix logarithm function. This is not sufficient to establish that the 2 estimators have the same asymptotic distribution because there is some probability that a valid solution to equation 13 does not exist and the estimates will differ. We provide a proof that their sampling distributions of the ML-GTR and log-det estimators do indeed converge, establishing that the log-det estimator is asymptotically efficient for the GTR model.
Previously, we stated that the asymptotic variance of the log-det estimator can be found by evaluating its influence function at the true joint probability function. This statement ignores the need to check that several regularity conditions hold (see Hampel et al. 1986
, and references therein). Asymptotic normality of the log-det estimator is more easily verified by means of the delta method (e.g., Cox 2005
). Because the central limit theorem guarantees that
(
–J0) converges in distribution to a multivariate normal of known variance, the delta method establishes that
|
|
Our proof proceeds by considering the symmetric form of the log-det estimator, where
is replaced by
. This has the same asymptotic distribution as the log-det estimator, again established using the delta method, but it is much more likely to be a solution to equation 13 because both
and
eQ are guaranteed to be symmetric. We show that, for large samples, a solution to equation 13 almost always exists, and so the asymptotic distribution of the symmetric form of the log-det is identical to that of the ML-GTR estimator; hence, so too is the asymptotic distribution for the log-det estimator. Asymptotic efficiency of the log-det estimator then follows from the asymptotic efficiency of ML-GTR.
Consider an appropriately scaled instantaneous rate matrix of a time-reversible continuous-time Markov chain, Q0, such that every off-diagonal element is strictly greater than zero (a generator for a uniform semigroup), and let the true divergence be t0 > 0 (the case for t0 = 0 being trivial). By definition, the matrix Q0 t0 belongs to an open region in the space of generators for time-reversible continuous-time Markov chains that contain a ball of nonzero radius with center Q0 t0. The inverse function theorem (e.g., Rudin 1976
) guarantees that there is an open region in the space of symmetric joint probability matrices, containing J0 =
eQ0t0, for which a solution to equation 13 always exists. If the symmetric log-det and ML-GTR estimators disagree, the empirical joint probability matrix must lie outside of this region. The strong law of large numbers (e.g., Billingsley 1995
) establishes convergence into this region, and hence, convergence of the sampling distribution of the symmetric log-det estimator to that of ML-GTR, with probability 1.
The above proof shows that the log-det and symmetric log-det estimators are asymptotically efficient, providing that the arbitrary scaling factor for the underlying Markov process is chosen to be consistent with the log-det estimator, rather than the more natural scaling normally used (see text for details). We now show that, providing the correct scaling factor is estimated efficiently, the rescaled form of the log-det estimator is still asymptotically efficient.
For 2 functionals, f and g, of random variable X, the asymptotic variance of their product can be obtained by the delta method:
|
| (14) |
| Acknowledgements |
|---|
|
|
|---|
This work was funded by the European Molecular Biology Laboratory.
| Footnotes |
|---|
Peter Lockhart, Associate Editor
| References |
|---|
|
|
|---|
Baake E, von Haeseler A. Distance measures in terms of substitution processes. Theor Popul Biol. (1999) 55:166–175.[CrossRef][Web of Science][Medline]
Barry D, Hartigan JA. Asynchronous distances between homolgous DNA sequences. Biometrics (1987) 43:261–276.[CrossRef][Web of Science][Medline]
Billingsley P. Probability and measure (1995) 3rd ed. New York: Wiley.
Brown WM, Prager EM, Wang A, Wilson AC. Mitochondrial DNA sequences of primates: tempo and mode of evolution. J Mol Evol. (1982) 18:225–239.[CrossRef][Web of Science][Medline]
Bruno WJ, Socci ND, Halpern AL. Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Mol Biol Evol. (2000) 17:189–197.
Coghlan A, Dónaill DA, Buttimore NH. Representation of amino acids as five-bit or three-bit patterns for filtering protein databases. Bioinformatics (2001) 17:676–685.
Cox C. Delta method. In: Encyclopedia of Biostatistics—Armitage P, Colton T, eds. (2005) 2nd ed. New York: Wiley. 1409–1411.
DiCiccio JT, Romano JP. Nonparametric confidence limits by resampling methods and least favourable families. Int Stat Rev. (1990) 58:59–76.
DiCiccio TJ, Hall P, Romano J. Empirical likelihood is Bartlett-correctable. Ann Stat. (1991) 19:1053–1061.
Efron B. Bootstrap methods: another look at the jackknife. Ann Stat. (1979) 7:1–26.
Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. (1981) 17:368–376.[CrossRef][Web of Science][Medline]
Felsenstein J. Inferring phylogenies (2004) Sunderland (MA): Sinauer associates.
Fitzpatrick DA, Creevey CJ, McInerney JO. Genome phylogenies indicate a meaningful
-proteobacterial phylogeny and support a grouping of the mitochondria with the Rickettsiales. Mol Biol Evol. (2006) 23:74–85.
Gascuel O. BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol. (1997) 14:685–695.[Abstract]
Goldman N. Phylogenetic information and experimental design in molecular systematics. Proc R Soc Lond B Biol Sci. (1998) 265:1779–1786.[Medline]
Goremykin VV, Hellwig FH. Evidence for the most basal split in land plants dividing bryophyte and tracheophyte lineages. Plant Syst Evol. (2005) 254:93–103.[CrossRef]
Gu X, Li W-H. Bias-corrected paralinear and LogDet distances and tests of molecular clocks and phylogenies under nonstationary nucleotide frequencies. Mol Biol Evol. (1996) 13:1375–1383.[Abstract]
Hall P, Scala BL. Methodolgy and algorithms of empirical likelihood. Int Stat Rev. (1990) 58:109–127.
Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA. Robust statistics: the approach based on influence functions (1986) New York: Wiley.
Hasegawa M, Kishino H, Yano T. Man's place in Hominoidea as inferred from molecular clocks of DNA. J Mol Evol. (1985) 21:160–174.
Hoffmeister M, van der Klei A, Rotte C, van Grinsven KWA, van Hellemond JJ, Henze K, Tielens AGM, Martin W. Euglena gracilis rhodoquinone:ubiquinone ratio and mitochondrial proteome differ under aerobic and anaerobic conditions. J Biol Chem. (2004) 279:22422–22429.
Jones DT, Taylor WR, Thornton JM. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. (1992) 8:275–282.
Jukes TH, Cantor CR. Evolution of protein molecules. In: Mammalian protein metabolism—Munro HN, ed. (1969) New York: Academic Press. 21–132.
Kimura M. A simple method for estimating evolutionary rate of base substitution through comparative studies of nucleotide sequences. J Mol Evol. (1980) 16:111–120.[CrossRef][Web of Science][Medline]
Kosiol C, Goldman N, Buttimore NH. A new criterion and method for amino acid classification. J Theor Biol. (2004) 228:97–106.[CrossRef][Web of Science][Medline]
Lake JA. Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc Natl Acad Sci USA (1994) 91:1455–1459.
Lazar NA, Mykland PA. An evaluation of the power and conditional properties of empirical likelihood. Biometrika (1998) 85:523–534.
Lee SMS, Young GA. Nonparametric likelihood ratio confidence intervals. Biometrika (1999) 86:107–118.
Lockhart PJ, Cameron SA. Trees for bees. Trends Ecol Evol. (2001) 16:84–88.[CrossRef][Medline]
Lockhart PJ, Steel MA, Hendy MD, Penny D. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol Biol Evol. (1994) 11:605–612.[Web of Science]
Martin W, Deusch O, Stawski N, Grünheit N, Goremykin V. Chloroplast genome phylogenetics: why we need independent approaches to plant molecular evolution. Trends Plant Sci. (2005) 10:203–209.[CrossRef][Web of Science][Medline]
Negrisolo E, Minelli A, Valle G. The mitochondrial genome of the house centipede Scutigera and the monophyly versus paraphyly of myriapods. Mol Biol Evol. (2004) 21:770–780.
Nikolaev S, Montoya-Burgos JI, Margulies EH, Program NCS, Rougemont J, Nyffeler B, Antonarakis SE. Early history of mammals is elucidated with the ENCODE multiple species sequencing data. PLoS Genetics (2007) 3:e2.[CrossRef]
Owen AB. Empirical likelihood ratio confidence intervals for a single functional. Biometrika (1988) 75:237–249.
Owen AB. Empirical likelihood (2001) London: Chapman and Hall.
Pawitan Y. In all likelihood: statistical modelling and inference using likelihood (2001) Oxford: Oxford University Press.
Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical recipes in C (1992) 2nd ed. Cambridge: Cambridge University Press.
Rodríguez F, Oliver JL, Martin A, Medina JR. The general stochastic model of nucleotide substitution. J Theor Biol. (1990) 142:485–501.[Web of Science][Medline]
Rubin DB. The Bayesian boostrap. Ann Stat. (1981) 9:130–134.
Rudin W. Principles of mathematical analysis (1976) 3rd ed. New York: McGraw-Hill.
Schadt E, Lange K. Codon and rate variation models in molecular phylogeny. Mol Biol Evol. (2002) 19:1534–1549.
Steel MA. Recovering a tree from the leaf colourations it generates under a Markov model. Appl Math Lett. (1994) 7:19–23.
Tavaré S. Some probabilistic and statistical problems in the analysis of DNA sequences. Lect Math Life Sci. (1986) 17:57–86.
Thollesson M. LDDIST: a Perl module for calculating LogDet pair-wise distances for protein and nucleotide sequences. Bioinformatics (2004) 20:416–418.
Turmel M, Otis C, Lemieux C. The chloroplast genome sequence of Chara vulgaris sheds new light into the closest green algal relatives of land plants. Mol Biol Evol. (2006) 23:1324–1338.
Wang J, Wang W. A computational approach to simplifying the protein folding alphabet. Nat Struct Biol. (1999) 6:1033–1038.[CrossRef][Web of Science][Medline]
Whelan S, Goldman N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol. (2001) 18:691–699.
![]()
CiteULike
Connotea
Del.icio.us What's this?
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||








