MBE Advance Access originally published online on May 4, 2008
Molecular Biology and Evolution 2008 25(8):1576-1580; doi:10.1093/molbev/msn103

© The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

How Well Does the HoT Score Reflect Sequence Alignment Accuracy?

Barry G. Hall

Bellingham Research Institute

E-mail: barryhall{at}zeninternet.com.


    Abstract
 
Multiple sequence alignment is an essential tool in many areas of biological research, and the accuracy of an alignment can strongly affect the accuracy of a downstream application such as phylogenetic analysis, identification of functional motifs, or polymerase chain reaction primer design. The heads or tails (HoT) method (Landan G, Graur D. 2007. Heads or tails: a simple reliability check for multiple sequence alignments. Mol Biol Evol. 24:1380–1383.) assesses the consistency of an alignment by comparing the alignment of a set of sequences with the alignment of the same set of sequences written in reverse order. This study shows that HoT scores and the alignment accuracies are positively correlated, so alignments with higher HoT scores are preferable. However, HoT scores are overestimates of alignment accuracy in general, with the extent of overestimation depending on the method used for multiple sequence alignment.

Key Words: sequence alignment accuracy • HoT consistency • alignment methods


    Introduction
 
Multiple sequence alignment is an essential tool in many areas of biological research including tertiary protein structure prediction, identification of functional motifs and domains, polymerase chain reaction primer design, reconstruction of ancient ancestral protein sequences, and phylogenetic inference (Mullan 2002; Ogden and Rosenberg 2006). Multiple sequence alignment is a way to write a set of homologous sequences of unequal length so that homologous nucleotides or amino acids are written one above the other in a column; this is accomplished by inferring the presence of gaps, corresponding to historical insertions and deletions. Multiple sequence alignments are estimates of the positions of those gaps, not truth, and like all estimates may be more or less accurate. The accuracy of multiple sequence alignments is of considerable concern to downstream analyses, such as phylogenetic analyses, because the accuracies of the downstream analyses depend strongly on the accuracy of the alignment (Hall 2005; Rosenberg 2005; Kumar and Filipski 2006; Ogden and Rosenberg 2006). A variety of studies have compared the accuracies of different multiple sequence alignment programs based either on "gold standard" structural alignments such as the BAliBASE collection (Thompson et al. 1999a, 2005; Lassmann and Sonnhammer 2002, 2005; Do et al. 2005; Katoh et al. 2005; Carroll et al. 2007) or on simulated data sets (Katoh et al. 2002; Nuin et al. 2006). Those studies, however, do not allow the individual investigator to judge whether a particular alignment is sufficiently accurate to justify trusting the results of a downstream analysis.

Some studies offer a degree of guidance with respect to multiple alignment accuracy. For instance, for protein sequence alignments, when the average amino acid identity is >25% the alignment accuracy is on average >80%, and when amino acid identity is <20% alignment accuracy is <50% (Thompson et al. 1999a). For DNA alignments, when aligned sequence identity is >60% accuracy is >80%, but accuracy drops below about 50% when identity is below 50% (Ogden and Rosenberg 2006). Those guidelines, while better than nothing, are not terribly helpful because they are based on sequence identities that are averages of closely related sequences with distantly related sequences. They also fail to take into account gap density. In general, the "gappier" the alignment, the lower the accuracy (Nuin et al. 2006). Most alignment programs allow users to set gap penalties and other parameters, and in some situations, accuracy can be improved by modifying the default settings of those parameters. However, I would speculate that >99% of alignments are performed using the default parameter settings simply because there is no easy way to assess whether modifying the parameters makes the alignment more or less accurate.

To help remedy the situation, Landan and Graur (2007) proposed the heads or tails (HoT) method to check the reliability of multiple alignments. Landan and Graur use the term "reliability," but the HoT method actually measures how consistent a method is in the face of alternate data sets and not how accurate the method is. In the classic dartboard analogy, the HoT approach is a way of measuring whether a method managed to place all the darts close together or spreads them around the board. It does not measure how close they are to the bull's eye. Accordingly, I shall simply refer to the HoT score as a measure of HoT consistency.

According to the HoT-score method, the unaligned sequences are first reversed to produce a "tails" data set, then alignments are generated from the original (heads) and the tails data sets. The heads and tails alignments are compared by 2 criteria, columns and residue pairs. The columns comparison shows the fraction of columns that are identical between the 2 alignments and is the more conservative measure. The residue pairs measure shows the proportion of residue pairs that are identical in the 2 alignments. Although Landan and Graur provide measures of consistency, we have no idea how those measures correspond to measures of accuracy.
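The heads-versus-tails procedure described above can be sketched in a few lines. This is only an illustrative sketch, not the HoT_reverse or HoT_compare scripts of Landan and Graur: the function names, the dictionary representation of an alignment, the "-" gap character, and the choice of the heads alignment as the denominator of the columns score are all assumptions made for the example. The tails alignment itself would come from whatever alignment program is being tested.

```python
def reverse_sequences(seqs):
    """Build the 'tails' data set by writing each unaligned sequence backwards."""
    return {name: seq[::-1] for name, seq in seqs.items()}


def unreverse_alignment(aln):
    """Flip an aligned tails data set back to heads orientation (gaps included)."""
    return {name: row[::-1] for name, row in aln.items()}


def column_signatures(aln, names, gap="-"):
    """Describe each column by the residue index each sequence contributes.

    A gap is recorded as None. Two columns are 'identical' when every
    sequence contributes the same residue (or a gap) to both.
    """
    counters = [0] * len(names)
    columns = []
    for c in range(len(aln[names[0]])):
        signature = []
        for i, name in enumerate(names):
            if aln[name][c] == gap:
                signature.append(None)
            else:
                signature.append(counters[i])
                counters[i] += 1
        columns.append(tuple(signature))
    return columns


def column_score(heads, tails_unreversed):
    """Fraction of heads columns that also occur in the un-reversed tails alignment.

    Assumption: the heads alignment is used as the denominator.
    """
    names = sorted(heads)
    heads_cols = set(column_signatures(heads, names))
    tails_cols = set(column_signatures(tails_unreversed, names))
    return len(heads_cols & tails_cols) / len(heads_cols)


# Toy example with hand-written alignments (illustrative data only).
heads = {"a": "AC-GT", "b": "ACTGT"}
tails = {"a": "T-GCA", "b": "TGTCA"}  # alignment of the reversed sequences
score = column_score(heads, unreverse_alignment(tails))
```

In this toy case three of the five heads columns are reproduced by the tails alignment, so the columns score is 0.6.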

Accuracy is most frequently measured by comparing an estimated alignment with a protein structure–based alignment such as a BAliBASE alignment. Although they are often treated as gold standards, structure-based alignments are not "true" alignments; they are simply better estimates of the true historical relationships among homologous sites. Measuring accuracy requires knowing the true alignment, which is only possible with simulated data. EvolveAGene 3 (Hall 2008), a coding sequence simulation program that incorporates indels, provides true alignments that can be used to measure the accuracy of estimated DNA and protein sequence alignments. That accuracy can then be compared with the HoT score of the alignment to assess how well the HoT score reflects accuracy.

In this study I used simulated data sets to obtain known, true, alignments of both protein and DNA sequences. The unaligned sequences were then aligned to produce a heads alignment. The heads alignment was compared with the true alignment to measure the "accuracy" of the alignment. The sequences were then reversed and aligned to produce a tails alignment. The heads and tails alignments were compared by the residue pairs score to measure the HoT score. I then examined the relationships between accuracy and HoT scores over a realistic range of alignment conditions. I also examined whether the HoT-score method can be used to accurately assess the effects of changes in alignment program parameter settings or to decide which alignment program performs better on a particular data set.


    Methods
 
Simulations
Sequence evolution was simulated by EvolveAGene 3 (Hall 2008) based on 20 data sets from the BAliBASE collection of structure-based alignments (Thompson et al. 1999b, 2005; Bahr et al. 2001), with 5 independent simulations based on each of the BAliBASE data sets. Each simulation was based on the topology and mean branch length of a maximum likelihood tree of the BAliBASE alignment and was initiated from the first sequence in the BAliBASE data set. The simulated data sets are runs 1–5 in the simulation sets in table 3 of Hall (2008). The number of sequences in the simulated data sets ranged from 10 to 142, the mean number of indels per base substitution ranged from 0.0026 to 0.0046, the tree lengths ranged from 8.9 to 72.9 substitutions per site, and the pairwise percent amino acid identities ranged from 11.2% to 83.8%. EvolveAGene 3 output includes the true DNA and protein sequence alignments and the unaligned DNA and protein sequences. The unaligned sequences were termed the "heads" sequences.

Alignments
Protein sequences were aligned by ClustalW 1.83 (Thompson et al. 1994) and by ProbCons (Do et al. 2005), whereas DNA sequences were aligned by ClustalW 1.83 and by ProbConsRNA beta (http://probcons.stanford.edu/download.html). In the text, ProbConsRNA beta is simply referred to as ProbCons. ProbCons and ProbConsRNA beta alignments used the default settings, as did ClustalW DNA alignments. Three different sets of parameter settings were used to align protein sequences by ClustalW: default settings (pairwise alignment gap opening = 10.0, pairwise alignment gap extension = 0.1, multiple alignment gap opening = 10.0, multiple alignment gap extension = 0.2, and minimum gap separation distance = 4); the BioX (http://www.macupdate.com/info.php/id/20630) implementation of ClustalW default settings (same as the ClustalW default settings except that minimum gap separation distance = 8); and improved settings (pairwise alignment gap opening = 10.0, pairwise alignment gap extension = 0.1, multiple alignment gap opening = 3.0, multiple alignment gap extension = 1.8, and minimum gap separation distance = 1). The improved settings are similar to those that I have previously recommended (Hall 2007).
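For readers who run ClustalW from the command line, the three protein parameter sets listed above could be written down as follows. This is a sketch under stated assumptions: the executable name `clustalw`, the file names, and the helper `clustalw_command` are hypothetical, and the option spellings (-PWGAPOPEN, -PWGAPEXT, -GAPOPEN, -GAPEXT, -GAPDIST) are the ClustalW command-line names as I understand them and should be checked against the help output of the installed version. The code only builds the argument list; it does not run the program.

```python
# Parameter values are taken from the Methods text; the dictionary keys
# are labels chosen for this example.
CLUSTALW_PROTEIN_SETTINGS = {
    "default":  {"PWGAPOPEN": 10.0, "PWGAPEXT": 0.1, "GAPOPEN": 10.0, "GAPEXT": 0.2, "GAPDIST": 4},
    "biox":     {"PWGAPOPEN": 10.0, "PWGAPEXT": 0.1, "GAPOPEN": 10.0, "GAPEXT": 0.2, "GAPDIST": 8},
    "improved": {"PWGAPOPEN": 10.0, "PWGAPEXT": 0.1, "GAPOPEN": 3.0,  "GAPEXT": 1.8, "GAPDIST": 1},
}


def clustalw_command(infile, outfile, setting, executable="clustalw"):
    """Return an argument list for one protein alignment run.

    Hypothetical helper: the executable name and file paths are
    placeholders, and the flags should be verified locally.
    """
    params = CLUSTALW_PROTEIN_SETTINGS[setting]
    cmd = [executable, f"-INFILE={infile}", "-ALIGN", "-TYPE=PROTEIN"]
    cmd += [f"-{key}={value}" for key, value in params.items()]
    cmd.append(f"-OUTFILE={outfile}")
    return cmd


cmd = clustalw_command("heads.fasta", "heads_improved.aln", "improved")
```

The resulting list could be passed to `subprocess.run` once for the heads sequences and once for the tails sequences, so that both alignments are produced with identical settings.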

HoT Score
The Perl script HoT_reverse (Landan and Graur 2007) was used to reverse the heads sequences to create the tails sequences. The Perl script HoT_compare (Landan and Graur 2007) was used to compare the heads alignment with the tails alignment, and the HoT score is given as the total residue pairs score reported by HoT_compare. The residue pairs score is the proportion of residue pairs that are paired identically in the heads and tails alignments and is the equivalent of the sum-of-pairs score of Thompson et al. (1999a; Landan and Graur 2007).

Alignment Accuracy
Alignment accuracy was measured by comparison of the true alignment with the heads alignment and is the proportion of residue pairs that are paired identically in the 2 alignments, equivalent to the residue pairs score of HoT_compare. Accuracy was determined by the Perl script CompareAlign, which is available upon request to barryhall{at}zeninternet.com. HoT scores and alignment accuracies of the 100 data sets are in supplementary table S1 (Supplementary Material online).
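The residue pairs measure used for both the HoT score and accuracy can be illustrated with a short sketch. It is not the CompareAlign or HoT_compare script; the function names, the dictionary representation of an alignment, the "-" gap character, and the use of the reference alignment as the denominator are assumptions made for the example. For accuracy, the reference is the true alignment and the estimate is the heads alignment; for the HoT score, the two inputs would be the heads alignment and the un-reversed tails alignment.

```python
from itertools import combinations


def residue_pairs(aln, gap="-"):
    """Return the set of residue pairs that share a column.

    Each residue is identified by (sequence name, position in the
    ungapped sequence), so the pairs do not depend on column numbering.
    """
    names = sorted(aln)
    counters = {name: 0 for name in names}
    pairs = set()
    for c in range(len(aln[names[0]])):
        present = []
        for name in names:
            if aln[name][c] != gap:
                present.append((name, counters[name]))
                counters[name] += 1
        # Names are visited in sorted order, so each pair has one canonical form.
        pairs.update(combinations(present, 2))
    return pairs


def pair_score(reference, estimate):
    """Proportion of reference residue pairs that are also paired in the estimate."""
    ref_pairs = residue_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & residue_pairs(estimate)) / len(ref_pairs)


# Toy example (illustrative data only).
true_aln = {"a": "AC-GT", "b": "ACTGT"}
heads_aln = {"a": "ACG-T", "b": "ACTGT"}
accuracy = pair_score(true_aln, heads_aln)
```

Here the true alignment contains four residue pairs, three of which are recovered by the estimated alignment, giving an accuracy of 0.75.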


    Results
 
Using HoT Scores to Set Alignment Parameters
Figure 1A is a probability plot of the HoT scores. The further to the right the curve lies, the more consistent the method is. HoT-score consistencies are ClustalW BioX < ClustalW default < ClustalW improved < ProbCons. If the HoT-score method is useful for choosing alignment parameters or alignment methods, then the order of the methods with respect to accuracy should be the same as that with respect to HoT-score consistency. Figure 1B shows that this is the case; however, HoT scores are generally higher than accuracy.


 
FIG. 1.— Probability plots of (A) HoT scores and (B) accuracy of protein alignments. The y axis shows the percent of samples that had a HoT score or accuracy lower than that indicated by the x axis value for each of the alignment programs and settings.

 
Another approach to judging the value of HoT scores for choosing alignment parameters or methods is to consider, for each data set, the HoT score or accuracy of one method relative to another method; for example, the accuracy of ClustalW improved settings relative to ClustalW default settings. If the ratio of the improved settings to the default settings is close to 1.0, it probably does not matter which settings are used. I have arbitrarily chosen 0.95 and 1.05 as the range that suggests insignificant differences in HoT scores or accuracy. Table 1 shows that 100% of ClustalW BioX alignments have significantly lower HoT scores than ClustalW default alignments, and none have higher HoT scores, whereas 96% are less accurate and none more accurate. In general, determination of HoT scores would result in choosing the more accurate method.



 
Table 1 Relative HoT Reliabilities and Accuracies of Alignment Methods

 
The HoT-score approach would err if it led to choosing the significantly more HoT-consistent method when that method was, in fact, significantly less accurate. Table 1 shows that in deciding between the ClustalW BioX and ClustalW default settings, the HoT-score approach would lead to the wrong choice only 12% of the time; in choosing between the ClustalW default and ClustalW improved method, it would err only 3% of the time; and in choosing between ProbCons and ClustalW improved, it would err only 3% of the time. In deciding between ProbCons and ClustalW to align DNA sequences, HoT scores would err only 2% of the time.
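The relative-score comparison and the error criterion described above can be expressed as a small sketch. The function names are hypothetical, and treating the criterion symmetrically (either method may be the one that is more HoT consistent but less accurate) is my reading of the text rather than something it states explicitly. Scores are assumed to be nonzero proportions.

```python
def classify_ratio(score_a, score_b, low=0.95, high=1.05):
    """Classify method A relative to method B for one data set.

    Ratios inside [low, high] are treated as insignificant differences,
    following the arbitrary 0.95 to 1.05 band used in the text.
    """
    ratio = score_a / score_b
    if ratio < low:
        return "lower"
    if ratio > high:
        return "higher"
    return "insignificant"


def hot_errs(hot_a, hot_b, acc_a, acc_b):
    """True when HoT scores favor a method that is significantly less accurate."""
    hot_class = classify_ratio(hot_a, hot_b)
    acc_class = classify_ratio(acc_a, acc_b)
    return (hot_class, acc_class) in {("higher", "lower"), ("lower", "higher")}


def error_rate(records):
    """Fraction of data sets on which the HoT-based choice would be wrong.

    `records` is an iterable of (hot_a, hot_b, acc_a, acc_b) tuples,
    one tuple per data set.
    """
    records = list(records)
    return sum(hot_errs(*r) for r in records) / len(records)


# Illustrative numbers only, not values from the study.
example = [
    (0.90, 0.80, 0.50, 0.60),  # HoT favors A, but A is less accurate: an error
    (0.90, 0.80, 0.60, 0.50),  # HoT favors A, and A is more accurate
    (0.90, 0.90, 0.70, 0.70),  # no significant difference either way
    (0.70, 0.90, 0.50, 0.80),  # HoT favors B, and B is more accurate
]
rate = error_rate(example)
```

With these made-up values, one of the four data sets counts as an error, so the rate is 0.25.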

Using HoT Scores to Assess Alignment Accuracy
Comparison of figure 1A with figure 1B suggests that overall HoT scores are a bit higher than accuracy. To determine how well HoT scores correspond to accuracy, HoT scores were plotted versus accuracy for ClustalW improved and ProbCons protein alignments and for ClustalW and ProbCons DNA alignments (fig. 2). In figure 2, points that are below and to the right of the diagonal line have higher HoT scores than accuracy, whereas those above and to the left of the diagonal have lower HoT scores than accuracy.


 
FIG. 2.— HoT scores versus accuracy. (A) and (B), protein alignments. (C) and (D), DNA alignments. Dashed lines are least squares linear regression lines.

 
If HoT scores precisely reflected accuracy, the points would all fall along the diagonal lines in figure 2. Instead, the majority of the points fall to the right of and below the diagonal; that is, in general HoT scores overestimate accuracy. For the protein alignments (table 2), the ProbCons method is 0.06 more accurate than the ClustalW improved method, and ProbCons HoT scores overestimate accuracy by only 0.03, compared with ClustalW HoT scores overestimating accuracy by 0.06. Clearly, for protein alignments, ProbCons is the method of choice. For the DNA alignments, ProbCons is also the more accurate method, but ProbCons HoT scores overestimate accuracy by a whopping 0.12, whereas the less accurate ClustalW HoT scores only overestimate accuracy by 0.03.
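One way to summarize how far HoT scores sit from the diagonal is to compute, per method, the mean of HoT score minus accuracy and the fraction of data sets in which the HoT score exceeds accuracy. The sketch below assumes that the overestimates quoted above are differences of this kind, which the text does not spell out; the function name and the sample values are hypothetical.

```python
from statistics import mean


def overestimation_summary(hot_scores, accuracies):
    """Summarize how much HoT scores exceed accuracy across paired data sets."""
    diffs = [h - a for h, a in zip(hot_scores, accuracies)]
    return {
        # Positive values mean HoT scores overestimate accuracy on average.
        "mean_overestimate": mean(diffs),
        # Share of points lying below and to the right of the diagonal.
        "fraction_overestimated": sum(d > 0 for d in diffs) / len(diffs),
    }


# Illustrative numbers only, not values from supplementary table S1.
summary = overestimation_summary([0.9, 0.8, 0.7], [0.8, 0.8, 0.5])
```

For these made-up values the mean overestimate is about 0.1 and two of the three points lie below the diagonal.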



 
Table 2 Mean Accuracy and HoT Scores

 
Landan and Graur used total tree length as a measure of sequence divergence and noted that HoT scores decrease as tree length increases (Landan and Graur 2007). Figure 3 shows that accuracy also decreases as tree length increases, but the correlations are not strong. Indeed, for the ProbCons method of protein sequence alignment, accuracies of trees whose lengths ranged from 60 to 75 substitutions per site were higher than accuracies of trees whose lengths ranged from 40 to 50 substitutions per site.


 
FIG. 3.— Alignment accuracy versus tree length.

 
When alignment accuracy is >60%, phylogenetic trees are, on average, about as accurate as when estimated from the true alignments (Ogden and Rosenberg 2006). Although I am unaware of similar studies of other downstream applications of alignments, it is reasonable that we should be most concerned with those difficult-to-align data sets whose accuracies are <60%. By the <60% accuracy criterion, 18 of the ClustalW improved protein alignments, 11 of the ProbCons protein alignments, 23 of the ClustalW DNA alignments, and 16 of the ProbCons DNA alignments were difficult to align. HoT scores failed to identify 10 of the 18 ClustalW difficult protein alignments as "difficult," correctly identified all the ProbCons difficult protein alignments, but misidentified 2 ProbCons acceptable alignments as being difficult. HoT scores failed to identify only 1 of the 21 difficult ClustalW DNA alignments but failed to identify 9 of the 16 difficult ProbCons DNA alignments.
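The bookkeeping behind counts like those above can be sketched as follows. The text does not state which HoT cutoff was used to call an alignment "difficult," so this example assumes the same 0.60 threshold is applied to both accuracy and HoT score; the function name and the sample values are hypothetical.

```python
def difficult_alignment_counts(hot_scores, accuracies, cutoff=0.60):
    """Count difficult alignments and how often HoT scores misjudge them.

    An alignment is 'difficult' when its accuracy is below the cutoff.
    Assumption: HoT scores flag an alignment as difficult when the HoT
    score is below the same cutoff.
    """
    difficult = missed = false_alarms = 0
    for hot, acc in zip(hot_scores, accuracies):
        truly_difficult = acc < cutoff
        flagged = hot < cutoff
        if truly_difficult:
            difficult += 1
            if not flagged:
                missed += 1  # difficult alignment that HoT failed to identify
        elif flagged:
            false_alarms += 1  # acceptable alignment misidentified as difficult
    return {"difficult": difficult, "missed": missed, "false_alarms": false_alarms}


# Illustrative numbers only, not values from the study.
counts = difficult_alignment_counts(
    hot_scores=[0.65, 0.50, 0.58, 0.95],
    accuracies=[0.50, 0.55, 0.70, 0.90],
)
```

In this made-up example two alignments are difficult, HoT scores miss one of them, and one acceptable alignment is flagged as difficult.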

To use the dartboard analogy, HoT scores do show that as the darts cluster more tightly they tend to cluster around the bull's eye.


    Discussion
 
Because multiple sequence alignment is such a basic tool in comparisons of molecular sequences, it is very important to be able to easily evaluate the accuracy of alignments. As a tool for judging the accuracy of alignments, the value of the HoT-score method varies according to the alignment method that is used (table 2). For the 2 protein alignment methods that are considered here, HoT scores are pretty good for the ProbCons method and fair for the ClustalW improved method. For the DNA alignment methods, there is conflict between accuracy of the method and how well HoT scores reflect that accuracy. ProbCons is more accurate, but HoT scores greatly overestimate that accuracy. The method dependence of the degree to which HoT scores reflect accuracy suggests that HoT scores should not be used as an estimator of absolute accuracy. However, the value of HoT scores as an indicator of "relative" accuracy is an entirely different matter.

There are a variety of alignment methods and programs available, and for many of those programs, there are a variety of parameters that can be set. Although there are numerous studies comparing the accuracies of those programs on test data sets, it is often not obvious which method is best for any particular real data set and it is extraordinarily difficult to judge how adjusting alignment parameters affects alignment accuracy. Those judgments do not require estimating absolute accuracy, but they do require reliable estimates of relative accuracy. HoT scores are an effective estimator of relative accuracy for both protein and DNA alignments (table 1) and would rarely lead to choosing the wrong method or parameter settings. It is important to stress that it is not the purpose of this study to suggest that particular parameter settings or methods are universally superior. Instead, it is to encourage readers to apply the HoT-score method to choose among methods and parameter settings for their own, unique data sets.

The HoT consistency measure is both clever and easy to apply and it serves a valuable purpose as a measure of relative accuracy for choosing among alignment methods or parameter settings, but it can be misleading as an estimator of absolute alignment accuracy. Although it is very much needed, we still lack a good way to estimate accuracies of real multiple sequence alignments.


    Supplementary Material
 
Supplementary table S1 is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
 
I thank Reviewer #1 for his insight and considerable effort in raising issues with respect to the validity of using the HoT method to choose among parameter settings for alignment programs. I thank Reviewer #2 for the dartboard analogy.


    Footnotes
 
Sudhir Kumar, Associate Editor


    References
 

    Bahr A, Thompson JD, Thierry JC, Poch O. BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations. Nucleic Acids Res. (2001) 29:323–326.

    Carroll H, Beckstead W, O'Connor T, Ebbert M, Clement M, Snell Q, McClellan D. DNA reference alignment benchmarks based on tertiary structure of encoded proteins. Bioinformatics (2007) 23:2648–2649.

    Do CB, Mahabhashyam MS, Brudno M, Batzoglou S. ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res. (2005) 15:330–340.

    Hall BG. Comparison of the accuracies of several phylogenetic methods using protein and DNA sequences. Mol Biol Evol. (2005) 22:792–802.

    Hall BG. Phylogenetic trees made easy: a how-to manual (2007) 3rd ed. Sunderland (MA): Sinauer Associates.

    Hall BG. Simulating DNA coding sequence evolution with EvolveAGene 3. Mol Biol Evol. (2008) 25:688–695.

    Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. (2005) 33:511–518.

    Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. (2002) 30:3059–3066.

    Kumar S, Filipski A. Multiple sequence alignment: in pursuit of homologous DNA positions. Genome Res. (2006) 17:127–135.

    Landan G, Graur D. Heads or tails: a simple reliability check for multiple sequence alignments. Mol Biol Evol. (2007) 24:1380–1383.

    Lassmann T, Sonnhammer EL. Quality assessment of multiple alignment programs. FEBS Lett. (2002) 529:126–130.

    Lassmann T, Sonnhammer EL. Automatic assessment of alignment quality. Nucleic Acids Res. (2005) 33:7120–7128.

    Mullan LJ. Multiple sequence alignment–the gateway to further analysis. Brief Bioinform. (2002) 3:303–305.

    Nuin PA, Wang Z, Tillier ER. The accuracy of several multiple sequence alignment programs for proteins. BMC Bioinformatics (2006) 7:471.

    Ogden TH, Rosenberg MS. Multiple sequence alignment accuracy and phylogenetic inference. Syst Biol. (2006) 55:314–328.

    Rosenberg MS. Multiple sequence alignment accuracy and evolutionary distance estimation. BMC Bioinformatics (2005) 6:278.

    Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. (1994) 22:4673–4680.

    Thompson JD, Koehl P, Ripp R, Poch O. BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins (2005) 61:127–136.

    Thompson JD, Plewniak F, Poch O. A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. (1999a) 27:2682–2690.

    Thompson JD, Plewniak F, Poch O. BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics (1999b) 15:87–88.

Accepted for publication April 27, 2008.

