Molecular Biology and Evolution 19:501-509 (2002)
© 2002 Society for Molecular Biology and Evolution
Origin and Evolution of Influenza Virus Hemagglutinin Genes
Institute of Molecular Evolutionary Genetics and Department of Biology, The Pennsylvania State University
| Abstract |
|---|
Influenza A, B, and C viruses are the etiological agents of influenza. Hemagglutinin (HA) is the major envelope glycoprotein of influenza A and B viruses, and hemagglutinin-esterase (HE) in influenza C viruses is a protein homologous to HA. Because influenza A virus pandemics in humans appear to occur when new subtypes of HA genes are introduced from aquatic birds that are known to be the natural reservoir of the viruses, an understanding of the origin and evolution of HA genes is of particular importance. We therefore conducted a phylogenetic analysis of HA and HE genes and showed that the influenza A and B virus HA genes diverged much earlier than the divergence between different subtypes of influenza A virus HA genes. The rate of amino acid substitution for A virus HAs from duck, a natural reservoir, was estimated to be 3.19 × 10⁻⁴ per site per year, which was slower than that for human and swine A virus HAs but similar to that for influenza B and C virus HAs (HEs). Using this substitution rate from the duck, we estimated that the divergences between different subtypes of A virus HA genes occurred from several thousand to several hundred years ago. In particular, the earliest divergence time was estimated to be about 2,000 years ago. Also, the A virus HA gene diverged from the B virus HA gene about 4,000 years ago and from the C virus HE gene about 8,000 years ago. These time estimates are much earlier than the previous ones.
| Introduction |
|---|
Influenza viruses are members of the viral family Orthomyxoviridae and have a segmented, single-stranded, and negative-sense RNA genome in an enveloped virion (Smith, Andrewes, and Laidlaw 1933).
Hemagglutinin (HA) is the major envelope glycoprotein of A and B viruses, and hemagglutinin-esterase (HE) in C viruses is a protein homologous to HA. HA (HE) is cleaved into the signal peptide (about 20 amino acids in influenza A viruses), protein HA1 (HE1) (about 320 amino acids), and protein HA2 (HE2) (about 220 amino acids) when mature proteins are produced (fig. 1). HA1 (HE1) is a receptor-binding protein and the major target of immune responses, whereas HA2 (HE2) is an anchor protein of the envelope and mediates fusion of the envelope and the cellular endosomal membrane. Influenza A virus HA genes are classified into 15 subtypes (H1–H15) according to their antigenic properties (WHO Memorandum 1980), whereas B and C virus HA (HE) genes are not classified into subtypes. Because influenza A virus pandemics in humans appear to occur when new subtypes of HA genes are introduced from aquatic birds, an understanding of the origin and evolution of HA genes is of particular importance.
From the phylogenetic analyses of A and B virus HA genes, Webster et al. (1992)
The purpose of this paper is to study the evolutionary relationships of influenza A, B, and C virus HA (HE) genes. We are also interested in estimating the divergence times between these genes.
| Materials and Methods |
|---|
Phylogenetic Analyses
For constructing a phylogenetic tree for influenza A, B, and C virus HA (HE) genes, we used amino acid sequences because they are known to give more reliable results than nucleotide sequences when the sequence divergence is high (Nei and Kumar 2000, pp. 17–32).
Amino acid sequences of influenza A, B, and C virus HA2s (HE2s) were collected from the international DNA databank (DDBJ release 43). After excluding sequences from laboratory-adapted viruses and identical sequences within species, we obtained 57, 34, 58, 10, 29, 2, 41, 1, 4, 2, 1, 1, 3, 1, and 2 amino acid sequences for the H1–H15 subtypes of A virus HA2s, respectively. We also obtained 15 sequences for B virus HA2s and 35 sequences for C virus HE2s. A total of 296 amino acid sequences were aligned with the computer program CLUSTAL W (Thompson, Higgins, and Gibson 1994). After removing all alignment gaps, 207 amino acid sites were used for estimating p, Poisson correction (PC), and gamma distances (Nei and Kumar 2000). The gamma shape parameter (a) was estimated to be 1.83 by Gu and Zhang's (1997) method. The phylogenetic tree was constructed by the neighbor-joining (NJ) method (Saitou and Nei 1987), and the reliability of each interior branch was tested by the bootstrap method with 1,000 resamplings (Felsenstein 1985; Kumar et al. 2001). NJ trees were also constructed for 17 amino acid sequences: one chosen at random from each of the 15 subtypes of A virus HA2s, one from B virus HA2s, and one from C virus HE2s (table 1).
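The three distance measures named above can be illustrated with a minimal sketch. It assumes the standard definitions in Nei and Kumar (2000): the p distance is the proportion of differing sites, the PC distance is d = -ln(1 - p), and the gamma distance is d = a[(1 - p)^(-1/a) - 1]. The toy sequences and the function names are hypothetical and are not taken from the paper's data set or from any particular software package.

```python
import math

def remove_gap_columns(aligned):
    """Complete deletion: keep only alignment columns with no gap in any sequence."""
    length = len(aligned[0])
    keep = [i for i in range(length) if all(seq[i] != "-" for seq in aligned)]
    return ["".join(seq[i] for i in keep) for seq in aligned]

def p_distance(seq_a, seq_b):
    """Proportion of sites at which two gap-free, equal-length sequences differ."""
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differences / len(seq_a)

def poisson_distance(p):
    """Poisson correction (PC) distance: d = -ln(1 - p)."""
    return -math.log(1.0 - p)

def gamma_distance(p, shape):
    """Gamma distance with shape parameter a: d = a * ((1 - p)^(-1/a) - 1)."""
    return shape * ((1.0 - p) ** (-1.0 / shape) - 1.0)

# Hypothetical toy alignment, for illustration only.
toy_alignment = ["MKT-AILV", "MKTGAVLV", "MRT-AVLI"]
clean = remove_gap_columns(toy_alignment)

p = p_distance(clean[0], clean[2])
pc = poisson_distance(p)
gd = gamma_distance(p, 1.83)  # a = 1.83 is the HA2 (HE2) estimate quoted above
print(clean)
print(f"p = {p:.4f}, PC = {pc:.4f}, gamma = {gd:.4f}")
```

For any p between 0 and 1, the gamma distance exceeds the PC distance, which in turn exceeds p, and the gap widens as p grows. This is why the choice of distance matters most for the deep branches separating A, B, and C virus sequences.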
Estimation of Divergence Times
For estimating the divergence times between subtypes of influenza A virus genes, we used only A virus sequences; B and C virus sequences were not needed for this purpose. We also used amino acid sequences for the entire region of HA because the alignment for A virus HAs appeared to be reliable (fig. 1) (Rohm et al. 1996).
We obtained 50, 25, 24, 10, 21, 2, 25, 1, 4, 2, 1, 1, 3, 1, and 2 amino acid sequences for the H1–H15 subtypes of A virus HAs from the databank, respectively, and made a multiple alignment for a total of 172 sequences by CLUSTAL W. After removing all alignment gaps, 540 amino acid sites were used for estimating gamma distances with a = 1.20, which was obtained by Gu and Zhang's method. An NJ tree was constructed, and the branch lengths were recalculated by the ordinary least squares method (Rzhetsky and Nei 1993) to estimate the rate of amino acid substitution accurately (see below).
When the years of isolation are available for viral sequences in a phylogenetic tree, the rate of amino acid substitution may be estimated by the regression coefficient of the numbers of amino acid substitutions from a common root on the years of isolation (Nei 1983; Suzuki, Wyndham, and Gojobori 2001). Using the phylogenetic tree for 172 sequences of influenza A virus HAs, we estimated the rate of amino acid substitution for duck A virus HAs because duck provided the largest number (28) of sequences among aquatic birds. For estimating the divergence times between subtypes of A virus HA genes, we constructed a linearized tree (Takezaki, Rzhetsky, and Nei 1995) for 28 amino acid sequences of duck A virus HAs using the gamma distance with a = 1.20. The standard errors (SEs) and 99% confidence intervals (CIs) of the rates and the divergence times were estimated by the bootstrap method, under the assumption that the topologies of the phylogenetic trees for 172 sequences of influenza A virus HAs and 28 sequences of duck A virus HAs were correct (Nei and Kumar 2000).
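The regression described above can be sketched as follows. This is a minimal illustration, assuming an ordinary least-squares fit of root-to-tip substitutions per site on the year of isolation. The isolation years and distances are hypothetical values chosen for the example, and the function name is not from the paper or from any software it cites.

```python
def substitution_rate(years, substitutions_per_site):
    """Least-squares slope and intercept of substitutions per site on isolation year."""
    n = len(years)
    mean_year = sum(years) / n
    mean_subs = sum(substitutions_per_site) / n
    sxx = sum((x - mean_year) ** 2 for x in years)
    sxy = sum((x - mean_year) * (y - mean_subs)
              for x, y in zip(years, substitutions_per_site))
    slope = sxy / sxx                          # substitutions per site per year
    intercept = mean_subs - slope * mean_year
    return slope, intercept

# Hypothetical isolates: year of isolation and distance from the common root.
years = [1950, 1960, 1970, 1980, 1990]
subs = [0.010, 0.013, 0.017, 0.019, 0.023]

rate, intercept = substitution_rate(years, subs)

# The fitted line crosses zero substitutions at the implied year of the root.
root_year = -intercept / rate

print(f"rate = {rate:.2e} per site per year, implied root year = {root_year:.1f}")
```

When the isolates span only a few decades and the true rate is low, sampling noise in the distances can exceed the accumulated change, and the fitted slope can even come out negative. A wider spread of node dates along the time axis makes the slope far more stable.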
| Results |
|---|
Phylogenetic Relationships of Influenza A, B, and C Virus HA Genes
The NJ trees constructed by using p, PC, and gamma distances for 17 randomly chosen amino acid sequences of the HA2 (HE2) protein are shown in panels (a), (b), and (c) of figure 2, respectively. All trees show the same topology and indicate that all influenza A virus HA genes diverged after they separated from B virus HA genes. The monophyly of A virus HA genes is supported by bootstrap values of 100%, 99%, and 95% in trees (a), (b), and (c), respectively. This relationship was also supported by the NJ trees for 296 amino acid sequences of influenza A, B, and C virus HA2s (HE2s) with high bootstrap values (100%, 99%, and 86% for p, PC, and gamma distances, respectively) (data not shown).
Divergence Times Between Subtypes of A Virus HA Genes
For estimating the divergence times between subtypes of A virus HA genes, we first estimated the rate of amino acid substitution for duck A virus HAs because duck is one of the natural reservoirs of these viruses and provided the largest number of sequences among them. In the phylogenetic tree for 172 amino acid sequences of the entire region of A virus HAs, only the H1 and H2 subtypes included sufficient numbers of sequences for estimating the rate for duck A virus HAs; they are shown in panels (a) and (b) of figure 3, respectively. In this figure, avian sequences generally had shorter branch lengths than human and swine sequences in both subtypes, indicating that the rate for the former was slower than that for the latter. To estimate the rate of amino acid substitution, we conducted a regression analysis using duck sequences but failed to obtain the rate because it became negative in both H1 and H2 subtypes (data not shown). This probably happened because the evolutionary rate for duck sequences was too slow to give reliable estimates (Bean et al. 1992).
To obtain a reliable rate for duck sequences, it was necessary to analyze duck sequences that were more distantly related to one another than those analyzed above. For this purpose, we estimated the years of divergence between duck sequences and human and swine sequences in figure 3, using the rates for the latter sequences, and added these nodes to the regression analysis of duck sequences. The rates for human and swine sequences were easily estimated by regression analysis using only these sequences, because the rates were relatively high. In the H1 subtype, we first estimated the year of divergence at node M using the rate for human sequences (fig. 3a). We used only human sequences that were isolated before 1977 because human A viruses circulating after 1977 are known to have originated from a laboratory-adapted virus (Kendal et al. 1978).
A linearized tree for 28 amino acid sequences of duck A virus HAs is shown in figure 5. The topology of subtypes was the same as that shown in figure 1, except for the branching pattern of the H11 subtype, which was expected to cluster with the H12 subtype but instead clustered with the H1, H2, and H5 subtypes. The estimates of the divergence times between different subtypes of influenza A virus HA genes are listed in table 4. Although the SE and 99% CI are large, all subtypes apparently diverged from several thousand to several hundred years ago. In particular, the earliest divergence (node X) is likely to have occurred about 2,000 years ago. We further estimated the divergence times between influenza A, B, and C virus HA (HE) genes by linearizing the phylogenetic tree in figure 2c. Assuming that the earliest divergence between subtypes of A virus HA genes occurred 1,971 years ago (table 4), A and B virus HA genes apparently diverged 3,832 years ago, and the separation of A and B virus HA genes from C virus HE genes occurred 7,919 years ago.
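The calibration step in the last two sentences can be sketched in a few lines. In a linearized tree, node height is proportional to time, so one node of known age fixes the ages of all other nodes. The node heights below are hypothetical placeholders, not values read from figure 2c, and the function name is illustrative. Only the three ages (1,971, 3,832, and 7,919 years) come from the text.

```python
def calibrate_linearized_tree(node_heights, calibration_node, calibration_age):
    """Convert node heights of a linearized tree into ages in years before present."""
    years_per_unit = calibration_age / node_heights[calibration_node]
    return {node: height * years_per_unit for node, height in node_heights.items()}

# Hypothetical node heights (substitutions per site), for illustration only.
heights = {"A subtypes (node X)": 0.5, "A vs B": 1.0, "A+B vs C": 2.0}
ages = calibrate_linearized_tree(heights, "A subtypes (node X)", 1971)

# Relative depths implied by the ages reported in the text.
reported = {"A subtypes (node X)": 1971, "A vs B": 3832, "A+B vs C": 7919}
relative_depth = {node: age / reported["A subtypes (node X)"]
                  for node, age in reported.items()}

for node in reported:
    print(f"{node}: toy age {ages[node]:.0f} years, "
          f"reported relative depth {relative_depth[node]:.2f}")
```

Read this way, the reported ages place the A/B split at roughly twice the depth of node X and the split from C virus HE genes at roughly four times that depth. Any error in the calibration age therefore propagates proportionally to the deeper estimates.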
| Discussion |
|---|
The divergence between influenza A and B virus HA genes apparently occurred earlier than the divergences between different subtypes of A virus HA genes. This is different from the conclusion of Webster et al. (1992).
The rate of amino acid substitution for duck A virus HAs (3.19 × 10⁻⁴ per site per year) was slower than that for human and swine A virus HAs ([0.56–2.03] × 10⁻³ per site per year) but similar to that for B virus HAs (5.3 × 10⁻⁴ per site per year [Air et al. 1990]) and C virus HEs (2.3 × 10⁻⁴ per site per year [Muraki et al. 1996]). These results suggest that the rate for HAs (HEs) is more or less constant in the natural reservoir but is accelerated in the newly infected host species. This is probably caused by variation in the strengths of immune responses and functional constraints on HAs (HEs) among different host species (Yamashita et al. 1988; Bean et al. 1992; Schafer et al. 1993; Scholtissek, Ludwig, and Fitch 1993; Makarova et al. 1999; Suzuki and Gojobori 1999).
The earliest divergence time between subtypes of influenza A virus HA genes was estimated to be about 2,000 years ago. Also, the divergence time between A and B virus HA genes was estimated to be about 4,000 years ago, whereas A and B virus HA genes and C virus HE genes diverged about 8,000 years ago. These estimates are substantially older than those (200–300 years) obtained by Saitou and Nei (1986), who used human HA sequences. Because the evolutionary rate for human A virus HAs is known to be higher than that for aquatic birds, their estimates are considered to be underestimates. In fact, influenza pandemics in humans have been recorded as early as 412 B.C. (Kaplan and Webster 1977), suggesting that influenza A viruses existed more than 2,400 years ago. This observation is consistent with the estimates obtained in the present study.
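The direction of this bias can be checked with simple arithmetic: for a fixed amount of sequence divergence, the inferred time is inversely proportional to the assumed rate. The sketch below is only an illustration of that relationship using the rates and the 1,971-year estimate quoted in this article. It is not a reconstruction of the calculation in Saitou and Nei (1986), and the function name is hypothetical.

```python
def rescale_time(time_years, rate_used, rate_alternative):
    """Re-express a divergence time under a different substitution rate.

    The branch depth (rate * time) is held fixed, so time scales as 1 / rate.
    """
    depth = rate_used * time_years  # substitutions per site along one lineage
    return depth / rate_alternative

duck_rate = 3.19e-4                     # per site per year, natural reservoir
human_swine_rates = (0.56e-3, 2.03e-3)  # range quoted for human and swine HAs
earliest_split = 1971                   # years, earliest subtype divergence

for rate in human_swine_rates:
    years = rescale_time(earliest_split, duck_rate, rate)
    print(f"rate {rate:.2e} per site per year -> about {years:.0f} years")
```

Applying the faster human and swine rates to the same depth shortens the estimate from about 2,000 years to somewhere between roughly 300 and 1,100 years, which shows how strongly the choice of host-specific rate drives the dating.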
We estimated the rates and the divergence times under the assumption that the molecular clock has held throughout the evolutionary history of HA (HE) genes. To examine whether this was really the case, we tested the linear relationship between the year of isolation and the number of amino acid substitutions in figure 4 and found that the linearity was not supported at the 1% significance level in either panel (a) or panel (f). However, the rate of amino acid substitution for human A virus HAs obtained from panel (a) (1.20 × 10⁻³ per site per year) was similar to that from previous studies (1.0 × 10⁻³ per site per year [Saitou and Nei 1986]), and the rate for duck A virus HAs obtained from panel (f) (3.89 × 10⁻⁴ per site per year) was similar to that obtained from panel (e) (2.48 × 10⁻⁴ per site per year). These observations suggest that the rates obtained from panels (a) and (f) are approximately correct. Also, the molecular clock was not rejected at the 1% significance level for the phylogenetic tree in figure 5 by the likelihood-ratio test (Rambaut 2000; Yang 2000) but was rejected for the tree in figure 2c.
The latter observation may reflect the fact that the biochemical functions are different between HAs and HEs and the natural reservoirs are not the same for influenza A, B, and C viruses. Therefore, some caution is necessary in estimating the divergence times between influenza A, B, and C virus HA (HE) genes. However, the rate of amino acid substitution for duck influenza A virus HAs was similar to that for B virus HAs and C virus HEs, as indicated previously. Also, in reality, no strict molecular clock is likely to hold for any protein, but it is known that rough divergence times can be obtained even if the molecular clock is violated to some extent (Nei and Kumar 2000, pp. 187–206; Nei, Xu, and Glazko 2001). Therefore, these estimates also appear to be appropriate as rough estimates.
In conclusion, influenza virus HA (HE) genes apparently evolved at a rate of amino acid substitution on the order of 10⁻⁴ per site per year in the natural reservoir. These genes apparently diverged into influenza A, B, and C virus HA (HE) genes several thousand years ago and subsequently into subtypes in influenza A viruses from several thousand to several hundred years ago.
| Acknowledgements |
|---|
The authors thank two anonymous reviewers for their valuable comments. This study was supported by grants from the National Institutes of Health to M.N. (GM20293). Y.S. is supported by the JSPS Research Fellowships for Young Scientists.
| Footnotes |
|---|
Naruya Saitou, Reviewing Editor
Keywords: influenza virus, hemagglutinin, hemagglutinin-esterase, rate of amino acid substitution, divergence time
Address for correspondence and reprints: Yoshiyuki Suzuki, Institute of Molecular Evolutionary Genetics, The Pennsylvania State University, 328 Mueller Laboratory, University Park, Pennsylvania 16802. yis1{at}psu.edu
| References |
|---|
Air G. M., A. J. Gibbs, W. G. Laver, R. G. Webster, 1990 Evolutionary changes in influenza B are not primarily governed by antibody selection Proc. Natl. Acad. Sci. USA 87:3884-3888
Bean W. J., M. Schell, J. Katz, Y. Kawaoka, C. Naeve, O. Gorman, R. G. Webster, 1992 Evolution of the H3 influenza virus hemagglutinin from human and nonhuman hosts J. Virol 66:1129-1138
Cox N. J., F. Fuller, N. Kaverin, H. D. Klenk, R. A. Lamb, B. W. J. Mahy, J. McCauley, K. Nakamura, P. Palese, R. Webster, 2000 Family Orthomyxoviridae Pp. 585-597 in M. H. V. van Regenmortel, C. M. Fauquet, D. H. L. Bishop, et al. (11 co-editors), eds. Virus Taxonomy. Academic Press, London
Felsenstein J., 1985 Confidence limits on phylogenies: an approach using the bootstrap Evolution 39:783-791
Gammelin M., A. Altmuller, U. Reinhardt, J. Mandler, V. R. Harley, P. J. Hudson, W. M. Fitch, C. Scholtissek, 1990 Phylogenetic analysis of nucleoproteins suggests that human influenza A viruses emerged from a 19th-century avian ancestor Mol. Biol. Evol 7:194-200
Gu X., J. Zhang, 1997 A simple method for estimating the parameter of substitution rate variation among sites Mol. Biol. Evol 14:1106-1113
Hayashida H., H. Toh, R. Kikuno, T. Miyata, 1985 Evolution of influenza virus genes Mol. Biol. Evol 2:289-303
Hinshaw V. S., R. G. Webster, B. Turner, 1980 The perpetuation of orthomyxoviruses and paramyxoviruses in Canadian waterfowl Can. J. Microbiol 26:622-629
Kaplan M. M., R. G. Webster, 1977 The epidemiology of influenza Sci. Am 12:88-106
Kendal A. P., G. R. Noble, J. J. Skehel, W. R. Dowdle, 1978 Antigenic similarity of influenza A (H1N1) viruses from epidemics in 1977-1978 to "Scandinavian" strains isolated in epidemics of 1950-1951 Virology 89:632-636
Krossoy B., I. Hordvik, F. Nilsen, A. Nylund, C. Endresen, 1999 The putative polymerase sequence of infectious salmon anemia virus suggests a new genus within the Orthomyxoviridae J. Virol 73:2136-2142
Kumar S., K. Tamura, I. B. Jakobsen, M. Nei, 2001 MEGA2: molecular evolutionary genetics analysis software Bioinformatics 17:1244-1245
Makarova N. V., N. V. Kaverin, S. Krauss, D. Senne, R. G. Webster, 1999 Transmission of Eurasian avian H2 influenza virus to shorebirds in North America J. Gen. Virol 80:3167-3171
Muraki Y., S. Hongo, K. Sugawara, F. Kitame, K. Nakamura, 1996 Evolution of the haemagglutinin-esterase gene of influenza C virus J. Gen. Virol 77:673-679
Nakada S., R. S. Creager, M. Krystal, R. P. Aaronson, P. Palese, 1984 Influenza C virus hemagglutinin: comparison with influenza A and B virus hemagglutinins J. Virol 50:118-124
Nakajima K., U. Desselberger, P. Palese, 1978 Recent human influenza A (H1N1) viruses are closely related genetically to strains isolated in 1950 Nature 274:334-339
Nei M., 1983 Genetic polymorphism and the role of mutation in evolution Pp. 165-190 in M. Nei and R. K. Koehn, eds. Evolution of genes and proteins. Sinauer, Sunderland, Mass
Nei M., S. Kumar, 2000 Molecular evolution and phylogenetics Oxford University Press, Oxford, New York
Nei M., P. Xu, G. Glazko, 2001 Estimation of divergence times from multiprotein sequences for a few mammalian species and several distantly related organisms Proc. Natl. Acad. Sci. USA 98:2497-2502
Palese P., J. F. Young, 1982 Variation of influenza A, B, and C viruses Science 215:1468-1474
Rambaut A., 2000 Estimating the rate of molecular evolution: incorporating noncontemporaneous sequences into maximum likelihood phylogenies Bioinformatics 16:395-399
Reid A. H., T. G. Fanning, J. V. Hultin, J. K. Taubenberger, 1999 Origin and evolution of the 1918 "Spanish" influenza hemagglutinin gene Proc. Natl. Acad. Sci. USA 96:1651-1656
Rohm C., N. Zhou, J. Suss, J. Mackenzie, R. G. Webster, 1996 Characterization of a novel influenza hemagglutinin, H15: criteria for determination of influenza A subtypes Virology 217:508-516
Rzhetsky A., M. Nei, 1993 Theoretical foundation of the minimum-evolution method of phylogenetic inference Mol. Biol. Evol 10:1073-1095
Saitou N., M. Nei, 1986 Polymorphism and evolution of influenza A virus genes Mol. Biol. Evol 3:57-74
Saitou N., M. Nei, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees Mol. Biol. Evol 4:406-425
Schafer J. R., Y. Kawaoka, W. J. Bean, J. Suss, D. Senne, R. G. Webster, 1993 Origin of the pandemic 1957 H2 influenza A virus and the persistence of its possible progenitors in the avian reservoir Virology 194:781-788
Scholtissek C., S. Ludwig, W. M. Fitch, 1993 Analysis of influenza A virus nucleoproteins for the assessment of molecular genetic mechanisms leading to new phylogenetic virus lineages Arch. Virol 131:237-250
Scholtissek C. V., V. von Hoyningen, R. Rott, 1978 Genetic relatedness between the new 1977 epidemic strains (H1N1) of influenza and human influenza strains isolated between 1947 and 1957 (H1N1) Virology 89:613-617
Slemons R. D., D. C. Johnson, J. S. Osborn, F. Hayes, 1974 Type-A influenza viruses isolated from wild free-flying ducks in California Avian Dis 18:119-124
Smith W., C. H. Andrewes, P. P. Laidlaw, 1933 A virus obtained from influenza patients Lancet 225:66-68
Suarez D. L., 2000 Evolution of avian influenza viruses Vet. Microbiol 74:15-27
Suzuki Y., T. Gojobori, 1999 A method for detecting positive selection at single amino acid sites Mol. Biol. Evol 16:1315-1328
Suzuki Y., A. Wyndham, T. Gojobori, 2001 Virus evolution Pp. 377-413 in D. J. Balding, M. Bishop, and C. Cannings, eds. Handbook of statistical genetics. Wiley, Chichester
Takezaki N., A. Rzhetsky, M. Nei, 1995 Phylogenetic test of the molecular clock and linearized trees Mol. Biol. Evol 12:823-833
Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680
Webster R. G., W. J. Bean, O. T. Gorman, T. M. Chambers, Y. Kawaoka, 1992 Evolution and ecology of influenza A viruses Microbiol. Rev 56:152-179
Webster R. G., M. Yakhno, V. S. Hinshaw, W. J. Bean, K. G. Murti, 1978 Intestinal influenza: replication and characterization of influenza viruses in ducks Virology 84:268-278
WHO Memorandum. 1980 A revision of the system of nomenclature for influenza viruses Bull. WHO 58:585-591
Yamashita M., M. Krystal, W. M. Fitch, P. Palese, 1988 Influenza B virus evolution: co-circulating lineages and comparison of evolutionary pattern with those of influenza A and C viruses Virology 163:112-122
Yang Z., 2000 Phylogenetic analysis by maximum likelihood (PAML). Version 3.0 University College London, London, U.K