Molecular Biology and Evolution, Vol 9, 666-677, Copyright © 1992 by Society for Molecular Biology and Evolution
GA Watterson
This paper analyzes the nucleotide sequences of three viruses: Kunjin, West
Nile, and yellow fever. Each virus has one long open reading frame of
more than 10,200 nucleotides that codes for four structural and seven
nonstructural genes. The Kunjin and West Nile viruses are the most closely
related pair, when assessed on the basis of matches between their
nucleotide sequences. As would be expected, the matching is least for bases
at third-position codon sites and is greatest for second-position sites.
Statistics are presented for the numbers of mismatches that are transitions
or transversions. Nucleotide base usage is also reported. To each of the 33
virus-gene segments, nonhomogeneous Markov chain models have been fitted to
describe the sequences of nucleotide bases. The models allow for different
transition probabilities ("transition" is used in the mathematical sense
here) and for different degrees of dependency, at the three sites in the
codons. Reasonably satisfactory fits can be obtained for many of the genes
by using models that are first order for both first- and second-position
sites in the codon but that are second order for third-position sites. One
consequence of such a model is that the correlation between one amino acid
and the next is limited to the correlation of the last base of the former
with the first base of the latter. Other consequences are that the model
can (and does) prohibit the occurrence of stop codons within a gene and
that subsequences of only first-position bases, or only third-position
bases, are also first-order Markov chains. In theory, second-position
subsequences may not be Markov chains at all. In practice, the data suggest
that each of these subsequences is effectively a zero-order Markov chain,
i.e., bases spaced three apart are statistically independent. Stationarity
of nucleotide base distributions can be interpreted in either of two ways:
(1) spatially along the sites or (2) temporally at each site. These
interpretations are often inconsistent, since the former allows for
Markov dependence between adjacent sites whereas the latter assumes
independence between sites. The inconsistency can be overcome, for these
viruses, if subsequences at different codon positions are analyzed
separately.
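
As a rough illustration of the model structure described above (first order at first- and second-position codon sites, second order at third-position sites), the following minimal sketch estimates position-specific transition probabilities by simple counting. It is not the fitting procedure used in the paper: the function name `fit_codon_markov`, the constant `BASES`, and the use of raw relative frequencies as estimates are all assumptions made for illustration, and the input is assumed to be an in-frame coding sequence over A, C, G, T.

```python
from collections import Counter, defaultdict

BASES = "ACGT"  # hypothetical constant: the four nucleotide symbols


def fit_codon_markov(seq):
    """Estimate codon-position-dependent transition probabilities by counting.

    Assumes `seq` starts at the first base of a codon. Bases at codon
    positions 1 and 2 are conditioned on the single preceding base (first
    order); bases at position 3 are conditioned on the two preceding bases,
    i.e. the first two bases of the same codon (second order).
    Returns probs[position][context][base].
    """
    counts = {1: defaultdict(Counter),
              2: defaultdict(Counter),
              3: defaultdict(Counter)}
    # Start at index 1: the very first base has no preceding context.
    for i in range(1, len(seq)):
        pos = i % 3 + 1                 # codon position: 1, 2 or 3
        order = 2 if pos == 3 else 1    # second order only at position 3
        context = seq[i - order:i]
        counts[pos][context][seq[i]] += 1

    probs = {}
    for pos, table in counts.items():
        probs[pos] = {}
        for context, counter in table.items():
            total = sum(counter.values())
            probs[pos][context] = {b: counter[b] / total for b in BASES}
    return probs


if __name__ == "__main__":
    # Toy in-frame sequence (codons ATG TAT TGG TAC), not real viral data.
    model = fit_codon_markov("ATGTATTGGTAC")
    print(model[3]["TA"])  # third-position distribution after "TA"
```

Because the first base of each codon is conditioned only on the last base of the preceding codon, dependence between adjacent codons enters only through that pair of bases. Because the third base is conditioned on both preceding bases, a gene with no internal stop codons yields zero estimated probability for completing TAA, TAG, or TGA, which is how a model of this form can exclude stop codons.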
ORIGINAL ARTICLE
A stochastic analysis of three viral sequences
Department of Mathematics, Monash University, Victoria, Australia.