MBE Advance Access originally published online on April 18, 2008
Molecular Biology and Evolution 2008 25(7):1512-1520; doi:10.1093/molbev/msn098
Research Articles
Difficulties in Testing for Covarion-Like Properties of Sequences under the Confounding Influence of Changing Proportions of Variable Sites


* Institute of Botany III, University of Düsseldorf, Düsseldorf, Germany
Institute for Molecular BioSciences, Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
Biomathematics Research Centre, Allan Wilson Centre for Molecular Ecology and Evolution, University of Canterbury, Christchurch, New Zealand
E-mail: nicole.gruenheit{at}uni-duesseldorf.de.
| Abstract |
|---|
The covarion (COV)-like properties of sequences are poorly described and their impact on phylogenetic analyses poorly understood. We demonstrate using simulations that, under an evolutionary model where the proportion of variable sites changes in nonadjacent lineages, log likelihood values for rates-across-sites (RAS) and COV models become similar, making the models difficult to distinguish. Further, although COV and RAS models provide a great improvement in likelihood scores over a homogeneous model with these simulated data, reconstruction accuracy of tree building is low, suggesting caution when it is suspected that proportions of variable sites differ in different evolutionary lineages. We study the performance of a recently developed contingency test that detects the presence of COV-type evolution, modified here for protein data. We report that if proportions of variable sites (pvar) change in a lineage-specific manner such that their distributions in different lineages become sufficiently nonoverlapping, then the contingency test can incorrectly suggest a homogeneous model. Also of concern is the possibility of different proportions of variable sites between the groups being studied. In a study of chloroplast proteins, interpretation of the test is found to be susceptible to different partitioning of taxon groups, making the test very subjective in its implementation. Extreme intergroup differences in the extent of divergence and differences in proportions of variable sites could be contributing to this effect.
Key Words: covarion; phylogenetics; chloroplast proteins; contingency test
| Introduction |
|---|
Although sequence evolution is a temporally and spatially heterogeneous process, it is typically described by a homogeneous, stationary, time reversible model (Liò and Goldman 1998
More recently, a number of covarion (COV) (Fitch and Markowitz 1970) models have been implemented for phylogenetic analyses (Galtier 2001; Huelsenbeck 2002; Guindon et al. 2004; Wang et al. 2007), and these COV models have been found to provide further improvement over RAS models in terms of the relative fit to sequence data. This is presumably because these models capture a component of temporal heterogeneity in the evolutionary process: unlike RAS models, they allow the substitution properties of a site to change over time in a lineage-specific fashion. Under COV models, a site is free to switch back and forth between variable and invariable states along a branch.
In the COV model of Tuffley and Steel (1998), a site in a sequence may be either variable or invariable, and the state may differ in different lineages. All sites that are variable evolve under the same substitution process (e.g., JC69, HKY85, etc.) and at the same rate. The COV model of Huelsenbeck (2002) extends the Tuffley and Steel model by allowing a discrete number of rate classes for the variable state. Under this model, a site can switch between the OFF (invariable) state and one of the variable rate classes but not between the different variable rate classes. A third COV model is that of Galtier (2001). In this model, there is a discrete number of rate classes for the variable state, and a site can switch between these rate classes; there is no OFF state. Most recently, Wang et al. (2007) have combined these 2 latter models to produce a general model (one in which there can be a switch between all variable states and an OFF state).
All these COV models are stationary, time reversible models and carry the expectation that the proportion of variable sites is the same in all evolutionary lineages. However, this assumption can be overly restrictive, as proportions of variable sites, pvar, have been inferred to vary in lineage-specific ways (Lockhart et al. 1996, 2006; Lopez et al. 2002). This property of sequence evolution can lead to topological biases that will mislead tree building (Lockhart et al. 1996, 2006). With some proteins, changes in pvar can be explained by lineage-specific differences in functional and structural constraints, due to differential loss/gain of functions ancillary to the core function of specific molecules (Susko et al. 2002; Inagaki et al. 2004; Guo and Stiller 2005).
Improving substitution models for phylogenetic analysis requires accurate tests to quantify the extent and nature of substitution model misspecification. A number of tests have been proposed to characterize COV-like substitution properties. However, as we illustrate using simulated and real data, interpretations from these tests need to be made cautiously, particularly when pvar is not constant across the underlying phylogeny. Our findings highlight the need for improved analytical methods for studying the COV-like properties of sequences.
| Materials and Methods |
|---|
Maximum Likelihood Analyses
Within a maximum likelihood framework, log likelihood scores can be used to evaluate the relative fit of COV, RAS, and homogeneous models to sequence data. The best scores can then be used to identify the best substitution model for tree building. To examine the accuracy of this approach under conditions that might approximate the biological complexity expected with empirical data, we have examined the scores obtained when sequences are simulated under what we call a Tuffley and Steel (1998)
However, in the special case of convergent increase in variable sites in nonadjacent lineages, where at the 2 switch positions 1) the only change in class is from invariable to TS sites and 2) the sites that change class are the same, the phylogenetic mixture reduces from 8 to just 3 classes of TS model. The mixture allowed us to specify the proportion of sites belonging to the invariable class and to specify the proportion of sites that switch from this class to the TS class in the nonadjacent lineages. Sequences 10,000 nt in length were simulated with Seq-Gen-Aminocov (Rambaut and Grassly 1997
In our study, for each of the increments, 100 replicates were generated, and each simulated alignment was analyzed using Procov1.3 (Wang et al. 2007). The following models were compared using the standard optimization files without reestimation of the branch lengths: homogeneous, Tuffley and Steel (1998), RAS (Yang 1994), Galtier (2001), General (Wang et al. 2007), and Huelsenbeck (2002). For each alignment and model, the log likelihood was extracted using a Perl script. The mean for each parameter was calculated and plotted using MATLAB; the standard deviations for each set of 100 replicates were very narrow (<0.1% of the mean in all cases) and hence were not plotted.
Trees were reconstructed for the simulated data sets using PAUP* (Swofford 2003; maximum likelihood: lset nst = 1 basefreq = equal; lset tratio = 0.5 pinv = 0 rates = gamma shape = estimate; hsearch start = stepwise swap = tbr status = no nbest = 1; parsimony: hsearch start = stepwise swap = tbr status = no nbest = 1) and MrBayes (Ronquist and Huelsenbeck 2003; lset nst = 1 covarion = yes; mcmc nruns = 1 ngen = 250000 samplefreq = 100 filename = run1.nex; sumt burnin = 400).
Contingency Tests
Another approach to test whether a collection of sites in a multiple sequence alignment exhibits COV-type evolutionary properties is the contingency test developed by Lockhart et al. (1998), which is based on the test statistic W. This compares substitution differences between 2 groups of sequences
N3 or N4 (syn. type 3 or type 4) sites should be less frequent if sequences are evolving according to a RAS model than if the sites are evolving in a manner that approximates a COV model (Lockhart et al. 1998). If in real data there are more N3 and N4 sites than expected to occur by chance under a RAS model, this would constitute evidence for deviation from the assumptions of a RAS model and possibly evidence for a COV mode of sequence evolution (Lockhart et al. 1998).
Ané et al. (2005) improved upon this test by providing a more rigorous means for obtaining expectations for the test statistic W under 3 different models of evolution: 1) a homogeneous model, wherein different sequence positions are equally variable; 2) a RAS model, wherein some sites are evolving faster than other sites; and 3) a Tuffley and Steel (1998) COV model. In doing this, they noted that W predicts that sites that are varied in one group are likely to be varied in other groups under RAS and COV models but not under a homogeneous model. A RAS model predicts a strong degree of correlation and a COV model a weaker degree of correlation. Under a homogeneous model, the W statistic does not differ significantly from zero. It is positive under a COV model and even more positive under a RAS model. The Ané et al. test uses simulation to interpret values of W in terms of support for each of the 3 models.
It does this by first examining whether there is evidence to reject a homogeneous model of sequence evolution in favor of a heterogeneous model. If so, it then examines whether there is evidence to reject a RAS model in favor of a more complex model of substitution. That is, if the derived W differs significantly from the expected distribution of W under a RAS model, the nucleotide or protein sequence is inferred to have evolved under a RAS + COV model. Ané et al. (2005) used this test to infer that a large proportion of proteins encoded in chloroplast genomes evolve according to a RAS + COV model.
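The two-stage logic described above can be sketched as follows. This is only an illustration of the decision sequence, not the procedure of Ané et al. (2005): it assumes that values of W simulated under the homogeneous and RAS models are already available as lists, and it uses a simple two-sided Monte Carlo p-value. The function names, the significance level `alpha`, and the example numbers are all hypothetical.

```python
def two_sided_p(observed, null_samples):
    """Share of null draws at least as far from the null mean as the observation.

    The +1 terms count the observation itself, so the p-value is never zero.
    """
    centre = sum(null_samples) / len(null_samples)
    extreme = sum(
        1 for x in null_samples if abs(x - centre) >= abs(observed - centre)
    )
    return (extreme + 1) / (len(null_samples) + 1)


def classify(w_obs, w_homogeneous, w_ras, alpha=0.05):
    """Stage 1: test the homogeneous model; stage 2: test the RAS model."""
    if two_sided_p(w_obs, w_homogeneous) > alpha:
        return "homogeneous"   # no evidence against the homogeneous model
    if two_sided_p(w_obs, w_ras) > alpha:
        return "RAS"           # heterogeneous, but compatible with RAS
    return "RAS + COV"         # RAS rejected in favor of a more complex model


# Made-up null distributions: W near 0 (homogeneous) and near 0.10 (RAS).
null_hom = [(-1) ** k * 0.001 * (k % 5) for k in range(99)]
null_ras = [0.10 + (-1) ** k * 0.001 * (k % 5) for k in range(99)]
```

Note that with a two-sided criterion in the first stage, an observed W close to 0 returns "homogeneous" regardless of how W came to be small, which is the situation examined in the Results.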
We have implemented the method of Ané et al. (2005) for analyzing protein sequences and used it to reexamine chloroplast genome sequences also studied by Ané et al. The sequences used are from Acorus calamus (NC_007407), Adiantum capillus-veneris (NC_004766), Amborella trichopoda (NC_005086), Anthoceros formosae (NC_004543), Arabidopsis thaliana (NC_000932), Atropa belladonna (NC_004561), Calycanthus floridus (NC_004993), Chaetosphaeridium globosum (NC_004115), Chlamydomonas reinhardtii (NC_005353), Chlorella vulgaris (NC_001865), Cyanidioschyzon merolae (NC_004799), Cyanophora paradoxa (NC_001675), Epifagus virginiana (NC_001568), Ginkgo biloba (DQ069337–DQ069702), Guillardia theta (NC_000926), Lotus corniculatus (NC_002694), Marchantia polymorpha (NC_001319), Medicago truncatula (NC_003119), Mesostigma viride (NC_002186), Nephroselmis olivacea (NC_000927), Nicotiana tabacum (NC_001879), Nuphar advena (DQ069337–DQ069702), Nymphaea alba (NC_006050), Odontella sinensis (NC_001713), Oenothera elata (NC_002693), Oryza sativa (NC_001320), Physcomitrella patens (NC_005087), Pinus koraiensis (NC_004677), Pinus thunbergii (NC_001631), Porphyra purpurea (NC_000925), Psilotum nudum (NC_003386), Ranunculus macranthus (DQ069337–DQ069702), Saccharum officinarum (NC_006084), Spinacia oleracea (NC_002202), Triticum aestivum (NC_002762), Typha latifolia (DQ069337–DQ069702), Yucca schidigera (DQ069337–DQ069702), and Zea mays (NC_001666).
Sequences were aligned using ClustalW (Thompson et al. 1994), and all gapped sites were removed. To obtain a phylogenetic overview of the data set, sequences were concatenated and LogDet distances were computed with LDDist (Lake 1994; Lockhart et al. 1994; Thollesson 2004), from which phylogenetic networks were constructed with Neighbor-Net as implemented in SplitsTree 4 (Huson and Bryant 2006). A Java program was written to count the different types of sites and is available upon request. For each alignment, the program reports the numbers of type 1, 2, 3, 4, and 5 sites. Sites with gaps are ignored.
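The site counting can be sketched as below. This is not the authors' Java program. The definitions of the five site types are an assumption, taken from the usual reading of Lockhart et al. (1998) because the definitions did not survive in this copy: type 1 = unvaried in both groups with the same state, type 2 = varied in both groups, type 3 = varied only in group 1, type 4 = varied only in group 2, type 5 = unvaried within each group but different between groups. All function names are hypothetical.

```python
def site_type(col1, col2):
    """Classify one alignment column, given its residues in group 1 and group 2."""
    varied1 = len(set(col1)) > 1
    varied2 = len(set(col2)) > 1
    if varied1 and varied2:
        return 2
    if varied1:
        return 3
    if varied2:
        return 4
    # Unvaried within both groups: same state (type 1) or fixed difference (type 5).
    return 1 if col1[0] == col2[0] else 5


def count_site_types(group1, group2, gap_chars="-?"):
    """Count type 1-5 sites for two lists of aligned sequences; gapped columns are skipped."""
    counts = {t: 0 for t in range(1, 6)}
    for column in zip(*(group1 + group2)):
        if any(c in gap_chars for c in column):
            continue
        col1, col2 = column[:len(group1)], column[len(group1):]
        counts[site_type(col1, col2)] += 1
    return counts


# Toy alignment: two sequences per group, seven columns, the last one gapped.
g1 = ["MKVLAT-", "MKILASA"]
g2 = ["MRVLSGA", "MRVMSTA"]
counts = count_site_types(g1, g2)
```

Which group is labelled group 1 determines whether a site is counted as type 3 or type 4, so the two counts swap if the groups are exchanged.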
| Results |
|---|
Maximum Likelihood Analyses
We have investigated the extent to which time reversible substitution models describe the evolution of sequences that have evolved under a Tuffley and Steel (1998)
Contingency Tests
Characterization of the COV-like properties of sequence data can also be made using contingency tests. The test of Ané et al. overcomes problems of interpreting the W statistic with real data that were not solved by Lockhart et al. (1998)
A striking feature of the aligned sequence data is the difference in the proportions of N3 and N4 sites among different groups of sequences. In comparisons of a monophyletic versus a paraphyletic group, the number of N3 sites greatly exceeds the number of N4 sites. All proteins had at least 20% N3 sites, and in 16% of the proteins >70% of all sites were N3, whereas no protein had an N4 site (fig. 4a). In some proteins, more than 80% of all sites were N3 or N4 sites. In the comparison of 2 monophyletic groups, far fewer N3 and N4 sites were found, and a considerably greater balance between the numbers of N3 and N4 sites was observed (fig. 4b). Most proteins had <5% of either N3 or N4 sites; the maximum proportion of N3 or N4 sites lay between 40% and 50%. Ané et al. (2005)
A further property of the test statistic W also suggests caution in its application: W can become negative (or close to 0) when the distributions of variable sites in the groups being compared become sufficiently different, as might happen if the spatial pattern of substitution differs from that expected under time reversible COV models. Thus, unexpected but nevertheless COV-like patterns could lead the W statistic to underestimate the heterogeneity of the substitution process. The expected value w of W can be written as:

$$w = p_{12} - p_1 p_2,$$

where $p_i$ is the probability that a site is varied in group $G_i$ and $p_{12}$ is the probability that a site is varied in both groups. Under the Tuffley and Steel model, $w \geq 0$. However, if the distribution of variable sites has evolved in a more complex manner than envisaged by Tuffley–Steel, then it can be shown that $w$ can be negative. For example, consider a model where sites fall into 4 classes depending on whether they are variable or invariable in the 2 groups $G_1$, $G_2$, and let
- $v_i$ = proportion of variable sites in group $G_i$,
- $v_{12}$ = proportion of sites that are variable in both $G_1$ and $G_2$,
- $\theta_i$ = probability that a site that is variable in $G_i$ is varied in $G_i$, and
- $\theta_{12}$ = probability that a site that is variable in $G_1$ and $G_2$ is varied in $G_1$ and $G_2$.

Then $p_i = \theta_i v_i$ and $p_{12} = \theta_{12} v_{12}$, and for a substitution process that is group based (e.g., Jukes and Cantor; Kimura 2P and Kimura 3ST models), we also have $\theta_{12} = \theta_1 \theta_2$ and $w = \theta_1 \theta_2 (v_{12} - v_1 v_2)$. If the proportion of variable sites increases in $G_2$ such that the variable sites in $G_1$ are a subset of the variable sites in $G_2$, then the proportion of sites variable in both $G_1$ and $G_2$ equals the proportion of sites variable in $G_1$; thus $v_{12} = v_1$ and $w \geq 0$ because $w = \theta_1 \theta_2 (v_{12} - v_1 v_2) = \theta_1 \theta_2 v_1 (1 - v_2) \geq 0$. However, if there is little or, in the extreme case, no overlap in the sites that are variable in $G_1$ and $G_2$, then $w$ can take a negative value.

As a simple example, consider a hypothetical protein 100 amino acids in length. In $G_1$, the 30 N-terminal sites of this protein become variable but the 70 C-terminal sites remain constant, whereas in $G_2$, the 30 C-terminal sites become variable but the 70 N-terminal sites remain constant. In this case, $w < 0$ even though the proportion of variable sites in the 2 groups is similar or the same ($v_1 = v_2$). In general, when $v_1 = v_2$, $w = \theta_1 \theta_2 (v_{12} - v_1 v_2) = \theta_1 \theta_2 (v_{12} - v_1^2)$, which is negative provided that $v_{12} < v_1^2$; here $w = \theta_1 \theta_2 (0 - 0.3 \times 0.3) < 0$.
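The sign behaviour of w in these cases can be checked numerically with a short sketch. It evaluates w as the product of the two per-group probabilities that a variable site is observed to vary, multiplied by (v12 - v1 v2), for a group-based substitution process. The names `theta1` and `theta2` are placeholders for those two probabilities, and the value 0.8 is purely illustrative; only the proportions of variable sites (0.3, and 0 overlap) come from the hypothetical 100-residue protein in the text. The nested case with v2 = 0.6 is an added illustration.

```python
def expected_w(theta1, theta2, v1, v2, v12):
    """Expected W for a group-based process: theta1 * theta2 * (v12 - v1 * v2)."""
    return theta1 * theta2 * (v12 - v1 * v2)


# Variable sites of G1 nested within those of G2 (v12 = v1): w is non-negative.
w_nested = expected_w(0.8, 0.8, v1=0.3, v2=0.6, v12=0.3)

# Variable sites placed independently in the two groups (v12 = v1 * v2): w is zero.
w_independent = expected_w(0.8, 0.8, v1=0.3, v2=0.3, v12=0.3 * 0.3)

# No overlap, as in the 100-residue example (v1 = v2 = 0.3, v12 = 0): w is negative.
w_disjoint = expected_w(0.8, 0.8, v1=0.3, v2=0.3, v12=0.0)

for label, w in [("nested", w_nested),
                 ("independent", w_independent),
                 ("disjoint", w_disjoint)]:
    print(f"{label:12s} w = {w:+.4f}")
```

Whatever positive values the two probabilities take, the sign of w is set by v12 - v1 v2 alone, so the conclusion does not depend on the illustrative 0.8.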
| Discussion |
|---|
For confidence in the reliability of tree building from highly diverged sequences, it is essential to develop low parameter substitution models that capture the heterogeneous complexity of sequence evolution. However, as we have illustrated, current methods need to be applied cautiously in characterizing the evolutionary properties of highly diverged sequences, and our current understanding of sequence evolution is limiting for model development. In this respect, it is important to note that tests of heterotachy, which we have not discussed (e.g., Lopez et al. 1999
A recent development in modeling substitution properties of sequences is to fit a mixture of substitution models to each site in an alignment of sequences (e.g., Pagel and Meade 2004; Lartillot et al. 2007). This approach can also be extended to fit a mixture of trees with different branch lengths to the sequences (e.g., Kolaczkowski and Thornton 2004; Zhou et al. 2007). There are issues of identifiability with complex mixture and COV models (Allman and Rhodes 2007), but potentially tests might be developed using such models to better characterize temporal heterogeneity in the evolution of sequences. Such developments will be important because although RAS models have generally improved phylogenetic inference, as we demonstrate here, they are unable to account for lineage-specific patterns of changing pvar. They, and currently implemented COV models, are unable to account for the form of heterotachy that most likely describes the evolution of biological sequences; the further development of mixture models is of interest in this respect.
| Acknowledgements |
|---|
We thank Simon Whelan, Andrew Roger, Ed Susko, John Rhodes, Liat Shavit, Simon Joly, Elizabeth Allman, Oliver Deusch, and Tal Dagan for helpful discussions and Microsoft (P.J.L.) and the Julius von Haast Fellowship Fund (W.M.) for research fellowships. This work was funded by the New Zealand Marsden Fund (P.J.L.) and the Deutsche Forschungsgemeinschaft (W.M.).
| Footnotes |
|---|
Andrew Roger, Associate Editor
| References |
|---|
Adachi J, Hasegawa M. Improved dating of the human/chimpanzee separation in the mitochondrial DNA tree: heterogeneity among amino acid sites. J Mol Evol (1995) 40:622–628.
Allman ES, Rhodes J. The identifiability of tree topology for phylogenetic models. J Comput Biol (2007) 13:1103–1113.
Ané C, Burleigh JG, MacMahon MM, Sanderson MJ. Covarion structure in plastid genome evolution: a new statistical test. Mol Biol Evol (2005) 22:914–924.
Baele G, Raes J, Van de Peer Y, Vansteelandt S. An improved statistical method for detecting heterotachy in nucleotide sequences. Mol Biol Evol (2006) 23:1397–1405.
Bryant D, Moulton V. Neighbor-Net: an agglomerative method for the construction of phylogenetic networks. Mol Biol Evol (2004) 21:255–265.
Felsenstein J. Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool (1978) 27:401–410.
Fitch WM, Markowitz E. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem Genet (1970) 4:579–593.
Galtier N. Maximum-likelihood phylogenetic analysis under a covarion-like model. Mol Biol Evol (2001) 18:866–873.
Guo Z, Stiller J. Comparative genomics and evolution of proteins associated with RNA polymerase II C terminal domain. Mol Biol Evol (2005) 22:2166–2178.
Guindon S, Rodrigo AG, Dyer KA, Huelsenbeck JP. Modelling the site specific variation of selection patterns along lineages. Proc Natl Acad Sci USA (2004) 101:12957–12962.
Huelsenbeck JP. Testing a covariotide model of DNA substitution. Mol Biol Evol (2002) 19:698–707.
Huson DH, Bryant D. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol (2006) 23:254–267.
Inagaki Y, Susko E, Fast NM, Roger AJ. Covarion shifts cause a long-branch attraction artifact that unites microsporidia and archaebacteria in EF-1α phylogenies. Mol Biol Evol (2004) 21:1340–1349.
Jukes TH, Cantor CR. Evolution of protein molecules. In: Munro HN, editor. Mammalian protein metabolism (1969) New York: Academic Press. 21–123.
Kolaczkowski B, Thornton JW. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature (2004) 431:980–984.
Lake JA. Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc Natl Acad Sci USA (1994) 91:1455–1459.
Lartillot N, Brinkmann H, Philippe H. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol (2007) 7(Suppl 1):S4.
Liò P, Goldman N. Models of molecular evolution and phylogeny. Genome Res (1998) 8:1233–1244.
Lockhart PJ, Larkum AWD, Steel MA, Waddell PJ, Penny D. Evolution of chlorophyll and bacteriochlorophyll: the problem of invariant sites in sequence analysis. Proc Natl Acad Sci USA (1996) 93:1930–1934.
Lockhart PJ, Novis P, Milligan BG, Riden J, Rambaut A, Larkum AWD. Heterotachy and tree building: a case study with plastids and eubacteria. Mol Biol Evol (2006) 23:40–45.
Lockhart PJ, Steel M. A tale of two processes. Syst Biol (2005) 54:948–951.
Lockhart PJ, Steel M, Barbrook AC, Huson DH, Charleston MA, Howe CJ. A covariotide model explains apparent phylogenetic structure of oxygenic photosynthetic lineages. Mol Biol Evol (1998) 15:1183–1188.
Lockhart PJ, Steel M, Hendy M, Penny D. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol Biol Evol (1994) 11:605–612.
Lopez P, Casane D, Philippe H. Heterotachy, an important process of protein evolution. Mol Biol Evol (2002) 19:1–7.
Lopez P, Forterre P, Philippe H. The root of the tree of life in the light of the covarion model. J Mol Evol (1999) 49:496–508.
Misof B, Anderson CL, Buckley TR, Erpenbeck D, Rickert A, Misof K. An empirical analysis of mt 16S rRNA covarion-like evolution of insects: site-specific rate variation is clustered and frequently detected. J Mol Evol (2002) 55:460–469.
Pagel M, Meade A. A phylogenetic mixture model for detecting pattern heterogeneity in gene sequence or character-state data. Syst Biol (2004) 53:571–581.
Rambaut A, Grassly NC. Seq-Gen: an application for the Monte-Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput Appl Biosci (1997) 13:235–238.
Rodríguez-Ezpeleta N, Philippe H, Brinkmann H, Becker B, Melkonian M. Phylogenetic analyses of nuclear, mitochondrial and plastid multi-gene datasets support the placement of Mesostigma in the Streptophyta. Mol Biol Evol (2007) 24:723–731.
Ronquist F, Huelsenbeck JP. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics (2003) 19:1572–1574.
Rzhetsky A, Nei M. Unbiased estimates of the number of nucleotide substitutions when substitution rate varies among different sites. J Mol Evol (1994) 38:295–299.
Susko E, Inagaki Y, Field C, Holder ME, Roger AJ. Testing for differences in rates-across-sites distributions in phylogenetic subtrees. Mol Biol Evol (2002) 19:1514–1523.
Swofford DL. PAUP*. Phylogenetic analysis using parsimony (*and other methods), version 4 (2003) Sunderland (MA): Sinauer.
Thollesson M. LDDist: a Perl module for calculating LogDet pair-wise distances for protein and nucleotide sequences. Bioinformatics (2004) 20:416–418.
Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res (1994) 22:4673–4680.
Tuffley C, Steel M. Modeling the covarion hypothesis of nucleotide substitution. Math Biosci (1998) 147:63–91.
Uzzel T, Corbin KW. Fitting discrete probability distributions to evolutionary events. Science (1971) 172:1089–1096.
von Haeseler A, Schoniger M. Evolution of DNA or amino acid sequences with dependent sites. J Comput Biol (1998) 5:149–164.
Waddell PJ, Penny D, Moore T. Hadamard conjugations and modelling sequence evolution with unequal rates across sites. Mol Phylogenet Evol (1997) 8:33–50.
Wang H-C, Spencer M, Susko E, Roger AJ. Testing for covarion-like evolution in protein sequences. Mol Biol Evol (2007) 24:294–305.
Yang Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol (1994) 39:306–314.
Zhou Y, Rodrigue N, Lartillot N, Philippe H. Evaluation of the models handling heterotachy in phylogenetic inference. BMC Evol Biol (2007) 7:206.
[Figure caption fragment: 12% of invariable sites switching to the Tuffley and Steel site class at points x and y shown in …]