MBE Advance Access published online on June 8, 2009
Molecular Biology and Evolution, doi:10.1093/molbev/msp113
Research Article
Genomic features that predict allelic imbalance in humans suggest patterns of constraint on gene expression variation


1 Department of Biology, Duke University, Durham, NC 27708, USA
2 Institute for Genome Sciences & Policy, Durham, NC 27708, USA
3 Department of Statistical Science, Durham, NC 27708, USA
4 Department of Computer Science, Durham, NC 27708, USA
5 Department of Biostatistics and Bioinformatics, Durham, NC 27708, USA
6 Department of Evolutionary Anthropology, Durham, NC 27708, USA
* Author and address for correspondence: Jenny Tung, Box 90338, Durham, NC 27708, Fax: (919) 660-7293; Phone: (919) 668-6249, jt5{at}duke.edu
Received for publication January 26, 2009. Revision received May 20, 2009. Accepted for publication May 26, 2009.
Variation in gene expression is an important contributor to phenotypic diversity within and between species. Although this variation often has a genetic component, identification of the genetic variants driving this relationship remains challenging. In particular, measurements of gene expression usually do not reveal whether the genetic basis for any observed variation lies in cis or in trans to the gene, a distinction that has direct relevance to the physical location of the underlying genetic variant, and which may also impact its evolutionary trajectory. Allelic imbalance measurements identify cis-acting genetic effects by assaying the relative contribution of the two alleles of a cis-regulatory region to gene expression within individuals. Identification of patterns that predict commonly imbalanced genes could therefore serve as a useful tool and also shed light on the evolution of cis-regulatory variation itself. Here, we show that sequence motifs, polymorphism levels, and divergence levels around a gene can be used to predict commonly imbalanced genes in a human dataset. Reduction of this feature set to four factors revealed that only one factor significantly differentiated between commonly imbalanced and non-imbalanced genes. We demonstrate that these results are consistent between the original dataset and a second published dataset in humans obtained using different technical and statistical methods. Finally, we show that variation in the single allelic imbalance-associated factor is partially explained by the density of genes in the region of a target gene (allelic imbalance is less probable for genes in gene-dense regions), and, to a lesser extent, the evenness of expression of the gene across tissues and the magnitude of negative selection on putative regulatory regions of the gene. 
These results suggest that the genomic distribution of functional cis-regulatory variants in the human genome is nonrandom, perhaps due to local differences in evolutionary constraint.
Key Words: allelic imbalance; cis-regulatory variation; genetic variation; support vector machine
These authors contributed equally to this work.