MBE Advance Access originally published online on June 8, 2009
Molecular Biology and Evolution 2009 26(9):2047-2059; doi:10.1093/molbev/msp113
Research Articles
Genomic Features That Predict Allelic Imbalance in Humans Suggest Patterns of Constraint on Gene Expression Variation



* Department of Biology, Duke University, Durham, NC
Institute for Genome Sciences & Policy, Duke University, Durham, NC
Department of Statistical Science, Duke University, Durham, NC
Department of Computer Science, Duke University, Durham, NC
|| Department of Biostatistics and Bioinformatics, Duke University, Durham, NC
¶ Department of Evolutionary Anthropology, Duke University, Durham, NC
E-mail: jt5{at}duke.edu.
Accepted for publication May 26, 2009.
Variation in gene expression is an important contributor to phenotypic diversity within and between species. Although this variation often has a genetic component, identification of the genetic variants driving this relationship remains challenging. In particular, measurements of gene expression usually do not reveal whether the genetic basis for any observed variation lies in cis or in trans to the gene, a distinction that has direct relevance to the physical location of the underlying genetic variant, and which may also impact its evolutionary trajectory. Allelic imbalance measurements identify cis-acting genetic effects by assaying the relative contribution of the two alleles of a cis-regulatory region to gene expression within individuals. Identification of patterns that predict commonly imbalanced genes could therefore serve as a useful tool and also shed light on the evolution of cis-regulatory variation itself. Here, we show that sequence motifs, polymorphism levels, and divergence levels around a gene can be used to predict commonly imbalanced genes in a human data set. Reduction of this feature set to four factors revealed that only one factor significantly differentiated between commonly imbalanced and nonimbalanced genes. We demonstrate that these results are consistent between the original data set and a second published data set in humans obtained using different technical and statistical methods. Finally, we show that variation in the single allelic imbalance-associated factor is partially explained by the density of genes in the region of a target gene (allelic imbalance is less probable for genes in gene-dense regions), and, to a lesser extent, the evenness of expression of the gene across tissues and the magnitude of negative selection on putative regulatory regions of the gene. 
These results suggest that the genomic distribution of functional cis-regulatory variants in the human genome is nonrandom, perhaps due to local differences in evolutionary constraint.
Key Words: allelic imbalance; cis-regulatory variation; genetic variation; support vector machine
1 These authors contributed equally to this work.