In Arabidopsis thaliana, gene expression level polymorphisms (ELPs) between natural accessions that exhibit simple, single locus inheritance are promising quantitative trait locus (QTL) candidates to explain phenotypic variability. It is assumed that such ELPs overwhelmingly represent regulatory element polymorphisms. However, comprehensive genome‐wide analyses linking expression level, regulatory sequence and gene structure variation are missing, preventing definite verification of this assumption. Here, we analyzed ELPs observed between the Eil‐0 and Lc‐0 accessions. Compared with non‐variable controls, 5′ regulatory sequence variation in the corresponding genes is indeed increased. However, ∼42% of all the ELP genes also carry major transcription unit deletions in one parent as revealed by genome tiling arrays, representing a >4‐fold enrichment over controls. Within the subset of ELPs with simple inheritance, this proportion is even higher and deletions are generally more severe. Similar results were obtained from analyses of the Bay‐0 and Sha accessions, using alternative technical approaches. Collectively, our results suggest that drastic structural changes are a major cause for ELPs with simple inheritance, corroborating experimentally observed indel preponderance in cloned Arabidopsis QTL.
Microarray technologies have had a major impact on quantitative genetic analyses, enabling, for instance, large‐scale discovery of genetically controlled gene expression level differences. This has led to the identification of numerous expression quantitative trait loci (eQTL) in various organisms (Brem et al, 2002; Morley et al, 2004; Doss et al, 2005; Li et al, 2006; West et al, 2007; Stranger et al, 2007b; Potokina et al, 2008). In this study, we focused on expression level polymorphisms (ELPs) that are observed between parents and display simple, single locus inheritance. Such loci constitute a highly heritable subset of cis‐acting eQTL (Doss et al, 2005; Petretto et al, 2006; West et al, 2006, 2007; Keurentjes et al, 2007; Stranger et al, 2007b; Potokina et al, 2008) and are strong quantitative trait locus (QTL) candidates to explain phenotypic variation between parental lines. Despite the abundance of ELPs with simple inheritance, little is known about their molecular basis. Generally, however, they are assumed to reflect cis‐acting sequence polymorphisms in regulatory elements of the corresponding genes (Jansen and Nap, 2001; Cowles et al, 2002; Schadt et al, 2003; Pastinen and Hudson, 2004; Ronald et al, 2005; Williams et al, 2007), although few studies have addressed this issue systematically (Cowles et al, 2002; GuhaThakurta et al, 2006).
Counter to the idea that regulatory polymorphisms are major determinants of phenotypic variability, QTL cloning in Arabidopsis thaliana has mostly identified knockout mutations as the underlying molecular cause (e.g. Aukerman et al, 1997; Grant et al, 1995; Johanson et al, 2000; Kliebenstein et al, 2001; Kroymann et al, 2003; Kroymann et al, 2001; Mouchel et al, 2004; Werner et al, 2005). Even if many of these loci represent ELPs, generally, a preponderance of indels is observed among these drastic mutations (Koornneef et al, 2004). However, because structural changes are easier to discover, successful reports of QTL isolation might reflect a technical bias towards knockout alleles. Indeed, studies of recombinant inbred line (RIL) populations created from Arabidopsis accessions have identified numerous eQTL (Keurentjes et al, 2007; West et al, 2007), including a sizable fraction of loci representing parental ELPs with simple inheritance. In this study, we analyzed the molecular basis of such ELPs in greater detail, by comprehensive comparison of gene expression, sequence variation and gene structure.
We identified parental ELPs in 480 genes through microarray analyses of the Arabidopsis accessions Eil‐0 and Lc‐0, a number comparable with studies of other accessions (West et al, 2006; Keurentjes et al, 2007). Among them, ELPs with simple inheritance were determined by comparison between single nucleotide polymorphism (SNP)‐genotyped RILs derived from the two accessions and their two parents in microarray analyses. Approximately 20% of parental ELPs displayed simple cis‐inheritance; taking into account technical limitations (such as arbitrary thresholds for differential expression), this group is very likely to be even bigger (∼44%).
To determine whether these ELPs with simple inheritance are associated with increased regulatory sequence variation in the corresponding genes, we compared their promoter sequences with control genes that displayed very low variability across all microarray experiments. Indeed, sequence diversity between Eil‐0 and Lc‐0 was considerably higher in the ELP sample, supporting the regulatory hypothesis. However, parallel analyses of genomic Eil‐0 and Lc‐0 DNA using Arabidopsis tiling arrays, empirically calibrated for indel detection using the sequence data (Figure 3), revealed numerous indels of various sizes in both Eil‐0 and Lc‐0. Such indels were particularly abundant in genes representing ELPs with simple inheritance and, unlike in controls, were generally larger and frequently affected exons. Thus, uni‐parental indels that likely impair or even abolish gene function appear to be much more frequent in genes representing ELPs with simple inheritance than in controls. In the vast majority of cases, the allele that carried deletions was expressed at lower level, consistent with the idea that the majority of deletions negatively affect gene function and lead to loss of selection on gene maintenance and consequently expression. Supporting this notion, parental ELPs that carried indels in their coding region also displayed increased regulatory sequence variation. Notably, the observation that alleles carrying deletions were expressed at lower levels could simply reflect decreased hybridization signal if the deletion overlapped with the probe. However, except for a minority of loci with large deletions, this was generally not evident from the expression array analyses.
To independently corroborate our findings, we analyzed reported ELPs with simple inheritance that were found between the Arabidopsis Bay‐0 and Sha accessions (West et al, 2006, 2007), using a different microarray platform and a different conceptual approach to extract heritable ELPs. Again, in genome tiling arrays, we observed a strong preponderance of indels in the ELPs with simple inheritance (>6‐fold enrichment) as compared with controls. Similar to our results for Eil‐0 and Lc‐0, their majority occurred at the level of exons (∼33% of loci) or genes (18%).
These findings were corroborated by a recently developed algorithm (Zeller et al, 2008) that identified reduced or absent hybridization signal over extended tracts in an oligonucleotide array‐based re‐sequencing effort of the Bay‐0 and Sha genomes. Matching our tiling array analysis, in nearly all cases, stretches of reduced hybridization matched the presence of deletions as detected by the tiling array approach (Figure 6).
In summary, our data suggest that ELPs with simple inheritance in Arabidopsis primarily reflect the consequences of structural differences in the corresponding genes, rather than variation in regulatory elements, even if such a variation is observed. Thus, although functional variation in cis‐regulatory elements contributes clearly to phenotypic variation (Bentsink et al, 2006; Rus et al, 2006; Sibout et al, 2008), large‐effect changes that impact the integrity of transcribed regions should be considered as an equally valid explanation for expression variation. Indeed, the prevalence of indels in ELPs with simple inheritance mirrors the preponderance of indels with drastic effect on gene integrity underlying cloned QTL, suggesting that the latter do not reflect a technical bias in the ease of detection. Thus, Arabidopsis QTL representing more subtle regulatory polymorphisms might be less common than anticipated.
Heritable gene expression level polymorphisms (ELPs) between natural strains are strong candidates for quantitative trait loci (QTL) that could explain intra‐specific phenotypic variation.
Here we test the assumption that ELPs with simple, single locus inheritance primarily represent sequence variation in the corresponding regulatory sequences in Arabidopsis thaliana, through comprehensive genome‐wide analyses linking variation in expression level, regulatory sequence and gene structure.
We find that a large fraction of genes representing ELPs with simple inheritance carry uni‐parental indels that likely impair gene function. Thus, these ELPs primarily appear to reflect the consequences of structural differences in the corresponding genes, rather than variation in regulatory elements, even if such variation is observed.
Our results are in line with the experimentally observed preponderance of indels with drastic effects on gene integrity in cloned Arabidopsis QTL, suggesting that they do not reflect a technical bias and that Arabidopsis QTL representing more subtle regulatory polymorphisms might be less common than anticipated.
Recent advances in high‐throughput technologies have had a major impact on quantitative genetic analyses, enabling the interrogation of whole genomes for characteristics such as gene expression levels, single nucleotide polymorphisms (SNPs) or structural genome variation (Keurentjes et al, 2008). Among these approaches, microarray‐based discovery of genetically controlled gene expression level differences has identified numerous expression quantitative trait loci (eQTL) in humans and model organisms (Brem et al, 2002; Morley et al, 2004; Doss et al, 2005; Li et al, 2006; West et al, 2007; Stranger et al, 2007b; Potokina et al, 2008). eQTL can be divided principally into two classes (Gibson and Weir, 2005; Rockman and Kruglyak, 2006; Hansen et al, 2008). Trans‐acting eQTL (trans‐eQTL) control the expression of other loci, whereas cis‐acting eQTL (cis‐eQTL) coincide with the loci whose expression varies. The latter represent ∼20–50% of eQTL in various systems (Morley et al, 2004; Li et al, 2006; Stranger et al, 2007b; Potokina et al, 2008) and, additively, often explain significant portions of observed phenotypic variability (Li et al, 2006; Petretto et al, 2006; Keurentjes et al, 2007; Wentzell et al, 2007; Stranger et al, 2007b).
In this study, we focused on expression level polymorphisms (ELPs) that are already observed between parental lines and display simple, single locus inheritance. Such loci constitute a highly heritable subset of cis‐eQTL and, because of their simple inheritance, can be exploited as markers (Doss et al, 2005; Petretto et al, 2006; West et al, 2006, 2007; Keurentjes et al, 2007; Stranger et al, 2007b; Potokina et al, 2008). They can, for instance, replace SNPs in genotyping, a particularly interesting application in systems with poorly characterized genomes (West et al, 2006; Potokina et al, 2008). Despite the abundance of ELPs with simple inheritance, little is known about their molecular basis. In principle, they could represent trans‐eQTL that are tightly linked to the locus they control, and this scenario might account for a significant fraction of heritable ELPs in large, complex genomes that are difficult to analyze at high resolution. Generally, however, it appears more likely that ELPs with simple inheritance represent large effect cis‐acting polymorphisms in individual genes (Ronald et al, 2005; Stranger et al, 2007b; Hansen et al, 2008). These could include polymorphisms that affect gene expression at the transcriptional, post‐transcriptional or post‐translational level. For instance, mutations might alter transcript stability, or activity of the encoded protein, which could in turn affect RNA levels in cases of auto‐regulatory feedback. Generally, however, cis‐eQTL and thus ELPs with simple inheritance are assumed to reflect sequence variation in regulatory elements of the corresponding genes (Jansen and Nap, 2001; Cowles et al, 2002; Schadt et al, 2003; Pastinen and Hudson, 2004; Ronald et al, 2005; Williams et al, 2007), although only few studies have addressed this issue systematically (Cowles et al, 2002; GuhaThakurta et al, 2006).
Somewhat counter to the idea that regulatory polymorphisms are major determinants of phenotypic variability, in Arabidopsis thaliana, quantitative trait locus (QTL) cloning over the last years has often identified knockout mutations that affect the transcript and/or protein as the underlying molecular cause (e.g. Aukerman et al, 1997; Grant et al, 1995; Johanson et al, 2000; Kliebenstein et al, 2001; Kroymann et al, 2003; Kroymann et al, 2001; Mouchel et al, 2004; Werner et al, 2005). Even if many of these loci represent ELPs, generally, a preponderance of indels, whether in regulatory or transcript regions, is observed among these drastic mutations (Koornneef et al, 2004). However, because of the considerable sequence polymorphisms distinguishing naturally occurring isogenic Arabidopsis strains (so‐called accessions), identification of the precise change underlying a QTL is often difficult, and structural changes are the easiest to discover. Thus, the successful reports of QTL isolation might reflect a bias in the ease with which such changes are detected. Indeed, recent studies that exploited recombinant inbred line (RIL) populations created from Arabidopsis accessions have identified numerous eQTL by microarray analyses (Keurentjes et al, 2007; West et al, 2007), including a varying portion of cis‐eQTL. Among the cis‐eQTL, a sizable fraction of loci represented parental ELPs with simple inheritance, which are strong QTL candidates to explain morphological or physiological variation between the parental lines. In this study, we analyzed the molecular basis of such ELPs in greater detail, by comprehensive comparison of gene expression, sequence variation and gene structure. Corroborating the experimental evidence from published reports of QTL cloning, we found again a preponderance of sizable indels, suggesting that QTL representing more subtle regulatory polymorphisms might be less common than anticipated.
Results and discussion
Expression level polymorphisms between the Eil‐0 and Lc‐0 accessions
To identify parental ELPs, we determined transcript level variation between Eil‐0 and Lc‐0 seedlings by microarray analyses. Arrays based on short oligonucleotide probes are particularly sensitive to SNPs in parental transcripts, resulting in spurious eQTL and overestimation of cis‐eQTL (Doss et al, 2005; Alberts et al, 2007), although this appears to depend on various factors, such as array design (Luo et al, 2007). In the absence of detailed genomic sequence information on the Eil‐0 and Lc‐0 accessions, in this study, we used arrays based on gene‐specific probes of 150–500 bp lengths (Allemeersch et al, 2005). Genomic DNA hybridizations have previously shown that such two‐color arrays are largely insensitive to potential hybridization efficiency biases introduced by minor sequence polymorphisms (Keurentjes et al, 2007). Moreover, they also offer the advantage of direct sample comparison, allowing immediate ELP assessment rather than ELP inference from statistical comparison of single sample hybridizations of oligonucleotide‐based arrays (West et al, 2006). Nevertheless, in this study, we chose to follow established recommendations for the analysis of two‐color arrays, which includes a statistical component (Shi et al, 2006). Based on duplicate dye swap comparison of three independent RNA samples, 499 ELPs (P<0.005 with Benjamini–Hochberg false discovery rate multiple testing correction and fold change ⩾2) representing 480 genes distributed across the genome were observed (Supplementary Table 1). Comparable numbers of parental ELPs have been found for other pairs of Arabidopsis accessions (West et al, 2006; Keurentjes et al, 2007).
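The two calling criteria stated above (Benjamini–Hochberg corrected P<0.005 and fold change ⩾2) can be illustrated with a minimal Python sketch. It is not the analysis code used in this study, which relied on R/limma; the function names `benjamini_hochberg` and `call_elps` and the use of log2 ratios as input are assumptions made for illustration only.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # raw adjustment: p * n / rank, for ranks 1..n
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, starting from the largest P-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def call_elps(pvalues, log2_ratios, fdr=0.005, min_fold=2.0):
    """Flag features passing both the FDR and the fold-change criterion.

    log2_ratios are assumed to be log2(Eil-0 / Lc-0) expression ratios
    (hypothetical input format).
    """
    adjusted = benjamini_hochberg(pvalues)
    big_change = np.abs(np.asarray(log2_ratios, dtype=float)) >= np.log2(min_fold)
    return (adjusted < fdr) & big_change
```

In this sketch, a feature with a very small P-value but a change below 2‐fold is not called, which mirrors the conservative nature of the combined criterion discussed in the text.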
Determination of expression level polymorphisms with simple inheritance by microarray analysis of recombinant inbred lines
To determine which of these ELPs show simple inheritance over several generations, we took advantage of a RIL population that had been derived by single‐seed descent over seven generations, starting with F2 individuals from an Eil‐0 (♀) × Lc‐0 (♂) cross (Sibout et al, 2008). Notably, it was evident from earlier studies that detection of parental ELPs with simple inheritance does not require full‐scale eQTL analysis of RIL populations, as they represent a subgroup of cis‐eQTL that display firm allele‐dependent inheritance of differential expression through all generations starting from the parents. Thus, dye swap comparisons between a few RILs and their two parents in microarray analyses were sufficient for their detection (Figure 1A). The RILs were chosen to represent the genetic diversity of the population based on genotyping data from 79 segregating genome‐wide SNP markers (Warthmann et al, 2007; Sibout et al, 2008) (Supplementary Table 2), such that each locus would be derived typically from the same parent in at least three RILs. Thus, seven RILs were chosen for detailed analyses. RNA from these lines was hybridized against RNA from either parent in a dye swap layout. To assess the heritability of parental ELPs, we compared expected and observed differential expression, taking into account the genotyping data (Figure 1A). On average, ∼60% of predicted ELPs were recovered in a given RIL versus parent hybridization (Figure 1B, Supplementary Table 3), similar to proportions found in other studies (Keurentjes et al, 2007). The absence of differential expression was an even better predictor, matching ∼91% of observations. This discrepancy is likely due to the fact that the 2‐fold change in expression represents a rather stringent but also arbitrary selection criterion. Overall, predictions of presence and absence of differential expression matched better, if data were treated according to 5% false discovery rate. 
However, as an extensive study of two‐color microarray hybridizations recommended combining a 2‐fold change threshold with a false discovery rate criterion for scoring differential signals (Shi et al, 2006), we used the analysis of our data according to those criteria as the baseline in the following. Similar to earlier studies (West et al, 2006; Keurentjes et al, 2007), the parental ELPs could be used for RIL genotyping, delivering higher resolution than the SNP data (Supplementary Figure 1).
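The comparison of expected and observed differential expression can be sketched as follows, assuming that a parental ELP is expected to appear in a RIL versus parent hybridization only when the RIL carries the allele of the other parent at that locus. This is a minimal Python illustration of that reasoning; the function names and the tuple layout of the input are hypothetical.

```python
def expected_differential(ril_allele, parent):
    """A parental ELP is expected only if the RIL allele differs from the co-hybridized parent."""
    return ril_allele != parent

def match_rates(observations):
    """Compute how often presence and absence predictions were met.

    observations: iterable of (ril_allele, parent, observed_differential).
    Returns (presence_rate, absence_rate); a rate is None if no such prediction exists.
    """
    counts = {True: [0, 0], False: [0, 0]}  # expected -> [matched, total]
    for ril_allele, parent, observed in observations:
        expected = expected_differential(ril_allele, parent)
        counts[expected][1] += 1
        if bool(observed) == expected:
            counts[expected][0] += 1

    def rate(matched, total):
        return matched / total if total else None

    return rate(*counts[True]), rate(*counts[False])
```

The two returned values correspond to the two proportions reported above, namely the recovery of predicted ELPs and the agreement for predicted absence of differential expression.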
Comparison of the patterns of individual genes corresponding to parental ELPs, across all RIL hybridizations, enabled us to classify them according to the frequency at which predictions were met. This analysis identified a group representing ∼20% of parental ELPs that perfectly matched predictions (Figure 1C, Supplementary Table 4) and, thus, can be considered to have simple cis‐inheritance. Notably, as many of the other loci frequently missed our cutoff criteria for differential expression only narrowly, particularly the 2‐fold criterion (see above), this is a conservative estimate. Overall, ELPs whose hybridization pattern matched 80% of predictions or more represented ∼44% of all parental ELPs.
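The classification by frequency of matched predictions can be sketched in Python as below. The class boundaries (100%, at least 90%, at least 80%) follow the thresholds mentioned in the text, but the exact binning used for Figure 1C is not specified here, so the labels and function names are assumptions for illustration.

```python
from collections import Counter

def classify_elp(matched, total):
    """Assign a parental ELP to a class by the share of hybridizations matching prediction."""
    if total <= 0:
        raise ValueError("total must be positive")
    if matched == total:
        return "100%"
    # integer arithmetic avoids floating-point edge cases at the class boundaries
    if matched * 100 >= 90 * total:
        return ">=90%"
    if matched * 100 >= 80 * total:
        return ">=80%"
    return "<80%"

def class_proportions(per_gene):
    """per_gene: dict mapping gene ID -> (matched, total). Returns class -> fraction of genes."""
    counts = Counter(classify_elp(m, t) for m, t in per_gene.values())
    n = len(per_gene)
    return {label: count / n for label, count in counts.items()}
```

Under this scheme, the conservative estimate corresponds to the "100%" class alone, whereas the broader estimate sums all classes at or above 80%.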
Sequence analysis of regulatory regions in genes representing ELPs with simple inheritance
To determine whether ELPs with simple inheritance are associated with increased sequence variation in regulatory regions, as observed in other systems (Cowles et al, 2002; GuhaThakurta et al, 2006), we compared a sample of 61 genes chosen from the ELP group that matched at least 90% of predictions with a control group of 85 genes that displayed very low variability and differential expression across all microarray experiments (see Methods). Notably, in Arabidopsis, regulatory elements controlling gene expression are generally found in the 5′ vicinity of the transcription start sites and the 5′ leader sequences (Lee et al, 2006). Thus, we isolated 1 kb fragments immediately upstream of the start codon for each of the sample and control group genes from both Eil‐0 and Lc‐0. Sequence information was obtained for ∼44 kb of stably heritable ELP loci and ∼62 kb of control loci (Supplementary Table 5, Supplementary sequence alignments). Sequence diversity between Eil‐0 and Lc‐0 was considerably higher in the ELPs with simple inheritance as compared with the control group (Figure 2A, Supplementary Table 5). Overall, SNP frequency was increased >4.5‐fold, indel number >4.7‐fold and the number of bp affected by indels >9.0‐fold (Figure 2B and C). Generally, SNPs were biased towards the promoter as compared with the leader sequences. These results support the idea that ELPs with simple inheritance are associated with increased sequence diversity in the regulatory regions of the corresponding genes.
Genome tiling array analyses of the Eil‐0 and Lc‐0 genomes
Analyses of Arabidopsis genome variation have discovered unexpectedly high levels of accession‐specific indels, which often impair gene function (Clark et al, 2007; Zeller et al, 2008). Such indels can, for instance, be identified by probing whole genome tiling arrays with genomic DNA (Hinds et al, 2006; Clark et al, 2007; Yazaki et al, 2007). As we failed to amplify the 5′ regions of at least one parent for ∼34% of all loci initially targeted for sequencing in the ELP group and ∼12% in the control group, we sought to determine whether this could be explained by indels. To this end, duplicate samples of genomic Col‐0, Eil‐0 and Lc‐0 DNA were hybridized to Affymetrix Arabidopsis tiling 1.0R arrays, which represent the Col‐0 genome as a tile of 25mer oligonucleotides with 10 bp spacing. Thresholds for detection of deletions (⩾2.8‐fold drop in hybridization signal over ⩾35 bp, maximum allowed gap 150 bp) in Eil‐0 and Lc‐0 were determined empirically. This was done using deletions identified in the sequencing data (Figure 3). These threshold criteria consistently allowed detection of indels greater than ∼30 bp, whereas at the same time ruling out the possibility that deletion calls could represent spurious differential signals because of SNPs or smaller indels (Figure 3) as detected in other studies (Li et al, 2006; West et al, 2006; Alberts et al, 2007; Borevitz et al, 2007; Clark et al, 2007). The genes representing parental ELPs as well as the control genes were inspected individually and only indels that were detected consistently in both replicate hybridizations were considered real. Even using these stringent criteria, numerous indels of various sizes were identified in both Eil‐0 and Lc‐0 (e.g. Figure 4; bar files for viewing tiling paths are provided in the Supplementary information). 
However, whereas ∼42% of all parental ELP genes displayed indels when their structure was compared between Eil‐0 and Lc‐0, only 9% of control genes did (Figure 5A; Supplementary Table 6), representing a >4‐fold enrichment. Moreover, in the control group, deletions were usually small and mostly affected intron or leader sequences. As it appeared possible that the low expression variability of the control group genes could reflect the effect of purifying selection, we also analyzed a non‐redundant random set of genes, which yielded quantitatively similar results (Supplementary Table 6). By contrast, in the ELP group, multiple indels per gene were generally detected, and these were often larger and frequently affected exons. Moreover, in the ELP group, gene deletions (defined as uninterrupted deletion detection signal spanning >50% of the transcript region) were observed for nearly 10% of loci. Gene deletions were never observed in either control group.
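The deletion detection thresholds (⩾2.8‐fold drop in hybridization signal over ⩾35 bp, maximum allowed gap 150 bp) and the gene deletion definition (>50% of the transcript region) can be illustrated with a short Python sketch. How the gap rule was applied in practice is not detailed in the text; the sketch assumes that flagged probes are merged when the distance between them does not exceed the maximum gap, and all function names and the input format are hypothetical.

```python
PROBE_LEN = 25  # tiling array probes are 25mers

def call_deletions(positions, fold_drop, min_drop=2.8, min_len=35, max_gap=150):
    """Merge probes with reduced signal into candidate deletion segments.

    positions: sorted probe start coordinates (bp).
    fold_drop: per-probe ratio of reference signal to accession signal.
    Returns half-open (start, end) segments spanning at least min_len bp.
    """
    segments = []
    start = end = None
    for pos, drop in zip(positions, fold_drop):
        if drop < min_drop:
            continue  # probe does not show a sufficient signal drop
        probe_end = pos + PROBE_LEN
        if start is not None and pos - end <= max_gap:
            end = probe_end  # extend the current segment across the gap
        else:
            if start is not None:
                segments.append((start, end))
            start, end = pos, probe_end
    if start is not None:
        segments.append((start, end))
    return [(s, e) for s, e in segments if e - s >= min_len]

def is_gene_deletion(segments, tx_start, tx_end, min_fraction=0.5):
    """True if a single uninterrupted segment covers more than min_fraction of the transcript."""
    tx_len = tx_end - tx_start
    for s, e in segments:
        overlap = min(e, tx_end) - max(s, tx_start)
        if overlap > min_fraction * tx_len:
            return True
    return False
```

In this sketch, a single flagged probe (25 bp) never yields a call on its own, so isolated signal drops of the kind an SNP might cause are filtered out by the minimum length requirement.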
The majority of genes representing ELPs with simple inheritance display uni‐parental structural changes
Analysis of deletions according to ELP class with respect to matched predictions revealed a clear trend towards more severe indel types in ELP loci with simple inheritance. For instance, in the class of ELPs that perfectly matched predictions, 20% of loci displayed uni‐parental gene deletions, whereas 25% of loci carried deletions in exons (Figure 5B). Even within the group of ELPs that matched at least 80% of predictions, the majority of loci displayed major uni‐parental deletions. By contrast, the proportion of loci for which no structural difference was observed between Eil‐0 and Lc‐0 continuously increased in the parental ELP classes that matched predictions less and less faithfully, accompanied by a decrease in the severity of the deletions observed. Thus, indels that are likely to impair or even abolish gene function appear to be much more frequent in genes representing ELPs with simple inheritance than in genes representing less heritable parental ELPs or invariable (or random) controls. These data suggest that the majority of ELPs with simple inheritance reflect a uni‐parental impairment or even loss of gene function.
Importantly, in the vast majority of cases, over 90%, deletions were in phase with the direction of expression difference between the parents, such that the allele that carried deletions was expressed at a lower level. This observation would be consistent with the idea that the majority of deletions negatively affect gene function, thus leading to a loss of selection on gene maintenance and consequently gene expression. Supporting this notion, those parental ELPs that carried indels in their coding region also displayed a higher level of sequence variation in their 5′ regulatory regions (Figure 5C). However, the observation that alleles carrying deletions were expressed at lower levels could also simply reflect a difference in hybridization signal because of deletions in one allele. Although this appears likely for loci that displayed uni‐parental gene deletion, this explanation might not be generally applicable to loci that carried partial deletions. Such loci might still yield detectable although potentially aberrant transcripts, even if those would not encode functional protein. In fact, deletions were not evident from our expression arrays, as documented by the signal strength distribution of parental ELPs, which resembles the one for all genes (Figure 5D–E). Moreover, as background noise is difficult to define in the two‐color array hybridizations employed in our study, absence of hybridization signal is hard to establish, in particular for genes that are expressed at low levels (Czechowski et al, 2004). Finally, an earlier study used two‐color arrays as well, and the authors entertained the notion that ELPs might reflect deletions (Keurentjes et al, 2007). To test this, they hybridized their arrays with competing genomic DNA from the parental accessions, Ler and Cvi‐0, to identify a total of 159 indels. Of those, 14 coincided with cis‐eQTL that mostly reflected ELPs with simple inheritance observed between the parents. 
However, as their study identified 922 parental ELPs, this would mean that there are either significantly fewer structural differences between the Ler and Cvi‐0 genomes than between the Eil‐0 and Lc‐0 genomes, or that indels were underestimated as compared with our study.
Independent analysis of ELPs with simple inheritance between Bay‐0 and Sha
To corroborate independently the validity of our approach, we made use of other studies, in which expression differences between the Arabidopsis Bay‐0 and Sha accessions were reported (West et al, 2006, 2007). Importantly, these data were extracted from full‐scale single‐feature polymorphism and eQTL analysis of a population of more than 200 RILs, which, compared with our Eil‐0 × Lc‐0 analysis, was characterized using a different, short oligonucleotide microarray platform (Affymetrix) and a different conceptual approach to extract heritable gene expression differences. Thus, 187 genes representing parental ELPs with simple inheritance between Bay‐0 and Sha were identified. We performed two independent hybridizations of genomic DNA of both Bay‐0 and Sha to genome tiling arrays and analyzed the data as outlined above for Eil‐0 and Lc‐0. Again, we observed a strong preponderance of indels in the 187 ELPs with simple inheritance (>6‐fold enrichment) as compared with the same control group used above (Figure 6A, Supplementary Table 7). The two gene sets did not overlap; importantly, the control genes had been selected from the Eil‐0 × Lc‐0 analysis according to the indicated threshold criteria, but also on the basis that they were monitored in all hybridizations and that they were not among the gene expression markers in the Bay‐0 × Sha eQTL analysis. Similar to our results for Eil‐0 and Lc‐0, the majority of the deletions in the ELPs with simple inheritance were observed at the level of exons (∼33% of loci) or genes (18%) (Figure 6B).
The Bay‐0 and Sha accessions were also part of a recent genome re‐sequencing effort using high‐density oligonucleotide arrays that interrogate SNPs at every single base of the Arabidopsis genome (Clark et al, 2007). These data offered us the opportunity to independently verify our results. To this end, we analyzed the ELPs with simple inheritance and control group genes by a recently developed algorithm (Zeller et al, 2008) to identify polymorphic region predictions (PRPs), i.e. reduced hybridization signal over extended tracts of sequence. Such PRPs could result from an accumulation of SNPs or indels. Matching our tiling array analysis, PRPs were dramatically more frequent and generally more extended in the ELPs with simple inheritance as compared with the controls (Supplementary Table 7). This is illustrated by comparison of the combined PRP lengths in the Bay‐0 versus the Sha alleles, which also revealed a marked asymmetry in PRP size in the genes representing ELPs with simple inheritance (Figure 6C), but not in the control genes (Figure 6D). In nearly all cases, increased PRP size matched the presence of deletions as detected by the tiling array approach.
In summary, our data suggest that ELPs with simple inheritance in Arabidopsis primarily reflect the consequences of structural differences in the corresponding genes, rather than variation in regulatory elements, even if such a variation is observed. Notably, association of increased SNP variability and proximal deletions has also been observed in the human genome (Hinds et al, 2006). The large majority of deletions detected in ELPs with simple inheritance affected open reading frames or even complete genes, suggesting that they could frequently lead to loss of gene function. Moreover, we repeatedly observed major deletions of flanking regulatory regions. Even if those deletions leave transcription units intact, they might lead to reduced or abolished gene expression, resulting in a de facto loss of gene function.
It remains to be determined whether Arabidopsis suffers from a particularly heavy mutational load because of inbreeding, as suggested before (Bustamante et al, 2002), or whether our findings apply more broadly. The similarity in ELP behavior across systems and the finding that copy number variation can explain significant portions of quantitative traits (Cutler et al, 2007; Stranger et al, 2007a) suggest that the latter could be the case. Although functional variation in cis‐regulatory elements contributes clearly to phenotypic variation (Bentsink et al, 2006; Rus et al, 2006; Sibout et al, 2008), large‐effect changes that impact the integrity of transcribed regions should be considered as an equally valid explanation for expression variation. Indeed, such mutations have been shown to underlie phenotypic variation in natural strains of Arabidopsis (Grant et al, 1995; Aukerman et al, 1997; Johanson et al, 2000; Kliebenstein et al, 2001; Kroymann et al, 2001, 2003; Koornneef et al, 2004; Werner et al, 2005). Finally, the prevalence of indels in ELPs with simple inheritance mirrors the preponderance of indels with a drastic effect on gene integrity underlying cloned QTL, suggesting that the latter do not reflect a technical bias in the ease of detection. Thus, Arabidopsis QTL representing more subtle regulatory polymorphisms might be less common than anticipated.
Materials and methods
Seeds of Arabidopsis accessions were obtained from the Arabidopsis Biological Resources Center (Ohio State University, USA). Sterilized seeds were stratified for 48 h at 4°C, and seedlings were germinated and grown in tissue culture on a basic solid medium with macro and micronutrients (0.5 × MS) and 0.9% agar (Duchefa, the Netherlands), supplemented with 2% sucrose at 21°C under continuous light of 130 μE intensity. The Eil‐0 × Lc‐0 RIL population was derived from a cross between those parents in which Eil‐0 served as the mother, after seven generations of single‐seed descent starting from the segregating F2 generation (Sibout et al, 2008). Plant material for RNA analysis was harvested at 9 days after germination, typically from pools of 20 seedlings per line.
Genomic DNA of the EL RIL F5 population was isolated with Plant DNeasy™ kits (QIAGEN, the Netherlands) according to the manufacturer's instructions. Genotyping with a set of 289 SNPs was carried out by Genaissance Pharmaceuticals, Inc. (New Haven, CT, USA). Of those SNPs, 79 were polymorphic between Eil‐0 and Lc‐0.
Total RNA was isolated using the Plant RNeasy™ kit (QIAGEN, the Netherlands) according to the manufacturer's instructions. Total RNA from the seedling pools was amplified using the MessageAmp™ aRNA II kit (Ambion, TX, USA). Five micrograms of amplified RNA were reverse transcribed into cyanine 3‐ or cyanine 5‐labeled cDNA, purified with Qiaquick™ columns (Qiagen, the Netherlands) and hybridized on microarrays produced by the Lausanne DNA Array Facility (GEO accession number GPL6147) containing 25 000 gene‐specific tags for the A. thaliana genome (Hilson et al, 2004). In order to analyze ELPs between the accessions Eil‐0 and Lc‐0, three independently grown seedling pools were analyzed by two‐color co‐hybridization of the labeled cDNAs in dye swap experiments, giving a total of six slides. These experiments can be found in the GEO database under entry GSE13628.
Statistical analyses of gene expression measures were carried out with open source R software packages available as part of the BioConductor project (http://www.bioconductor.org). Raw data from the microarrays were normalized by print tip lowess normalization (Yang et al, 2002), without applying background subtraction. To identify differentially expressed genes, we computed single gene moderated t‐statistics (Smyth, 2004) using the limma package (Smyth, 2005). Genes were ranked according to their mod‐t P‐value and a cutoff was set at a maximum false discovery rate (Benjamini–Hochberg multiple testing correction; Benjamini and Hochberg, 1995) of 0.005. From these genes, those with a minimum 2‐fold expression difference between Eil‐0 and Lc‐0 qualified as parental ELPs. For the analysis of RIL gene expression, each RIL sample was co‐hybridized with each parent (Eil‐0 and Lc‐0) in a dye swap, resulting in two slides per parent versus RIL comparison. Genes with large mod‐t and an expression difference of at least 2‐fold in the RIL‐parent comparison were considered differentially expressed. In order to select genes that show no ELP (control genes), we selected genes that had a maximum fold change of 1.3 in at least 13 out of 17 conditions tested (all RIL versus parent comparisons and Eil‐0 versus Lc‐0 comparisons). These genes were ranked according to their signal intensities and genes with an A‐value <8 were excluded. From the remaining medium to high‐intensity genes (134), 97 were selected for promoter and tiling array analysis.
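The selection rules above can be illustrated with a minimal sketch. It is written in Python rather than the R/limma code used for the analysis, and it does not reproduce the moderated t‐statistics: P‐values, log2 expression ratios and A‐values are assumed to be already available. The function names, the input layout and the reading of "maximum fold change of 1.3" as an absolute log2 ratio of at most log2(1.3) are illustrative assumptions, not part of the original methods.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values stay monotone (each is the minimum over all larger ranks).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_parental_elps(genes, fdr=0.005, min_fold=2.0):
    """genes: list of (gene_id, log2_ratio, p_value) tuples (hypothetical layout).

    Keeps genes with adjusted P-value <= fdr and at least min_fold
    difference in either direction.
    """
    q_values = benjamini_hochberg([g[2] for g in genes])
    min_log2 = math.log2(min_fold)
    return [
        gene_id
        for (gene_id, log2_ratio, _), q in zip(genes, q_values)
        if q <= fdr and abs(log2_ratio) >= min_log2
    ]


def select_controls(log2_ratios, a_values, max_fold=1.3,
                    min_conditions=13, min_a=8.0):
    """log2_ratios: gene_id -> list of log2 ratios, one per condition.
    a_values: gene_id -> average log2 intensity (A-value).

    Keeps genes that stay within max_fold in at least min_conditions
    conditions and whose A-value is not below min_a.
    """
    limit = math.log2(max_fold)
    controls = []
    for gene_id, ratios in log2_ratios.items():
        stable = sum(1 for r in ratios if abs(r) <= limit)
        if stable >= min_conditions and a_values[gene_id] >= min_a:
            controls.append(gene_id)
    return controls
```

With 17 conditions per gene, `select_controls` corresponds to the "13 out of 17" rule; the final choice of 97 out of 134 candidate control genes is not modeled here.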
For sequence analyses of regulatory elements, 1 kb fragments of 85 control genes and 65 stable ELP genes spanning the region 5′ to the start codon were isolated by PCR with KOD Hot Start Polymerase® (Novagen™) following the manufacturer's instructions. PCR‐amplified fragments were purified using QiaQuick columns (Qiagen, the Netherlands) and sequenced by Macrogen Inc. (Republic of South Korea). Obtained sequences were analyzed using MacVector™ 7.2.2 software. The sequences have been submitted to the GenBank database (accession numbers FJ441298‐FJ441589).
Genomic DNA was extracted from pools of three plants for each accession (Col‐0, Eil‐0, Lc‐0, Bay‐0, Sha) with Plant DNeasy™ kits (QIAGEN, the Netherlands) according to the manufacturer's instructions. Biotin‐labeled target DNA was generated from this genomic DNA as described (Borevitz, 2006). Labeled targets were hybridized on Affymetrix GeneChip® Arabidopsis Tiling 1.0R Arrays and processed according to the supplier's protocols. CEL files were processed by Affymetrix tiling analysis software to generate normalized signal bar files. Tiling analysis software settings were quantile normalization and a bandwidth for probe analysis of 50 bp. To determine structural variations in the genomes of Eil‐0, Lc‐0, Bay‐0 and Sha, two independent DNA isolates of each accession were compared with Columbia DNA. The resulting bar files were loaded into the Affymetrix integrated genome browser software and analyzed manually for the genes of interest. To qualify as deletions, the integrated genome browser signals had to be below a cutoff of −1.5 (log2 scale), with the settings for min run and max gap at >35 and ⩽150, respectively. These parameters were determined empirically (see text and Figure 3). The TAIR Arabidopsis genome annotation version 7.0 was used for analysis. The tiling array raw data have been deposited at the ArrayExpress database under accession number E‐MEXP‐1888.
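The deletion criteria can be sketched as a simple run‐finding procedure. The Python example below is only an illustration under stated assumptions: the analysis itself was done manually in the integrated genome browser, and the function name, the input layout (sorted probe positions in bp with one log2 signal per probe) and the interpretation of "max gap" as the largest distance allowed between two below‐cutoff probes within one run, and of "min run" as the minimum span of a run, are our reading of the browser settings rather than a documented specification. Probe lengths are ignored.

```python
def call_deletions(positions, log2_signal, cutoff=-1.5, min_run=35, max_gap=150):
    """Return (start, end) intervals of candidate deletions.

    positions: sorted probe coordinates in bp (hypothetical input layout).
    log2_signal: accession versus Col-0 signal on a log2 scale, one per probe.
    """
    intervals = []
    start = last = None
    for pos, signal in zip(positions, log2_signal):
        if signal >= cutoff:
            # Probe is not below the cutoff; it only matters through the
            # gap it leaves between below-cutoff probes.
            continue
        if start is not None and pos - last <= max_gap:
            last = pos  # extend the current run
        else:
            if start is not None:
                intervals.append((start, last))
            start = last = pos  # open a new run
    if start is not None:
        intervals.append((start, last))
    # Keep only runs whose span exceeds the minimum run length.
    return [(s, e) for s, e in intervals if e - s > min_run]
```

For example, `call_deletions([0, 35, 70, 105, 140, 500, 535, 1000], [-2, -2, 0, -2, 0, -2, 0, -2])` returns `[(0, 105)]`: the first four probes form one run despite the single probe above the cutoff, while the isolated low probes at 500 and 1000 are too far apart to merge and too short to qualify.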
We would like to thank Dr K Osmont for helpful comments on the manuscript, O Hagenbüchle and A Paillusson for Affymetrix tiling array hybridizations and E Farmer and P Reymond for the PCR products used to make the custom‐spotted DNA microarrays. Contributions: CSH, KH, SP and JW conceived this study and analyzed the data together with DRG and GZ. CSH wrote the manuscript with help from KH, SP, JW, DRG and DW. Recombinant inbred lines were contributed by CSH and SP. All molecular biology experiments except microarray hybridizations were performed by SP. Microarray hybridizations were performed by CN and JT. Statistical analyses of microarray experiments were performed by DRG, JW and GZ. DRG was funded by the Swiss National Science Foundation National Centre for Competence in Research (Plant Survival). This work was supported by the University of Lausanne, by Swiss National Science Foundation Grant 3100A0‐107631 to CSH and by the SystemsX ‘Plant growth in a changing environment’ project funding for CSH.
Conflict of Interest
The authors declare that they have no conflict of interest.
Supplementary Figure 1 [msb200879-sup-0001.jpg]
Supplementary Table 1 [msb200879-sup-0002.xls]
Supplementary Table 2 [msb200879-sup-0003.xls]
Supplementary Table 3 [msb200879-sup-0004.xls]
Supplementary Table 4 [msb200879-sup-0005.xls]
Supplementary Table 5 [msb200879-sup-0006.xls]
Supplementary Table 6 [msb200879-sup-0007.xls]
Supplementary Table 7 [msb200879-sup-0008.xls]
Supplementary sequence alignments [msb200879-sup-0009.pdf]
Supplementary Materials and methods
Normalized signal bar files generated by Affymetrix tiling analysis software (TAS) [msb200879-sup-0010.zip]
This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
Copyright © 2009 EMBO and Nature Publishing Group