To investigate the extent of genetic stratification in structured microbial communities, we compared the metagenomes of 10 successive layers of a phylogenetically complex hypersaline mat from Guerrero Negro, Mexico. We found pronounced millimeter‐scale genetic gradients that were consistent with the physicochemical profile of the mat. Despite these gradients, all layers displayed near‐identical and acid‐shifted isoelectric point profiles due to a molecular convergence of amino‐acid usage, indicating that hypersalinity enforces an overriding selective pressure on the mat community.
Ecosystems often exhibit distinct gradients. Physicochemical gradients have long been documented, but only recently has environmental shotgun sequencing allowed the associated functional (gene‐based) gradients of an ecosystem's biota to be addressed. Macroscale functional gradients have been inferred from oceanic metagenomic data sets, both horizontally (Venter et al, 2004; Johnson et al, 2006; Rusch et al, 2007) and vertically (DeLong et al, 2006). Many structured microbial communities have been shown to produce steep physicochemical gradients on the scale of millimeters (Jorgensen et al, 1979; Schmitt‐Wagner and Brune, 1999; Ludemann et al, 2000; Ley et al, 2006), but associated community‐level functional gradients have not been demonstrated to date.
Here, we investigate a complex, stratified, hypersaline microbial mat from Guerrero Negro, Baja California Sur, Mexico, as a model for fine‐scale functional variation (Ley et al, 2006). The dense, tofu‐like texture of this mat allows intact cross‐sections to be obtained down to ∼1 mm thickness. The mat shows pronounced physicochemical variation both in space and time: oxygen is detected routinely in the top 2 mm during the day (up to 700 μM), and the mat is completely anoxic during the night. The permanently anoxic lower layers are characterized by micromolar levels of sulfide that increase with depth. The mat, dominated by bacteria, was reported to be one of the world's richest and most diverse microbial communities, comprising at least 752 observed species from 42 bacterial phyla, including 15 novel candidate phyla (Ley et al, 2006). As the mat grows in hypersaline waters (∼3 × the salinity of sea water), we also looked for evidence of molecular adaptations to hypersalinity in the mat community.
Results and discussion
To investigate millimeter‐scale genetic and associated functional stratification, we performed a metagenomic analysis of 10 spatially successive layers of the Guerrero Negro mat. Mat core samples were collected during the day (Supplementary Table S1) and upper layers were sectioned at a finer scale (1 mm slices) than the lower layers (4–15 mm slices) to capture variation associated with the steep oxygen gradient in the upper millimeters of the mat (Supplementary Table S2). DNA from each layer was cloned and shotgun‐sequenced using capillary sequencing with an average of ∼13 000 reads per layer. No significant assembly of the reads was possible, even when all data were combined (largest contig was 8.4 kb from a combined assembly). We therefore chose to analyze only the unassembled data (average read length 808 bp after trimming; Chou and Holmes, 2001) to avoid chimerism that has been reported to be frequent in contigs <10 kb (Mavromatis et al, 2007). Genes were predicted on vector‐ and quality‐trimmed reads with fgenesb (http://www.softberry.com/) using a generic bacterial model, resulting in an average of 13 600 genes per layer (Supplementary Table S2). These data have been deposited in GenBank under accession numbers ABPP00000000 through ABPY00000000 and are available through the IMG/M system (Markowitz et al, 2006) (see SOM for access information).
Using both bulk similarity matches and phylogenetic mapping of conserved marker genes (von Mering et al, 2007a), we found strong phylogenetic variation between layers. Cyanobacteria and Alphaproteobacteria were the most abundant lineages in the top two layers (Supplementary Figure S1). Below the upper 2 mm, Proteobacteria, Bacteroidetes, Chloroflexi and Planctomycetes were the most represented phyla, with a notable peak in Bacteroidetes at 3 mm (Supplementary Figure S1). Numerous traces of other bacterial phyla as well as some archaea and eukaryotes were also identified. A large fraction of predicted proteins in layers below 2 mm did not have significant sequence similarity to any protein in public databases, reflecting the high degree of phylum‐level novelty in the mat community (Ley et al, 2006). These metagenome‐based findings are in broad agreement with single‐marker gene surveys of the mat (Spear et al, 2003; Ley et al, 2006).
A rough measure of functional potential per organism can be made by estimating the average effective genome size (Raes et al, 2007). Using this method, we predicted an increased average bacterial genome size at the border of the oxic and anoxic zones (1–2 mm depth): 6 Mb at the border versus 3–3.5 Mb for the rest of the mat (Supplementary Figure S2). This may reflect an increased functional complexity needed for survival in the constantly fluctuating conditions at this depth, as was recently observed in the genome of a marine Beggiatoa occupying a similar niche (Mussmann et al, 2007).
To investigate genetic gradients through the mat, we determined the relative abundances of individual gene families and metabolic pathways between mat layers, and compared the mat data with external data sets for reference. Many gene families were highly abundant in the mat despite high overall functional diversity (Supplementary Figure S3) and very low sequence coverage of individual species. Indeed, the mat data set roughly doubled existing inventories for some of the gene families described below (Table I). This implies that multiple species and likely higher‐level taxa contribute representatives of these families, and suggests that there has been a strong selection for a limited number of common functionalities in the mat.
The key aspect of this study was to use the metagenomic data to determine what, if any, millimeter‐scale genetic gradients are detectable in this very complex and structured ecosystem. Several gene families and pathways either directly (Figure 1A) or inversely (Figure 1B) tracked the steep oxygen gradient in the top 2 mm of the mat and the sulfide gradient below 2 mm. Genes directly involved in photosynthesis (KEGG map 00195) were statistically over‐represented in the top two layers relative to lower layers. In addition, an uncharacterized protein domain (pfam05685) that is highly paralogous in phototrophic lineages (most cyanobacterial and some Chloroflexi genomes) showed a steep declining gradient in the top 6 mm (Figure 1A), consistent with dominance of phototrophs in the same region. Chaperones similarly tracked the oxygen gradient when all gene families with chaperone activity were combined. The over‐representation of chaperones in the top 2 mm relative to the rest of the mat may not be associated with oxygen concentration, but rather with heat stress caused by direct exposure to sunlight.
Gene families and pathways that tracked inversely with oxygen concentration included ferredoxins, trimethylamine methyltransferase (Mttb), sulfatases and sugar degradation pathways (Figure 1B). Ferredoxins and associated proteins show a four‐fold increase from the top layer down to a depth of 4 mm and thereafter are uniformly over‐represented. Two COG families are chiefly responsible for this trend: COG1148 (heterodisulfide reductase, subunit A and related polyferredoxins) and COG2414 (aldehyde:ferredoxin oxidoreductase). The expansion of ferredoxins in the anoxic layers likely reflects the diversification of redox reactions required for anaerobic respiration. Mttb (pfam06253, COG5598) methyltransferase does not become significantly over‐represented until at least 7 mm into the mat (Figure 1B), well below the anoxic boundary. Mttb was initially identified as a protein facilitating the first step of methanogenesis from trimethylamine in Methanosarcinaceae (Paul et al, 2000). However, this gene family is also found in methylotrophic bacteria (e.g. in Rhodobacteraceae and Rhizobiaceae), suggesting a more generalized role in C1 metabolism.
One of the most pronounced inverse gradients is observed for sulfatases (COG3119), which are involved in the hydrolysis of sulfated organic compounds (Figure 1B). As sulfatases can function in the presence of oxygen, the gradient is presumably a reflection of the availability of sulfated compounds in the mat. Although the concentration gradient of sulfated compounds is not known in the mat, they are produced by phototrophs (Kates, 1986) and are widespread in marine environments (Glockner et al, 2003). Sulfatase genes obtained from the mat exhibited extensive sequence divergence, suggesting that a correspondingly wide variety of sulfated organic substrates is present in the mat, with the highest concentrations below 2 mm. The over‐representation of this gene family may in part be due to an expansion of sulfatase genes in the genomes of Planctomycetes, suggested to be involved particularly in the hydrolysis of sulfated glycopolymers (Glockner et al, 2003).
Sugar degradation pathways (glycolysis and pentose and uronic acid degradation) show a two‐fold increase with depth through the top 3 mm and maintain high relative representation in the anoxic lower layers (Figure 1B). This suggests that heterotrophic metabolism of sugars, particularly pentoses and uronic acids, is important in the lower layers.
Organisms living at the boundary between the oxic and anoxic zones could potentially accumulate substrates with high reductive potential in the anoxic zone, and then move to the oxic zone to harvest this potential by oxidation (Mussmann et al, 2007). This would require boundary zone organisms to be motile and chemotactic. Indeed, we find that chemotaxis signature genes peak sharply at the oxic–anoxic boundary (Figure 1C). Flagella appear not to be the dominant source of motility in these chemotactic organisms as flagellar genes actually dip in this region (Figure 1C). Chemotactic gliding bacteria have been observed in fresh mat cores (Garcia‐Pichel et al, 1994; Kruschel and Castenholz, 1998) and our molecular data suggest they are most abundant in the boundary zone, bridging the oxic and anoxic layers.
Despite the pronounced phylogenetic and functional gradients in the mat, hypersalinity is a selective pressure common to the whole community. A known adaptation to hypersalinity is enrichment of proteins with acidic amino acids, allowing proteins to function in high cytoplasmic salt concentrations (Soppa, 2006). The resulting acid‐shifted protein isoelectric points have been documented in the genomes of only two lineages, the archaeal class Halobacteria (Kennedy et al, 2001; Soppa, 2006) and the bacterial species Salinibacter ruber (Oren and Mana, 2002; Mongodin et al, 2005), so it is unclear how widespread this mechanism is in halophilic communities.
The average isoelectric points of the mat layer communities are conspicuously acid‐shifted when compared with most bacteria and microbiomes that are non‐halophilic (Figure 2A). We determined this to be due primarily to an enrichment in the acidic amino acid aspartate (Figure 2B). Furthermore, the isoelectric profiles of all 10 layers converge on a common acid‐shifted profile (Figure 3A) despite a significant variation in GC content between layers (Figure 3B), reflecting differing phylogenetic composition. The latter is consistent with aspartate usage being GC‐independent as it can be encoded by both GC‐rich and GC‐poor codons (GAC and GAT, respectively). As each metagenomic read pair is likely derived from different species and no single species dominates the mat community, we conclude that a significant fraction of the community has converged on the enrichment of low isoelectric point proteins.
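The GC‐independence of aspartate usage can be illustrated with a minimal Python sketch. It is not part of the original analysis; the function names are hypothetical and the run of five codons is an arbitrary choice. It builds two coding sequences for the same aspartate‐only peptide, one from GAC codons and one from GAT codons, and compares their GC content.

```python
# Illustrative only: both codons encode aspartate (D) in the standard genetic code.
ASP_CODONS = ("GAC", "GAT")


def gc_content(dna):
    """Return the fraction of G and C bases in a DNA string."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    return (dna.count("G") + dna.count("C")) / len(dna)


def encode_asp_run(n, codon):
    """Return a coding sequence for n aspartate residues using one codon."""
    if codon not in ASP_CODONS:
        raise ValueError("not an aspartate codon: " + codon)
    return codon * n


if __name__ == "__main__":
    gc_rich = encode_asp_run(5, "GAC")  # same peptide, GC-rich encoding
    gc_poor = encode_asp_run(5, "GAT")  # same peptide, GC-poor encoding
    print(gc_rich, round(gc_content(gc_rich), 3))
    print(gc_poor, round(gc_content(gc_poor), 3))
```

The two sequences encode an identical peptide but differ in GC content only through the third codon position, which is why aspartate enrichment is compatible with both GC‐rich and GC‐poor genomes.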
In summary, this study demonstrates that millimeter‐scale genetic gradients can be readily discerned through a vertical cross‐section of a highly structured and complex microbial community using low sequence coverage. Furthermore, we could directly and inversely correlate many of the genetic gradients to the physicochemical profile of the mat. Microbial biofilms are important in many habitats, including our own bodies (Kroes et al, 1999; Eckburg et al, 2005), and often display physicochemical gradients at millimeter to centimeter scales. However, few biofilms are as robust as microbial mats and methods may need to be adapted to preserve spatial structure (Webster et al, 2006) and allow the relevant fine‐scale genetic gradients to be resolved.
Surprisingly, we found that adaptation to hypersalinity by enriching proteins with acidic amino acids is more widespread than previously appreciated. Although this is the first example of species‐independent molecular convergence in a microbial community, we predict that similar convergence patterns will be observed in other communities adapted to similar or different environmental conditions, such as temperature (Gianese et al, 2001) or pressure (Simonato et al, 2006; Lauro and Bartlett, 2008).
Materials and methods
Mat core samples were collected around 1400 hours from pond 4 near pond 5 at the Exportadora de Sal saltworks, Guerrero Negro, Baja California Sur, Mexico. The salinity of the bulk water above the mat was ∼9% (∼3 × the salinity of sea water). Other metadata for the sample can be found in Supplementary Table S1. Four replicate cores were collected and sectioned into layers with sterile scalpels, and DNA was extracted, normalized, pooled and sequenced as described in Supplementary information. Metagenome sequence data are available under the following GenBank accession numbers: ABPP00000000, ABPQ00000000, ABPR00000000, ABPS00000000, ABPT00000000, ABPU00000000, ABPV00000000, ABPW00000000, ABPX00000000, ABPY00000000.
Community composition analysis was performed using the consensus of (i) best BlastP hits (Altschul et al, 1997) to the IMG/M database (Markowitz et al, 2006) and (ii) phylogenetic mapping of signature genes on a phylogenetic tree (von Mering et al, 2007a). See Supplementary information for details.
Gene‐based functional gradients were calculated as follows: genes were assigned to their COG families (Tatusov et al, 1997) and pfam domains (Bateman et al, 2002) based on rpsBLAST (Altschul et al, 1997). The gradients were examined for possible over‐representation of groups or individual families or domains, and 1000 bootstrap iterations were used to assess the significance of over‐representation. The described gradients were independently confirmed using two databases: IMG/M (Markowitz et al, 2006) and the STRING database (von Mering et al, 2007b). Further details as well as groupings of families/domains are described in Supplementary information.
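A bootstrap test of over‐representation can be sketched as follows. The exact resampling scheme is described only in Supplementary information, so this Python sketch rests on an assumption: the genes of each of two samples are resampled with replacement, and the support for over‐representation is the fraction of iterations in which the family's relative abundance in sample A exceeds that in sample B. The function name and its parameters are hypothetical.

```python
import random


def bootstrap_support(hits_a, total_a, hits_b, total_b, iterations=1000, seed=0):
    """Fraction of bootstrap iterations in which the relative abundance of a
    gene family in sample A exceeds that in sample B.

    hits_*  : number of genes assigned to the family in each sample
    total_* : total number of predicted genes in each sample
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("totals must be positive")
    rng = random.Random(seed)
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    wins = 0
    for _ in range(iterations):
        # Drawing each gene with replacement: it belongs to the family
        # with probability equal to the observed relative abundance.
        boot_a = sum(rng.random() < p_a for _ in range(total_a)) / total_a
        boot_b = sum(rng.random() < p_b for _ in range(total_b)) / total_b
        if boot_a > boot_b:
            wins += 1
    return wins / iterations
```

For example, `bootstrap_support(120, 2000, 30, 2000)` compares a family with 120 hits among 2000 genes against 30 hits among 2000 genes; a value close to 1 would indicate consistent over‐representation in the first sample. Normalizing by the total gene count per sample keeps layers with different sequencing depth comparable.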
Isoelectric point distributions, amino‐acid composition and GC content were computed using appropriate Perl scripts and modules as described in Supplementary information.
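As an illustration of how an isoelectric point can be estimated from sequence alone, the following is a minimal Python sketch rather than the Perl code used in the study. The pKa values are an assumed, approximate set (published scales differ), and all function names are hypothetical. The net charge of a protein is computed with the Henderson–Hasselbalch relation, and the pH at which it crosses zero is located by bisection.

```python
# Approximate pKa values; an assumption for illustration, not the study's set.
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(seq, ph):
    """Estimated net charge of a protein sequence at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # deprotonated C-terminus
    for aa in seq.upper():
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """pH at which the estimated net charge is zero, found by bisection."""
    lo, hi = 0.0, 14.0
    # Net charge decreases monotonically with pH, so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def acidic_fraction(seq):
    """Fraction of residues that are aspartate (D) or glutamate (E)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("D") + seq.count("E")) / len(seq)
```

Applied to every predicted protein in a layer, a function such as `isoelectric_point` would yield the kind of distribution compared between layers, and `acidic_fraction` gives a simple measure of acidic amino‐acid enrichment.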
We thank Amber Hartman for fruitful discussions and the Exportadora de Sal saltworks in Guerrero Negro, Baja California Sur, for access and assistance with the field site. We also thank the NASA‐funded researchers at NASA Ames who assisted with the field work: David Des Marais, Moira Doty, Tori Hoehler, Mary Hogan and Kendra Turk. Sequencing was provided by the JGI Community Sequencing Program. This work was performed under the auspices of the US Department of Energy's Office of Science, Biological and Environmental Research Program, and the University of California, Lawrence Livermore National Laboratory under contract no. W‐7405‐Eng‐48, Lawrence Berkeley National Laboratory under contract no. DE‐AC02‐05CH11231 and Los Alamos National Laboratory under contract no. DE‐AC02‐06NA25396. JR and PB are supported by the European Union 6th Framework Program (contract no. LSHG‐CT‐2004‐503567).
Supplementary Material [msb200835-sup-0001.pdf]
Supplementary Information [msb200835-sup-0002.xls]
This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.
Copyright © 2008 EMBO and Nature Publishing Group