Corals residing in habitats that experience frequent seawater pCO2 variability may possess an enhanced capacity to cope with ocean acidification. Yet, we lack a clear understanding of the molecular toolkit enabling acclimatization to environmental extremes, and how life-long exposure to pCO2 variability influences biomineralization. We examined the gene expression responses and micro-skeletal characteristics of Pocillopora damicornis originating from the reef flat and reef slope of Heron Isl...
Show moreSample Collection
The experiment was performed during the austral summer from mid-January to late March 2021 at Heron Island Research Station (HIRS), southern Great Barrier Reef (23 27°S, 151 55°E). Heron Reef is composed of five distinct geomorphological habitats characterized by diverse benthic communities and biogeochemical conditions. Fragments of the coral P. damicornis were collected from the reef flat and slope locations within the same depth range (1–3 m) on 14 and 15 January 2021. Four fragments were collected from each individual colony (genetic clones), totalling 96 fragments from 24 colonies (n = 12 per habitat). For more information about sample collection methods and treatment, see Brown et al., 2022.
RNA and DNA Extractions
Samples for nucleic acid extraction were preserved in RNAlaterTM stabilization solution and stored at -80ºC until extraction. Genomic DNA and total RNA were extracted concurrently using the Zymo Quick-DNA/RNA Miniprep Plus Kit (Zymo Research #D7003) using the following modifications to the manufacturer’s protocol. Samples were thawed on ice and two small fragments (5 mm x 5 mm) were removed from the stabilization solution using sterile forceps. Immediately, excess stabilization solution was removed with the corner of a KimWipe and fragments were submerged in a new 1.5 mL screw-cap tube containing 0.5mm glass beads (Fisher Scientific, #50-212-143) and 800 μl of DNA/RNA shield (Zymo Research, #R1100-50). Samples were homogenized by vortex at max speed for 1–2 minutes. Following homogenization, 400 μl of homogenate was removed and centrifuged for 3 mins at 9000 rcf. The supernatant was transferred to a new tube and mixed with 30 μl Proteinase K digestion buffer and 15 μl Proteinase K (Zymo Research #D7003). This mixture was incubated at room temperature for 15 minutes followed by another centrifugation step for 3 mins at 9000 rcf. The supernatant was combined with an equal volume of DNA/RNA lysis buffer (Zymo Research #D7003). Extraction was thereafter performed as described by the manufacturer’s protocol, including the optional DNase I treatment. DNA and RNA concentration were quantified using the Qubit dsDNA and RNA Broad Range kits (Invitrogen #Q10211) and integrity was assayed using non-denaturing gel electrophoresis (1.5% agarose gel, 60 minutes at 60 V). Purity was assessed using Nanodrop.
Species Identification
Twenty of the 24 colonies were previously identified as P. damicornis (Brown et al., 2022). For the four colonies that were not identified due to difficulties with PCR amplification, the mitochondrial open reading frame (mtORF) region was amplified from new genomic DNA extracts via PCR as described by (Burgess et al., 2021; Johnston et al., 2018) using primers from (Flot et al., 2008): FatP6.1 (5′-TTTGGGSATTCGTTTAGCAG-3′) and RORF (5′-SCCAATATGTTAAACASCATGTCA-3′). PCR master mixes contained 12.55 µL of EmeraldAmp GT PCR Master Mix (TaKaRa Bio USA Inc. Cat # RR310B), 0.32 µL of forward and reverse primers as listed, 1 µL of template DNA, and 10.80 µL of nuclease free water totaling 25 µL final volume. Negative controls were included as master mix without template DNA. mtORF was amplified using a polymerase chain reaction (PCR) profile of a single denaturation set of 94°C for 60 s followed by 30 cycles of 94°C for 30 s for denaturation, 53°C for 30 s for annealing, and 72°C for 75 s for extension and a final incubation of 72°C for 5 min. PCR products were assessed with a 1.5% agarose gel in TAE for 30 mins at 80 V with an expected band size of approx. 1000 bp. PCR products were cleaned using ethanol precipitation. 1/10th of the volume of 3M sodium acetate (Fisher Cat. AAJ61928AE) was added to each PCR product, followed by an addition of 3 times the total volume of the mixture of ice-cold 100% ethanol. The mixture was incubated overnight at -20 ºC and DNA was precipitated by centrifugation at 15,000 rcf for 30 mins at room temperature. The pellet was washed with 70% ethanol twice, dried, and resuspended in 30 µl of 1M Tris-HCI, pH 8.0 (Fisher Cat. 15568025). DNA quantity was assessed using Broad Range dsDNA Qubit and Nanodrop. Sanger sequencing using the same primers utilized during PCR amplification was performed at the URI Genomics and Sequencing center using Applied Biosystems BigDye Terminator v3.1. Geneious Prime (Version 2023.2.1) was used to align the four sequences (Muscle 5.1 alignment, PPP algorithm, default parameters) to known P. damicornis (NCBI Accession Numbers: JX994077, KJ720219, JX994086, JX624991, KF583925, KF583946, EU374235, KJ720218, KP698587, KM215098, KX538982, KJ720226, KX538983, KF583950, JX994087, KJ720235, JX985618, KJ690905, JX625025, and KX538984) and P. acuta (NCBI Accession Numbers: JX994073, JX624999, KF583928, KF583935, EU374226, FJ424111, KJ720240, KM215075, KJ690906, KP698585, KX538985, KX538986, KM215104, and KJ720241) mtORF sequences and a Neighbor-Joining tree was built from the alignment. The samples sequenced in this study were identified as P. damicornis.
Library Preparation and Tag Seq Analysis
Extracted total RNA was prepared for TagSeq (Lohman et al., 2016). Library preparation and sequencing of the 48 samples was conducted at the University of Texas Austin, Genomic Sequencing and Analysis Facility. Sequencing was completed targeting standard coverage of 3–5 million 100-bp single-end reads per sample (Illumina NovaSeq 600 SR100). We used fastp to trim raw TagSeq reads of TruSeq 1 Illumina adapters, poly-A and poly-G sequences, and remove low-quality reads (i.e., reads with >40% of bases with Phred score <30) and low-complexity reads (i.e., <50%) (Chen et al., 2018). Before and after filtering, MultiQC was used to assess filtering success (Ewels et al., 2016). After successful trimming and filtering, filtered reads were mapped to the Pocillopora damicornis genome (Cunning et al., 2018) and P. acuta genome (Stephens et al., 2022) using HISAT2 using the downstream-transcriptome-assembly mode for unpaired reads (Kim et al., 2019). Mapping rate was approximately 150% higher for the P. acuta genome than the P. damicornis genome (average of 72.48% compared to 29.10%), despite all corals used in the experiments confirmed as P. damicornis (Brown et al., 2022). Accordingly, downstream analyses were performed using the P. acuta genome. Mapped reads were quantified and assembled using StringTie (Pertea et al., 2016) and a gene count matrix was generated using the Stringtie script prepDE.py for downstream analysis. All code and data are publicly available (https://github.com/imkristenbrown/Heron-Pdam-gene-expression/, Dellaert et al. (2024), doi: 10.5281/zenodo.14041606). All raw TagSeq data can be accessed at the Sequence Read Archive (SRA) (https://www.ncbi.nlm.nih.gov/sra/PRJNA934298).
Quality control of gene expression data
All analyses were conducted using R version 4.0.3 software (R Core Team, 2020). First, all genes not detected (i.e., 100% of the samples showed counts of 0) across any of our samples (n=48) were removed, leaving data for 24,220 genes of the 33,730 genes in the P. acuta genome. Next, low-expression genes, defined as any gene that did not have at least 10 counts in 25% of the samples, were removed using the function pOverA in the package genefilter (Gentleman R, Carey V, Huber W, Hahne F, 2023). This retained genes that were expressed in at least one of the four different conditions, leaving a robust dataset of counts for 9,056 genes. Gene counts were normalized and transformed using variance stabilizing transformation (herein referred to as ‘vst-normalized gene expression’) in the package DESeq2 (Love et al., 2014). After examination of vst-normalized gene expression plots, two outliers were identified (two samples of the same genotype, RF16). These outliers were also identified as having high percentage of sequence duplication based on FastQC of raw fastq sequence files and were subsequently removed from the analysis. Following the removal of these two samples, pOverA filtering and variance stabilizing transformation was re-run, and the final dataset for statistical analysis included a robust dataset of counts for 9,012 genes from 46 samples. The resulting sample group sizes (biological replicates) for gene expression analysis were: flat-stable (n=11), flat-variable (n=11), slope-stable (n=12) and slope-variable (n=12).
Co-expression Analysis
Weighted Gene Co-expression Network Analysis (WGCNA, (Langfelder & Horvath, 2008)) was carried out using the R package WGCNA (version 1.72-5) to identify co-expression patterns based on native habitat of origin (flat vs. slope) or 8 weeks of exposure to pCO2 treatments (stable vs. variable). A signed adjacency similarity matrix of vst-normalized gene expression was constructed for all gene pairs across samples using the adjacency function (using a soft power of 5), then converted into a topological overlap dissimilarity matrix using the function TOMsimilarity (Langfelder & Horvath, 2008). Next, a hierarchical clustering of genes based on topological overlap was performed using the function hclust, followed by module identification using the function cutreeDynamic (hybrid method, deepSplit = 1) in the package dynamicTreeCut (Langfelder et al., 2008), retaining modules with at least 30 genes and merging highly similar modules using the function mergeCloseModules with a cut height of 0.15 (eigengenes correlated at R > 0.85). Trait data (categorical and physiological metrics) were then related to the expression of modules and clustered based on eigengene correlation using the package complexHeatmap (Gu et al., 2016). The complete list of the caterogrical and physiological metrics included was: net calcification, CaCO3 density, net photosynthesis, light-enhanced dark respiration, dark respiration, photosynthesis:respiration ratio, protein concentration, symbiont density, chlorophyll a concentration, mycosporine-like amino acid shinorine, rate of pHi recovery (symbiocyte and non-symbiocyte), acidosis magnitude (symbiocyte and non-symbiocyte), as well as categorical factors treatment, origin and genotype.
Gene ontology (GO) enrichment was explored for WGCNA modules using the EggNog (Cantalapiedra et al., 2021) and KEGG functional annotation files from the P. acuta genome (Stephens et al., 2022). Of the 9,012 genes in the dataset, there was no GO annotation for 4,448 of the genes and no KEGG annotation for 3,828 of the genes. Gene length information was calculated from the “gff3” file of the P. acuta genome (Stephens et al., 2022). An “over-represented p-value” for each GO term was calculated using the Wallenius method in the package goseq (Young et al., 2012) and only the biological process (BP) ontology was investigated. GO terms were considered to be overrepresented if they had a p-value of less than 0.01. An adjusted p-value using the Benjamini & Hochberg (“BH”) method of the p.adjust function in R was calculated for each GO term, and is available in the supporting information of the results paper. The overrepresented GO term set was further reduced using a similarity matrix with a threshold of 0.7 to collapse similar GO terms for plotting them by their parent terms using the package rrvgo (Sayols, 2023).
Differentially expressed genes and gene expression frontloading
A linear mixed effects model was used to analyze differential gene expression by origin and treatment, with colony as a random effect, using the package glmmSeq (Lewis et al., 2021). First, a dispersion vector and size factors were calculated for each gene using DESeq2 on the raw gene count matrix (i.e., vst-normalized gene counts were not used as input for glmmSeq analysis). Then, the interactive effects of origin and treatment were explored on gene expression data (raw count data with dispersions provided), with colony genotype as a random effect. For one gene, the model did not converge, causing the final dataset for glmmSeq to contain 9,011 genes. Adjusted p-values were calculated using the package qvalue (Storey JD, Bass AJ, Dabney A, Robinson D, 2023). Genes with an adjusted p-value (q-value) less than 0.05 were considered to be differentially expressed.
Because habitat of origin was a major driver of the gene expression responses, we further tested for frontloading of transcripts (constitutive levels of expression) following the methodology of (Gurr et al., 2022). We calculated two ratios: a control ratio (stable flat / stable slope) and a fold change ratio (variable/stable flat / variable/stable slope) (Gurr et al., 2022). “Frontloaded” transcripts are defined as having: (i) a greater expression in the flat habitat of origin in the variable treatment compared to the slope origin corals in the stable treatment (control ratio > 1) and (ii) a smaller fold change from variable to stable in the flat origin compared to the slope origin (fold change ratio < 1) (Barshis et al., 2013). A fold change ratio of less than 1 indicates that these transcripts underwent less of an expression change from the stable to variable treatment in the corals from the flat habitat, suggesting “frontloading” of expression to cope with a more variable environment (Barshis et al., 2013), since these genes were constitutively higher expressed in the flat origin corals under a stable pH treatment (control ratio > 1). These calculations were made based on the glmmseq estimated mean expression of each gene in each treatment:origin combination based on the fitted model.
Gene ontology enrichment was performed for genes differentially expressed by origin, treatment, and the interaction as well as for frontloaded genes using the same method as described for the WGCNA modules. Enrichment of these gene sets was tested against the 9,011 genes in the glmmSeq dataset. Full datafiles are available in the supporting information of the results paper.
Expression of putative biomineralization-related genes
The expression of coral biomineralization-related genes from the literature (Scucchia, Malik, Putnam, et al., 2021; Scucchia, Malik, Zaslansky, et al., 2021) was compared to both the differential expression and frontloaded results. A BLASTp search (using BLAST+ 2.9.0-iimpi-2019b) was performed against a database of the P. acuta genome to identify the best match (e-value < 0.01) of each of these coral biomineralization-related genes in the genome of P. acuta. The top hit for each biomineralization-related gene was used (Stephens et al., 2022).
Barott, K., Putnam, H., Brown, K. (2024) Gene expression of Pocillopora damicornis collected from reef of Heron Island, southern Great Barrier Reef from Jan 2021 to Feb 2021. Biological and Chemical Oceanography Data Management Office (BCO-DMO). (Version 1) Version Date 2024-11-05 [if applicable, indicate subset used]. http://lod.bco-dmo.org/id/dataset/942938 [access date]
Terms of Use
This dataset is licensed under Creative Commons Attribution 4.0.
If you wish to use this dataset, it is highly recommended that you contact the original principal investigators (PI). Should the relevant PI be unavailable, please contact BCO-DMO (info@bco-dmo.org) for additional guidance. For general guidance please see the BCO-DMO Terms of Use document.