The files in this data release are the raw 16S rRNA gene amplicon DNA sequence files from 90 samples of tropical and cold-water corals, as well as raw DNA sequence files from a mock community and two extraction blanks for each kit used for DNA extraction. A mock community was sequenced in order to assess any biases in the sequencing technology, while extraction blanks were sequenced in order to identify any contaminants in the DNA extraction kits. The purpose of this experiment was to compare preservation and DNA extraction methods across tropical and cold-water corals.

Sample naming convention: The first two letters represent the coral genus and species (LP = Lophelia pertusa; MC = Montastraea cavernosa; PA = Porites astreoides; SI = Stephanocoenia intersepta; PJ = Paragorgia johnsoni). The third letter represents the method of preservation employed (R = RNAlater; N = liquid nitrogen; S = DNA/RNA Shield). The subsequent number indicates the replicate. Replicates 1-3 for all corals were extracted using the Promega Maxwell RSC Blood DNA kit, and replicates 4-6 for all corals were extracted using the Qiagen DNeasy PowerBiofilm DNA Isolation kit. LP samples represent three different coral colonies (biological replicates), whereas for the remaining four corals, all samples were derived from a single colony (technical replicates). Kit blanks are labeled with KB followed by the replicate number (1 or 2), and their second set of letters represents the extraction kit used (MB = Qiagen and PM = Promega). The mock community is labeled "Methods_Comp_Mock".
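
The naming convention above can be decoded programmatically. The following Python sketch is illustrative only: it assumes that a coral sample name is exactly the two-letter coral code, the one-letter preservation code, and a single replicate digit with no separators (for example "LPR1"), which the text does not state explicitly. The function name parse_sample_name is hypothetical. Kit blanks and the mock community are deliberately not handled.

```python
import re

# Codes copied from the naming convention described above.
CORALS = {
    "LP": "Lophelia pertusa",
    "MC": "Montastraea cavernosa",
    "PA": "Porites astreoides",
    "SI": "Stephanocoenia intersepta",
    "PJ": "Paragorgia johnsoni",
}
PRESERVATION = {"R": "RNAlater", "N": "liquid nitrogen", "S": "DNA/RNA Shield"}

# Assumption: no delimiters between the parts, e.g. "LPR1".
# Adjust the pattern if the actual file names differ.
SAMPLE_PATTERN = re.compile(r"^(LP|MC|PA|SI|PJ)([RNS])([1-6])$")


def parse_sample_name(name):
    """Decode a coral sample name into species, preservation, replicate, and kit."""
    match = SAMPLE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a coral sample name: {name!r}")
    coral, preservation, replicate = match.groups()
    replicate = int(replicate)
    # Replicates 1-3 were extracted with the Promega kit, 4-6 with the Qiagen kit.
    kit = "Promega" if replicate <= 3 else "Qiagen"
    return {
        "coral": CORALS[coral],
        "preservation": PRESERVATION[preservation],
        "replicate": replicate,
        "kit": kit,
    }
```

For example, parse_sample_name("PJS5") would report Paragorgia johnsoni preserved in DNA/RNA Shield, replicate 5, extracted with the Qiagen kit.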

DNA was extracted from the coral tissue of each sample using either a Promega Maxwell RSC Blood DNA Kit or a Qiagen DNeasy PowerBiofilm DNA Isolation Kit. To target the V4 variable region of the 16S rRNA gene, a fusion primer set was constructed using primers 515F (5' GTGCCAGCMGCCGCGGTAA) and 806RB (5' GGACTACNVGGGTWTCTAAT) (Apprill and others 2015, doi: 10.3354/ame01753) along with adapters, indices, linkers, and pads in accordance with the dual-index sequencing strategy of Kozich and others (2013, doi: 10.1128/aem.01043-13). After the extracted DNA was amplified using these primers, the amplicons were purified, quantified, and pooled in equal concentrations. Sequencing was then performed on an Illumina MiSeq system with v2 chemistry, generating paired-end 250-bp reads. 
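
Both primers contain IUPAC degenerate bases (M, N, V, W), so each primer stands for several concrete sequences. The Python sketch below shows one way to expand a degenerate primer into a regular expression and locate it in a nucleotide sequence. The helper names primer_to_regex and find_primer are hypothetical. Whether the primer sequences actually appear in the raw reads depends on the library design and is not stated here, so this is intended for checking reference sequences rather than as a trimming step.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Primer sequences as given in the text (5' to 3').
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"
PRIMER_806RB = "GGACTACNVGGGTWTCTAAT"


def primer_to_regex(primer):
    """Compile a degenerate primer into a regex matching every concrete variant."""
    return re.compile("".join(IUPAC[base] for base in primer))


def find_primer(sequence, primer):
    """Return the 0-based start of the first primer match, or -1 if absent."""
    match = primer_to_regex(primer).search(sequence.upper())
    return match.start() if match else -1
```

Note that 806RB is a reverse primer, so matching it against a forward-strand reference would require searching for its reverse complement, which this sketch does not do.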

The raw data files associated with this study have also been submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA), under Bioproject number PRJNA544686. For more information, you may contact Christina Kellogg at the USGS St. Petersburg Coastal and Marine Science Center, 600 4th Street South, St. Petersburg, Florida, USA, 33701; Telephone: (727) 502-8128; Email: ckellogg@usgs.gov.

There are two raw data folders. One folder contains the data for samples extracted using the Promega kit, while the other folder contains the data for samples extracted using the Qiagen DNeasy kit. Each sample yielded two FASTQ files: a file of forward reads and a file of reverse reads. The additional six FASTQ files in each folder are the forward and reverse reads for the two extraction blanks for the kit, as well as the forward and reverse reads for the mock community.
  
The remaining files in each raw data folder are MIMARKS-compliant sampling metadata (Methods_Comp_MIMARKS_compliant_metadata) and SRA metadata files (Methods_Comp_SRA_metadata), which are provided in .xlsx and .csv formats. MIMARKS (minimum information about a marker gene sequence) compliant metadata is based on standards developed by the Genomic Standards Consortium for reporting sequences from metagenomes (Yilmaz and others, 2011, doi:10.1038/nbt.1823; Field and others, 2008, doi:10.1038/nbt1360). The column headers in the MIMARKS metadata files are defined below. A value of "NA" (not applicable) is provided for most fields when the DNA to be sequenced did not originate from the ocean (as in the case of extraction blanks and mock communities). A value of "ND" (not detected) is provided for the depth, temperature, and/or salinity fields when that information was not collected for a sample.

sample_name: A unique identifier for each sample.

sample_title: Title of the sample.

project_name: Name of the project within which the sequencing was organized.

bioproject_accession: Accession number of the BioProject to which the sample belongs; this is a unique accession number under which the raw data files have been submitted to the NCBI Sequence Read Archive.

organism: The most descriptive organism name for this sample (to the species, if possible).

collection_date: Date of sampling; format is YYYY-MM-DD.


env_broad_scale: Major environment type(s) where the sample was collected.


env_local_scale: Terms that identify environmental entities having causal influences upon the entity at time of sampling. Multiple terms can be separated by pipes, e.g.: shoreline|intertidal zone.

env_medium: Terms that identify the material displaced by the entity at time of sampling. Subclasses of environmental material [ENVO:00010483] are recommended. Multiple terms can be separated by pipes, e.g.: estuarine water|estuarine mud.

replicate: The number of each sample replicate.

env_package: Name of package most appropriate for this sample type.

host: The natural (as opposed to laboratory) host to the organism from which the sample was obtained. Use the full taxonomic name, e.g., "Homo sapiens".

host_taxid: NCBI taxonomy ID of the host, e.g. 9606.

env_material: Host material that was displaced by the sample prior to the sampling event.

geo_loc_name: Geographical origin of the sample. Use a colon to separate the country or ocean from more detailed information about the location, e.g., "Canada: Vancouver" or "Germany: halfway down Zugspitze, Alps".

lat_lon: The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format "d[d.dddd] N|S d[dd.dddd] W|E", e.g., "38.98 N 77.11 W".

site: Given name of the specific area samples were collected from.

depth: Water depth at which the samples were collected; units are meters.

temp: Temperature of the sample at time of sampling.

salinity: Salinity measurement at time of sampling.

samp_collect_device: Method or device employed for collecting sample.

samp_mat_process: Processing applied to the sample during or after isolation.

samp_store_temp: Temperature samples were stored at prior to extraction.

contact_name: Name of the primary researcher.

contact_email: Email of the primary researcher.
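
When working with the metadata files, the lat_lon format and the "NA"/"ND" placeholders described above usually need converting before analysis. The Python sketch below shows one possible approach; the function name parse_lat_lon is hypothetical, and it assumes values follow the stated "d[d.dddd] N|S d[dd.dddd] W|E" pattern with single spaces between parts. It returns signed decimal degrees (south and west negative) or None for a placeholder.

```python
import re

# Placeholders defined in the text: "NA" (not applicable), "ND" (not detected).
MISSING = {"NA", "ND"}

# Matches e.g. "38.98 N 77.11 W"; assumes single spaces between the parts.
LAT_LON = re.compile(r"^(\d{1,2}(?:\.\d+)?) ([NS]) (\d{1,3}(?:\.\d+)?) ([WE])$")


def parse_lat_lon(value):
    """Convert a lat_lon string to (latitude, longitude) in signed decimal degrees.

    Returns None when the value is one of the missing-data placeholders.
    """
    value = value.strip()
    if value in MISSING:
        return None
    match = LAT_LON.match(value)
    if match is None:
        raise ValueError(f"unexpected lat_lon format: {value!r}")
    lat, north_south, lon, west_east = match.groups()
    latitude = float(lat) * (1 if north_south == "N" else -1)
    longitude = float(lon) * (1 if west_east == "E" else -1)
    return latitude, longitude
```

The same None-for-placeholder convention could be applied to the depth, temp, and salinity columns before converting them to numbers.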

For additional details and interpretations, please see Pratte, Z.A., and Kellogg, C.A., 2021, Comparison of preservation and extraction methods on five taxonomically disparate coral microbiomes: Frontiers in Marine Science, https://doi.org/10.3389/fmars.2021.684161.