The files in this data release are the raw DNA sequence files referenced in the journal article by Kellogg and Pratte (2021), entitled "Unexpected diversity of Endozoicomonas in deep-sea corals". [Kellogg, C.A. and Pratte, Z.A., 2021, Unexpected diversity of Endozoicomonas in deep-sea corals: Marine Ecology Progress Series, v.673, article 13844, 15 p., https://doi.org/10.3354/meps13844.] The files represent 16S rRNA gene amplicon surveys of 28 deep-sea coral samples, including Acanthogorgia aspera (n=5), Acanthogorgia spissa (n=4), Desmophyllum dianthus (n=7), and Lophelia pertusa [Desmophyllum pertusum] (n=12), plus a kit extraction control blank. Sequencing targeted the V3-V4 variable region (primers 341F/806R) and was performed on an Illumina MiSeq with version 2 chemistry. The raw data files have also been submitted to the NCBI Sequence Read Archive under BioProject number PRJNA699458. For more information, contact Christina Kellogg at the USGS St. Petersburg Coastal and Marine Science Center, 600 4th Street South, St. Petersburg, Florida, USA, 33710; Telephone: (727) 502-8128; Email: ckellogg@usgs.gov.

The file labeled "PJNA699458_16S-V3V4_raw_data_1" contains 56 raw 16S rRNA gene sequence files from the deep-sea coral samples, 2 KITBLANK extraction control sequence files, and an md5.txt file (used to check the downloaded files for errors). The data consist of two compressed FASTQ sequence files (paired-end forward and reverse reads) per sample.
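
The md5.txt check mentioned above can be sketched in Python as follows. This is a minimal illustration, not a documented procedure for this data release. It assumes that md5.txt follows the common md5sum layout (one line per file, a hex checksum followed by whitespace and the file name), which is not stated in this description. The function names and the directory argument are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_listing(text):
    """Parse md5sum-style lines ('<checksum>  <filename>') into a dict.

    ASSUMPTION: the layout of md5.txt is not documented in this description;
    adjust this parser if the actual file differs.
    """
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        checksum, name = parts
        # md5sum marks binary mode with a leading '*' on the file name
        expected[name.strip().lstrip("*")] = checksum.lower()
    return expected


def verify_directory(directory, listing_name="md5.txt"):
    """Map each listed file name to True (checksum matches) or False."""
    directory = Path(directory)
    expected = parse_md5_listing((directory / listing_name).read_text())
    results = {}
    for name, checksum in expected.items():
        target = directory / name
        results[name] = target.is_file() and md5_of_file(target) == checksum
    return results
```

Calling verify_directory on the folder holding the unzipped FASTQ files and md5.txt returns a dictionary in which any False value marks a file that is missing or whose checksum does not match, and which should be downloaded again.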
The file labeled "PRJNA699458_MIMARKS.zip" contains three files, "PRJNA699458_MIMARKS.xlsx", "PRJNA699458_MIMARKS.txt", and "PRJNA699458_MIMARKS.csv", which hold MIMARKS (minimum information about a marker sequence) compliant metadata, based on standards developed by the Genomic Standards Consortium for reporting marker gene sequences (Yilmaz and others, 2011, Nature Biotechnology 29:415-420, doi:10.1038/nbt.1823). An entry of "NA" is defined as "not applicable" and an entry of "ND" is defined as "not determined". The column headers in these metadata files are defined as follows:

sample_name: a unique identifier for each sample

sample_title: title of the sample

bioproject_accession: accession number of the BioProject to which the sample belongs; this is a unique accession number under which the raw data files have been submitted to the NCBI Sequence Read Archive

organism: name of the sampled organism or type of sample

collection_date: date of sampling; format is YYYY-MM-DD

env_broad_scale: major environment type(s) where sample was collected

env_local_scale: environmental entities having causal influences upon the entity at time of sampling

env_medium: material displaced by the entity at time of sampling

geo_loc_name: geographical origin of the sample

host: natural (as opposed to laboratory) host to the organism from which the sample was obtained

lat_lon: geographical coordinates of the location where the sample was collected in decimal degrees

site: given name of the specific area samples were collected from

depth: water depth at which the samples were collected; units are meters

host_subject_id: a unique identifier by which each subject can be referred to, de-identified

host_taxid: NCBI taxonomy ID of the host, e.g. 9606

samp_collect_device: method or device employed for collecting sample

samp_mat_process: processing applied to the sample during or after isolation

samp_salinity: salinity measurement at time of sampling; units are practical salinity units

temp: temperature of the sample at time of sampling

samp_store_temp: temperature at which samples were stored prior to extraction

contact_name: name of the primary researcher

contact_email: email of the primary researcher
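
The column definitions above can be applied when loading the CSV version of the metadata. The following Python sketch is one possible approach and rests on several assumptions: that the first row of the CSV holds the column headers listed above, and that lat_lon is written as "degrees N/S degrees E/W" (for example "31.5 N 79.25 W"), a layout not specified in this description. The function names are hypothetical. Only the "NA" and "ND" placeholders and the YYYY-MM-DD date format come from the definitions above.

```python
import csv
import io
from datetime import date

# Placeholders defined for these metadata files
MISSING = {"NA": "not applicable", "ND": "not determined"}


def clean_value(value):
    """Return None for NA, ND, or empty entries; otherwise the stripped text."""
    value = value.strip()
    if value == "" or value in MISSING:
        return None
    return value


def parse_lat_lon(value):
    """Convert text such as '31.5 N 79.25 W' to signed decimal degrees.

    ASSUMPTION: this layout is a guess and is not specified in the
    column definitions; adjust if the files use another convention.
    """
    lat, north_south, lon, east_west = value.split()
    latitude = float(lat) * (-1 if north_south.upper() == "S" else 1)
    longitude = float(lon) * (-1 if east_west.upper() == "W" else 1)
    return latitude, longitude


def read_mimarks(handle):
    """Read MIMARKS rows from an open CSV handle into a list of dicts."""
    records = []
    for row in csv.DictReader(handle):
        record = {
            key: clean_value(val or "")
            for key, val in row.items()
            if key is not None  # ignore cells beyond the header width
        }
        if record.get("collection_date"):
            # collection_date is documented as YYYY-MM-DD
            record["collection_date"] = date.fromisoformat(record["collection_date"])
        if record.get("lat_lon"):
            record["lat_lon"] = parse_lat_lon(record["lat_lon"])
        records.append(record)
    return records
```

In use, the file would be opened with open("PRJNA699458_MIMARKS.csv", newline="") and the handle passed to read_mimarks. Columns such as depth, samp_salinity, and temp are left as text here because their exact formatting in the files is not described.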