The files in this data release are the raw DNA sequence files referenced in the journal 
article by Goldsmith and others (2018) entitled "Comparison of microbiomes of cold-water corals 
Primnoa pacifica and Primnoa resedaeformis, with possible link between microbiome composition 
and host genotype". They represent a 16S rRNA gene amplicon survey of the 
corals' microbiome completed using Roche 454 pyrosequencing with titanium reagents. The raw 
data files of 454 sequences associated with this study have also been submitted to the NCBI 
Sequence Read Archive under BioProject number PRJNA348705. For more information, you 
may contact Christina Kellogg at the USGS St. Petersburg Coastal and Marine Science Center, 
600 4th Street South, St. Petersburg, Florida, USA, 33710; Telephone: (727) 502-8128; Email: 
ckellogg@usgs.gov.

The folder labeled "Primnoa_raw_data" contains the original SFF sequence files received from 
the sequencing vendor as well as two mapping files.  The samples were sequenced on two 
different pyrosequencing runs, so there are four SFF files (two for each run).  The folder also 
contains .qual and .fna files corresponding to each SFF file. To perform initial splitting of the 
samples from these runs (which also contain other sequences from different studies), it is 
necessary to have a mapping file that corresponds to each run. The file titled 
"Primnoa3008_map.txt" is the mapping file that goes with sequencing files ID2UZ7K01.sff and 
ID2UZ7K02.sff from the first run. The file titled "Primnoaprim1_map.txt" is the mapping file 
that goes with sequencing files H8MF54001.sff and H8MF54002.sff from the second run. The 
column headers in the mapping files are as follows:

SampleID:  a unique identifier for each sample

BarcodeSequence:  multiplex identifier; arbitrary short DNA sequence used to identify the 
sequences from this particular sample out of the pool of samples included on a multiplexed run

LinkerPrimerSequence:  sequence of the forward primer used in polymerase chain
reaction to amplify the DNA extracted from the sample

ReversePrimer: sequence of the reverse primer used in polymerase chain reaction to amplify the 
DNA extracted from the sample

Plate:  identifier for sequencing plate on which DNA extracted from the sample was sequenced

Name:  complete name of sample

Species:  coral species sampled

OceanBasin:  ocean basin in which the sample was collected

Location:  geographic location name for where the sample was collected

Temperature:  water temperature at which coral sample was collected; units are degrees Celsius

Depth: water depth at which coral sample was collected; units are meters

Salinity:  salinity at which coral sample was collected; units are psu (practical salinity units)

Latitude: latitude at which the sample was collected; units are decimal degrees

Longitude: longitude at which the sample was collected; units are decimal degrees

Month:  month in which the sample was collected

Year:  year in which the sample was collected

Description:  may contain notes regarding sample collection or processing
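
The role of the mapping file in splitting a multiplexed run can be illustrated with a short Python sketch. It is not the workflow recorded in "Primnoa_workflow.txt"; QIIME's own demultiplexing does considerably more (for example, quality filtering and primer handling). The sketch assumes the mapping files are tab-delimited with a single header line whose first field may begin with "#", and that a read belongs to a sample when it starts with an exact copy of that sample's BarcodeSequence. The function names and the example barcodes, primer, and read are hypothetical and are not taken from the data release.

```python
import csv
import io


def read_mapping(handle):
    """Read a tab-delimited mapping file into a list of dicts keyed by column header."""
    rows = []
    header = None
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or not fields[0].strip():
            continue  # skip blank lines
        if header is None:
            # Assumption: the first non-blank line is the header, possibly "#SampleID".
            header = [f.strip() for f in fields]
            header[0] = header[0].lstrip("#")
            continue
        if fields[0].startswith("#"):
            continue  # skip comment lines after the header
        rows.append(dict(zip(header, (f.strip() for f in fields))))
    return rows


def assign_sample(read_seq, mapping_rows):
    """Return the SampleID whose barcode starts the read, or None if no barcode matches."""
    read_seq = read_seq.upper()
    for row in mapping_rows:
        if read_seq.startswith(row["BarcodeSequence"].upper()):
            return row["SampleID"]
    return None


# Hypothetical two-sample mapping file and read, for illustration only.
example = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDescription\n"
    "S1\tACGTACGT\tGGACTAC\tdemo\n"
    "S2\tTTGACCAA\tGGACTAC\tdemo\n"
)
rows = read_mapping(io.StringIO(example))
print(assign_sample("ACGTACGTGGACTACTTT", rows))  # prints S1
```

Reads from other studies on the same run carry barcodes that are absent from the mapping file, so under this scheme they return None and are left unassigned.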

In order to confirm the presence of members of the Chlamydiales order in the bacterial 
communities of Alaskan Primnoa corals (Primnoa pacifica), bacterial DNA extracted from those 
corals was amplified using primers targeting the 23S rRNA gene in Chlamydiales (Everett and others, 
1999, Journal of Clinical Microbiology 37(3):575-580). Amplicons were cloned and screened, 
then sequenced using Sanger sequencing.  Two variant sequences were obtained (607 bp each) 
and are contained in the file titled "Chlamydiales_23S.txt". The 23S sequences have been 
submitted to NCBI (GenBank) under accession numbers KY010287 and KY010288.

The text document titled "Primnoa_workflow.txt" details the scripts run in the bioinformatic 
package QIIME version 1.9.1 (Caporaso and others, 2010, Nature Methods 7:335-336, 
doi:10.1038/nmeth.f.303), default or chosen settings used for each script, and the names of the 
input/output files associated with each script.

The file labeled "Primnoa_metadata.txt" is MIMARKS (minimum information about a marker 
gene) compliant metadata, based on standards developed by the Genomic Standards Consortium 
for reporting marker gene sequences (Yilmaz and others, 2011, Nature Biotechnology 29:415-420, 
doi:10.1038/nbt.1823). The column headers are defined as follows:

project_name: a description of the project's contents

sample_name: a unique identifier for each sample

bioproject_id: a unique accession number under which the raw data files have been 
submitted to the NCBI Sequence Read Archive (SRA)

sample_title: a more detailed version of sample_name

organism: what is being sequenced; in this case, a coral metagenome. A metagenome is
the collection of genetic material (genomes) from a mixed community of organisms.

host: organism with which the microbial community being sequenced is associated

collection_date: the date upon which the samples were collected; format is year

geo_loc_name: geographic location name for where the samples were collected

lat_lon: the latitude and longitude where the samples were collected; units are decimal
degrees

samp_collect_device: sample collection device that was used to obtain these samples

samp_mat_process: sample material process; specifically, how the samples were preserved
in the field for later study

source_material_id: identification of the source material that the sequenced samples
were derived from (more specific than host; for example, what part of the host)

samp_size: sample size; what amount of sample was used to extract DNA for sequencing

nucl_acid_ext: nucleic acid extraction; DOI is provided for the paper which describes
the specific extraction method employed for these samples

nucl_acid_amp: nucleic acid amplification; DOI is provided for the paper which describes
the specific primers and amplification methods employed for these samples

lib_size: library size; how many sequences were present in each sample at the beginning
of analysis

target_gene: which marker gene was targeted by the chosen primers; 16S ribosomal RNA gene

target_subfragment: what portion of the marker gene was targeted by the chosen primers;
variable regions V4-V5 of the 16S rRNA gene

pcr_primers: sequences of the forward and reverse primers used in polymerase chain
reaction to amplify the DNA

mid: multiplex identifiers; arbitrary short DNA sequence used to identify the sequences
from this particular sample out of the pool of samples included on a multiplexed run

seq_meth: sequence method; what sequencing technology was used to produce these data

depth: water depth at which coral samples were collected; units are meters

temp: water temperature at which coral samples were collected; units are degrees Celsius

samp_store_temp: sample storage temperature between field collection and laboratory 
processing; -20 degrees Celsius

salinity: salinity at which coral samples were collected; psu = practical salinity units

contact_name: contact person for data

contact_email: email of contact person for data
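
As a convenience, the column headers defined above can be checked against a copy of the metadata file with a few lines of Python. This is a minimal sketch, assuming "Primnoa_metadata.txt" is tab-delimited with the column headers on its first line; the function name and the shortened example header are hypothetical.

```python
import csv
import io

# Column headers as defined in this document.
EXPECTED_COLUMNS = [
    "project_name", "sample_name", "bioproject_id", "sample_title",
    "organism", "host", "collection_date", "geo_loc_name", "lat_lon",
    "samp_collect_device", "samp_mat_process", "source_material_id",
    "samp_size", "nucl_acid_ext", "nucl_acid_amp", "lib_size",
    "target_gene", "target_subfragment", "pcr_primers", "mid",
    "seq_meth", "depth", "temp", "samp_store_temp", "salinity",
    "contact_name", "contact_email",
]


def missing_columns(handle, expected=EXPECTED_COLUMNS):
    """Return the expected column headers that are absent from the file's first line."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.strip() for name in next(reader)]
    return [name for name in expected if name not in header]


# Hypothetical shortened header, for illustration only.
demo = io.StringIO("project_name\tsample_name\tdepth\n")
absent = missing_columns(demo)
print(len(absent))  # prints 24
```

To check the real file, open it with open("Primnoa_metadata.txt", newline="") and pass the handle to missing_columns; an empty list means every header defined above is present.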