Raw data and assembled scaffolds for the Atlantic silverside genome
Sampling and analytical procedures
The genomes of two different individuals were sequenced with different approaches:
1. An individual sampled at Jekyll Island, Georgia:
We built a reference genome for the Atlantic silverside in three steps. First, we created a draft assembly using 10x Genomics linked-reads technology (10x Genomics, Pleasanton, CA). Second, we used proximity-ligation data from Dovetail Genomics (Santa Cruz, CA), namely Chicago® (Putnam et al. 2016) and Dovetail Hi-C (Lieberman-Aiden et al. 2009) libraries, to increase contiguity, break up mis-joins, and orient and join scaffolds into chromosomes. Finally, we used short-insert reads to close gaps in the scaffolded and error-corrected assembly. The data were generated from muscle tissue dissected from two lab-reared F1 offspring of Atlantic silversides collected from the wild on Jekyll Island, GA, USA (N 31.02°, W 81.43°; the southern end of the species distribution range) in May 2017.
For 10x Genomics library preparation, we extracted DNA from fresh tissue from one individual using the MagAttract HMW DNA Kit (Qiagen). Prior to library preparation, we selected fragments longer than 30 kb using a BluePippin device (Sage Science). A 10x Genomics library was prepared following the standard procedure and sequenced on two lanes of a HiSeq2500 (rapid run mode) using paired-end 150 bp reads at the Biotechnology Resource Center Genomics Facility at Cornell University. To assemble the linked reads, we ran Supernova 2.1.1 (Weisenfeld et al. 2017) from 10x Genomics with varying numbers of reads and compared assembly statistics to identify the number of reads that resulted in the most contiguous assembly.
Tissue from the second individual was flash frozen in liquid nitrogen and shipped to Dovetail Genomics, where Chicago and Hi-C libraries were prepared for further scaffolding. These long-range libraries were sequenced on one lane of an Illumina HiSeqX using paired-end 150 bp reads. We ran two rounds of scaffolding with HiRise™, a software pipeline developed specifically for genome scaffolding with Chicago and Hi-C data, to scaffold and error-correct the best 10x Genomics draft assembly using the Dovetail long-range data. Finally, the barcode-trimmed 10x Genomics reads were used to close gaps between contigs as the final step of the HiRise pipeline.
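The comparison of draft assemblies built from different numbers of reads can be illustrated with a short sketch. The description does not say which assembly statistics were compared, so this example assumes scaffold N50 as the contiguity measure; the function names, the labels, and the scaffold lengths are all hypothetical and serve only to show the logic.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def most_contiguous(assemblies):
    """Return the label of the assembly with the highest scaffold N50.

    `assemblies` maps a label (e.g. a read-count setting) to a list of
    scaffold lengths in bp.
    """
    return max(assemblies, key=lambda label: n50(assemblies[label]))


# Hypothetical toy scaffold lengths for three read-subsampling levels
# (not real data from this dataset).
example = {
    "reads_300M": [50, 40, 30, 20, 10],
    "reads_400M": [90, 30, 20, 10],
    "reads_500M": [60, 50, 20, 10, 10],
}

best = most_contiguous(example)
print(best, n50(example[best]))
```

In practice the scaffold lengths would be read from each assembly's FASTA output, and other statistics (contig N50, total assembly size, number of scaffolds) would likely be examined alongside N50.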
2. An individual sampled in Mumford Cove, Connecticut:
This lower-quality draft assembly was used to identify structural variants by comparison with the chromosome-level assembly from the individual sampled in Georgia.
The individual used for this assembly was collected in Mumford Cove, Connecticut (N 41.32°, W 72.02°) in June 2016. Genomic DNA was extracted from muscle tissue using the DNeasy Blood and Tissue kit (Qiagen) and normalized to 40 ng/µL. We prepared a genomic DNA library using the TruSeq DNA PCR-free library kit (Illumina) following the manufacturer’s protocol for 550 bp insert libraries. The shotgun library was sequenced using paired-end 150 bp reads on an Illumina HiSeq4000. Mate-pair libraries with insert sizes of 3, 5.3, and 8.2 kb were prepared at the Huntsman Cancer Institute at the University of Utah using the Nextera Mate Pair Library Prep Kit (Illumina) and sequenced using paired-end 125 bp reads on an Illumina HiSeq2500. We used Trimmomatic 0.36 (Bolger et al. 2014) to remove adapter contamination and low-quality data from both the shotgun and the mate-pair libraries. We then used the filtered reads to build a draft assembly and fill assembly gaps with Platanus v.1.2.4 (Kajitani et al. 2014), using the commands assemble, scaffold, and gap_close. Finally, we removed scaffolds shorter than 1 kb.
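The final length filter can be sketched as follows. This is a minimal illustration, not the script used for this dataset: it assumes the scaffolds are in standard FASTA format, and the function names, the `min_length` parameter, and the demo sequences are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_scaffolds(handle, min_length=1000):
    """Keep scaffolds of at least `min_length` bp (drops those < 1 kb by default)."""
    return [(h, s) for h, s in read_fasta(handle) if len(s) >= min_length]


# Hypothetical three-scaffold FASTA; scaffold3 is split across two lines.
demo = io.StringIO(
    ">scaffold1\n" + "A" * 1200 + "\n"
    ">scaffold2\n" + "C" * 999 + "\n"
    ">scaffold3\n" + "G" * 600 + "\n" + "T" * 400 + "\n"
)

kept = filter_scaffolds(demo)
for header, seq in kept:
    print(header, len(seq))
```

For a real assembly, `demo` would be replaced by an opened scaffold file, and the retained records would be written back out in FASTA format.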
Therkildsen, N. O. (2024) Atlantic silverside genome. Biological and Chemical Oceanography Data Management Office (BCO-DMO). (Version 1) Version Date 2024-01-02 [if applicable, indicate subset used]. http://lod.bco-dmo.org/id/dataset/917790 [access date]
Terms of Use
This dataset is licensed under Creative Commons Attribution 4.0.
If you wish to use this dataset, it is highly recommended that you contact the original principal investigators (PI). Should the relevant PI be unavailable, please contact BCO-DMO (info@bco-dmo.org) for additional guidance. For general guidance please see the BCO-DMO Terms of Use document.