Whole-genome assembly of the coral reef Pearlscale Pygmy Angelfish (Centropyge vrolikii)

Bibliographic Details
Main Authors: Iria Fernandez-Silva, James B. Henderson, Luiz A. Rocha, W. Brian Simison
Format: Article
Language: English
Published: Nature Publishing Group 2018-01-01
Series: Scientific Reports
Online Access:https://doi.org/10.1038/s41598-018-19430-x
Record ID: doaj-f2ebfb266b534c02926db4712f6b97bb
Affiliation (all authors): Institute for Biodiversity Science and Sustainability, California Academy of Sciences
Collection: DOAJ
ISSN: 2045-2322

Abstract

The diversity of DNA sequencing methods and algorithms for genome assembly presents scientists with a bewildering array of choices. Here, we construct and compare eight candidate assemblies that combine overlapping shotgun read data, mate-pair libraries and Chicago libraries with four different genome assemblers to produce a high-quality draft genome of the iconic coral reef Pearlscale Pygmy Angelfish, Centropyge vrolikii (family Pomacanthidae). The best candidate assembly combined all four data types and had a scaffold N50 127.5 times higher than that of the candidate assembly obtained from shotgun data only. This best assembly had a scaffold N50 of 8.97 Mb and a contig N50 of 189,827 bp, and was 97.4% complete for BUSCO v2 (Actinopterygii set) and 95.6% complete for CEGMA matches. These contiguity and accuracy scores are higher than those of any other fish assembly released to date that did not use linkage map information, including assemblies based on more expensive long-read sequencing data. Our analysis of how different data types improve assembly quality will help others choose the most appropriate de novo genome sequencing strategy for their resources and target applications. Furthermore, the draft genome of the Pearlscale Pygmy Angelfish will play an important role in future studies of coral reef fish evolution, diversity and conservation.
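
The contiguity figures in the abstract are N50 values: the largest length L such that contigs (or scaffolds) of length L or longer together cover at least half of the total assembly length. The minimal Python sketch below illustrates that standard definition. The function name `n50` and the toy lengths are hypothetical and made up for illustration; they are not the authors' code or data, and the sketch does not reproduce the BUSCO or CEGMA completeness scores, which are separate measures.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the largest length L such that sequences of length >= L
    together account for at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        # "At least half" tested with integers to avoid rounding issues.
        if 2 * running >= total:
            return length
    # Not reached for non-negative lengths: the full sum always covers half.
    return 0


if __name__ == "__main__":
    # Hypothetical toy lengths, NOT values from the C. vrolikii assembly.
    contigs = [100, 80, 60, 40, 20]
    scaffolds = [180, 100, 20]  # same 300 bp joined into fewer, longer pieces
    print(n50(contigs))                    # 80
    print(n50(scaffolds))                  # 180
    print(n50(scaffolds) / n50(contigs))   # 2.25-fold improvement
```

The last line shows how a fold change such as the reported 127.5-times increase in scaffold N50 is simply a ratio of two N50 values: joining the same bases into fewer, longer scaffolds raises N50 without changing the total assembly length.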