De Novo Genome and Transcriptome Assembly of the Canadian Beaver (Castor canadensis)

Bibliographic Details
Main Authors: Si Lok, Tara A. Paton, Zhuozhi Wang, Gaganjot Kaur, Susan Walker, Ryan K. C. Yuen, Wilson W. L. Sung, Joseph Whitney, Janet A. Buchanan, Brett Trost, Naina Singh, Beverly Apresto, Nan Chen, Matthew Coole, Travis J. Dawson, Karen Ho, Zhizhou Hu, Sanjeev Pullenayegum, Kozue Samler, Arun Shipstone, Fiona Tsoi, Ting Wang, Sergio L. Pereira, Pirooz Rostami, Carol Ann Ryan, Amy Hin Yan Tong, Karen Ng, Yogi Sundaravadanam, Jared T. Simpson, Burton K. Lim, Mark D. Engstrom, Christopher J. Dutton, Kevin C. R. Kerr, Maria Franke, William Rapley, Richard F. Wintle, Stephen W. Scherer
Format: Article
Language: English
Published: Oxford University Press 2017-02-01
Series: G3: Genes, Genomes, Genetics
Subjects:
Online Access: http://g3journal.org/lookup/doi/10.1534/g3.116.038208
Description
Summary: The Canadian beaver (Castor canadensis) is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome, the first for a large rodent and the first mammalian genome assembled directly from uncorrected, moderate-coverage (< 30×) long reads generated by single-molecule sequencing. The genome size, estimated by k-mer analysis, is 2.7 Gb. We assembled the beaver genome using the new Canu assembler, which is optimized for noisy reads. The resulting assembly was refined using Pilon supported by short reads (80×) and checked for accuracy by congruency against an independent short-read assembly. We scaffolded the assembly using the exon–gene models derived from 9805 full-length open reading frames (FL-ORFs) constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with a contig N50 of 278,680 bp and a scaffold N50 of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, with a combined scaffold length representing 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly were demonstrated by the precise exon placement for 91.1% of the 9805 assembled FL-ORFs and 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set used to assess the quality of genome assemblies. Genes involved in dentition and enamel deposition, defining characteristics of rodents with which the beaver is well endowed, were well represented. The study provides insights for genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology.
ISSN: 2160-1836
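
The summary reports contig and scaffold N50 values. For readers unfamiliar with the statistic, the following is a minimal sketch of how N50 is commonly computed: the length L such that sequences of length L or longer together cover at least half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions, not taken from the article, and the sketch is not the authors' pipeline.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Hypothetical helper for illustration only; it follows the common
    definition of N50 and is not code from the article.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembly length is the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy lengths (not data from the article): total is 20, half is 10.
toy_lengths = [2, 3, 4, 5, 6]
print(n50(toy_lengths))
```

Applied to a real assembly, the input would be the list of contig (or scaffold) lengths read from the FASTA file, which is how figures such as the 278,680 bp contig N50 are typically derived.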