Improving eukaryotic genome annotation using single molecule mRNA sequencing

Abstract. Background: The advantages of Pacific Biosciences (PacBio) single-molecule real-time (SMRT) technology include long reads, low systematic bias, and high consensus read accuracy. Here we use these attributes to improve on the genome annotation of the parasitic hookworm Ancylostoma ceylanicum...


Bibliographic Details
Main Authors: Vincent Magrini, Xin Gao, Bruce A. Rosa, Sean McGrath, Xu Zhang, Kymberlie Hallsworth-Pepin, John Martin, John Hawdon, Richard K. Wilson, Makedonka Mitreva
Format: Article
Language: English
Published: BMC 2018-03-01
Series: BMC Genomics
Subjects:
Online Access: http://link.springer.com/article/10.1186/s12864-018-4555-7
id doaj-f4dff008f2f0405383530c7aad84b267
record_format Article
spelling doaj-f4dff008f2f0405383530c7aad84b267
2020-11-24T21:16:08Z
eng
BMC
BMC Genomics
1471-2164
2018-03-01
19 1 1 14
10.1186/s12864-018-4555-7
Improving eukaryotic genome annotation using single molecule mRNA sequencing
Vincent Magrini (McDonnell Genome Institute, Washington University School of Medicine)
Xin Gao (McDonnell Genome Institute, Washington University School of Medicine)
Bruce A. Rosa (McDonnell Genome Institute, Washington University School of Medicine)
Sean McGrath (McDonnell Genome Institute, Washington University School of Medicine)
Xu Zhang (McDonnell Genome Institute, Washington University School of Medicine)
Kymberlie Hallsworth-Pepin (McDonnell Genome Institute, Washington University School of Medicine)
John Martin (McDonnell Genome Institute, Washington University School of Medicine)
John Hawdon (Department of Microbiology, Immunology and Tropical Medicine, The George Washington University)
Richard K. Wilson (McDonnell Genome Institute, Washington University School of Medicine)
Makedonka Mitreva (McDonnell Genome Institute, Washington University School of Medicine)
http://link.springer.com/article/10.1186/s12864-018-4555-7
Genome annotation improvement
Pacific bioscience mRNA sequencing
Ancylostoma ceylanicum
Hookworm
Gene loci
collection DOAJ
language English
format Article
sources DOAJ
author Vincent Magrini
Xin Gao
Bruce A. Rosa
Sean McGrath
Xu Zhang
Kymberlie Hallsworth-Pepin
John Martin
John Hawdon
Richard K. Wilson
Makedonka Mitreva
title Improving eukaryotic genome annotation using single molecule mRNA sequencing
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2018-03-01
description Abstract. Background: The advantages of Pacific Biosciences (PacBio) single-molecule real-time (SMRT) technology include long reads, low systematic bias, and high consensus read accuracy. Here we use these attributes to improve on the genome annotation of the parasitic hookworm Ancylostoma ceylanicum using PacBio RNA-Seq. Results: We sequenced 192,888 circular consensus sequences (CCS) derived from cDNAs generated using the Clontech SMARTer system. These SMARTer-SMRT libraries were normalized and size-selected, providing a robust population of expressed structural genes for subsequent genome annotation. We demonstrate that genome annotation based on PacBio mRNA sequences improves on annotation using conventional sequencing-by-synthesis alone, identifying 1609 (9.2%) new genes, extending the length of 3965 (26.7%) genes, and increasing the total genomic exon length by 1.9 Mb (12.4%). Non-coding sequence representation (primarily from UTRs, based on dT reverse transcription priming) was particularly improved, increasing fifteen-fold in total length through increases in both the length and number of UTR exons. In addition, the UTR data provided by these CCS allowed for the identification of a novel SL2 splice leader sequence for A. ceylanicum and an increase in the number and proportion of functionally annotated genes. RNA-seq data also confirmed some of the newly annotated genes and gene features. Conclusion: Overall, PacBio data has supported a significant improvement in gene annotation in this genome, and is an appealing alternative or complement to other transcript sequencing technologies for genome annotation.
topic Genome annotation improvement
Pacific bioscience mRNA sequencing
Ancylostoma ceylanicum
Hookworm
Gene loci
url http://link.springer.com/article/10.1186/s12864-018-4555-7