ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers

Abstract Background The long-range sequencing information captured by linked reads, such as those available from 10× Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding...

Bibliographic Details
Main Authors: Lauren Coombe, Jessica Zhang, Benjamin P. Vandervalk, Justin Chu, Shaun D. Jackman, Inanc Birol, René L. Warren
Format: Article
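Language: English
Published: BMC 2018-06-01
Series: BMC Bioinformatics
Subjects: 10× Genomics Chromium; ARKS; ARCS; Next-generation sequencing; de novo assembly; Genome scaffolding
Online Access: http://link.springer.com/article/10.1186/s12859-018-2243-x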
id doaj-197885370f9b4e5095cac23ac5496b4f
record_format Article
spelling doaj-197885370f9b4e5095cac23ac5496b4f; 2020-11-24T23:57:13Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2018-06-01; 19; 1; 1; 10; 10.1186/s12859-018-2243-x; ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers; Lauren Coombe (BC Cancer Genome Sciences Centre); Jessica Zhang (BC Cancer Genome Sciences Centre); Benjamin P. Vandervalk (BC Cancer Genome Sciences Centre); Justin Chu (BC Cancer Genome Sciences Centre); Shaun D. Jackman (BC Cancer Genome Sciences Centre); Inanc Birol (BC Cancer Genome Sciences Centre); René L. Warren (BC Cancer Genome Sciences Centre); http://link.springer.com/article/10.1186/s12859-018-2243-x; 10× Genomics Chromium; ARKS; ARCS; Next-generation sequencing; de novo assembly; Genome scaffolding
collection DOAJ
language English
format Article
sources DOAJ
author Lauren Coombe
Jessica Zhang
Benjamin P. Vandervalk
Justin Chu
Shaun D. Jackman
Inanc Birol
René L. Warren
author_sort Lauren Coombe
title ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers
title_sort arks: chromosome-scale scaffolding of human genome drafts with linked read kmers
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2018-06-01
description Abstract
Background: The long-range sequencing information captured by linked reads, such as those available from 10× Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other read alignment-dependent linked read scaffolders, including our own (ARCS), and uses a kmer-based mapping approach. The kmer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, as it precludes the need for input sequence formatting and draft sequence assembly indexing. The reliance on kmers instead of read alignments for pairing sequences relaxes the workflow requirements, and drastically reduces the run time.
Results: Here, we show how linked reads, when used in conjunction with Hi-C data for scaffolding, improve a draft human genome assembly of PacBio long-read data five-fold (baseline vs. ARKS NG50 = 4.6 vs. 23.1 Mbp, respectively). We also demonstrate how the method provides further improvements of a megabase-scale Supernova human genome assembly (NG50 = 14.74 Mbp vs. 25.94 Mbp before and after ARKS), which itself exclusively uses linked read data for assembly, with an execution speed six to nine times faster than competitive linked read scaffolders (~10.5 h compared to 75.7 h, on average). Following ARKS scaffolding of a human genome 10xG Supernova assembly (of cell line NA12878), fewer than 9 scaffolds cover each chromosome, except the largest (chromosome 1, n = 13).
Conclusions: ARKS uses a kmer mapping strategy instead of linked read alignments to record and associate the barcode information needed to order and orient draft assembly sequences. The simplified workflow, when compared to that of our initial implementation, ARCS, markedly improves run time performances on experimental human genome datasets. Furthermore, the novel distance estimator in ARKS utilizes barcoding information from linked reads to estimate gap sizes. It accomplishes this by modeling the relationship between known distances of a region within contigs and calculating associated Jaccard indices. ARKS has the potential to provide correct, chromosome-scale genome assemblies, promptly. We expect ARKS to have broad utility in helping refine draft genomes.
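The abstract mentions Jaccard indices computed from linked-read barcode information. As an illustration only (this is not the ARKS implementation), the sketch below computes the Jaccard index between the barcode sets of two sequence regions. The function name `jaccard_index` and the barcode labels are hypothetical, and returning 0.0 for two empty sets is a convention assumed here.

```python
def jaccard_index(barcodes_a, barcodes_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of barcodes."""
    a, b = set(barcodes_a), set(barcodes_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention when both sets are empty
    return len(a & b) / len(union)


# Hypothetical barcode sets observed in two regions of a draft assembly
region_1 = {"BX1", "BX2", "BX3", "BX4"}
region_2 = {"BX3", "BX4", "BX5"}

# 2 shared barcodes out of 5 distinct barcodes in total
print(jaccard_index(region_1, region_2))  # 0.4
```

A higher index means the two regions share a larger fraction of their barcodes; how ARKS relates such values to distances is described in the article itself, not in this sketch.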
topic 10× Genomics Chromium
ARKS
ARCS
Next-generation sequencing
de novo assembly
Genome scaffolding
url http://link.springer.com/article/10.1186/s12859-018-2243-x