On the use of algebraic topology concepts to check the consistency of genome assembly

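This paper presents preliminary work consisting of two contributions. The first is the design of a very efficient algorithm based on an “Overlap-Layout-Consensus” (OLC) graph to assemble the long reads produced by third-generation sequencing technologies. The second is the analysis of this graph using algebraic topology concepts to determine, in advance, whether the assembly of the genome will be straightforward, i.e., whether it will lead to a pseudo-Hamiltonian path or cycle, or whether the results will need to be scrutinized. In the latter case, it will be necessary to look for “loops” in the OLC assembly graph caused by unresolved repeated genomic regions, and then try to untie the “knots” created by these regions.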
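This record does not describe how the topological analysis is carried out. The following is therefore only a hedged illustration of the simplest homology invariants a graph has. The sketch treats the assembly graph as an undirected 1-dimensional cell complex, which is an assumption: read orientation and overlap direction are ignored. Under that assumption, the Betti number b0 counts connected components and b1 = |E| - |V| + b0 counts independent loops. A single linear chromosome assembled into one path would then give (b0, b1) = (1, 0), and a single circular chromosome would give (1, 1). A larger b1 flags extra loops, such as those created by a collapsed repeat. The functions `betti_numbers` and `fundamental_cycles` and the toy graph are hypothetical and are not the paper's algorithm, which may well build higher-dimensional complexes.

```python
from collections import defaultdict, deque


def betti_numbers(nodes, edges):
    """Return (b0, b1) of an undirected graph seen as a 1-dimensional cell complex.

    b0 is the number of connected components; b1 = |E| - |V| + b0 is the
    number of independent loops. Parallel edges and self-loops each count.
    """
    parent = {v: v for v in nodes}  # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    b0 = len({find(v) for v in nodes})
    return b0, len(edges) - len(nodes) + b0


def fundamental_cycles(nodes, edges):
    """Return one cycle (list of nodes) per edge outside a BFS spanning forest.

    There are exactly b1 such cycles, and together they form a cycle basis.
    """
    adj = defaultdict(list)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))

    # Build a BFS spanning forest, remembering each node's parent and depth.
    parent, depth, tree_edges = {}, {}, set()
    for root in nodes:
        if root in depth:
            continue
        depth[root], parent[root] = 0, None
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w, i in adj[u]:
                if w not in depth:
                    depth[w], parent[w] = depth[u] + 1, u
                    tree_edges.add(i)
                    queue.append(w)

    cycles = []
    for i, (u, v) in enumerate(edges):
        if i in tree_edges:
            continue
        # Climb from both endpoints to their lowest common ancestor.
        left, right = [u], [v]
        a, b = u, v
        while a != b:
            if depth[a] >= depth[b]:
                a = parent[a]
                left.append(a)
            else:
                b = parent[b]
                right.append(b)
        cycles.append(left + right[::-1][1:])
    return cycles


if __name__ == "__main__":
    # Hypothetical toy graph: circular genome A-R-B-R-C with repeat R collapsed.
    nodes = ["A", "R", "B", "C"]
    edges = [("A", "R"), ("R", "B"), ("B", "R"), ("R", "C"), ("C", "A")]
    print(betti_numbers(nodes, edges))       # (1, 2)
    print(fundamental_cycles(nodes, edges))  # [['B', 'R'], ['R', 'A', 'C']]
```

In the toy graph, both copies of the repeat R are merged into one node. As a result, b1 = 2 instead of the 1 expected for a single circular chromosome. The fundamental cycles separate the short loop through B from the genome-level cycle, which is the kind of "knot" the abstract refers to. For simple undirected graphs, NetworkX offers `number_connected_components` and `cycle_basis` for the same quantities.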


Bibliographic Details
Main Author: Jean-François Gibrat
Format: Article
Language: English
Published: The Biophysical Society of Japan, 2019-11-01
Series: Biophysics and Physicobiology
Subjects: NGS technologies; OLC assembly graphs; genomic repetitions; homology groups; Betti numbers
Online Access: https://doi.org/10.2142/biophysico.16.0_444
Affiliation: MaIAGE, INRA, Université Paris-Saclay, Jouy-en-Josas 78350, France
ISSN: 2189-4779
Volume: 16