NucDiff: in-depth characterization and annotation of differences between two sets of DNA sequences

Abstract Background Comparing sets of sequences is a situation frequently encountered in bioinformatics, examples being comparing an assembly to a reference genome, or two genomes to each other. The purpose of the comparison is usually to find where the two sets differ, e.g. to find where a subseque...

Bibliographic Details
Main Authors: Ksenia Khelik, Karin Lagesen, Geir Kjetil Sandve, Torbjørn Rognes, Alexander Johan Nederbragt
Format: Article
Language: English
Published: BMC 2017-07-01
Series: BMC Bioinformatics
Subjects: Whole-genome alignment; Comparative analysis; Whole-genome assembly; Annotation of differences
Online Access: http://link.springer.com/article/10.1186/s12859-017-1748-z
id doaj-580559de4393483a97466cd687e6581f
record_format Article
spelling doaj-580559de4393483a97466cd687e6581f
2020-11-25T00:44:00Z
eng
BMC
BMC Bioinformatics, ISSN 1471-2105
2017-07-01, vol. 18, no. 1, pp. 1-14
doi:10.1186/s12859-017-1748-z
NucDiff: in-depth characterization and annotation of differences between two sets of DNA sequences
Ksenia Khelik, Karin Lagesen, Geir Kjetil Sandve, Torbjørn Rognes, Alexander Johan Nederbragt (all: Biomedical Informatics Research Group, Department of Informatics, University of Oslo)
http://link.springer.com/article/10.1186/s12859-017-1748-z
Whole-genome alignment; Comparative analysis; Whole-genome assembly; Annotation of differences
collection DOAJ
language English
format Article
sources DOAJ
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2017-07-01
description Abstract
Background: Comparing sets of sequences is a situation frequently encountered in bioinformatics, examples being comparing an assembly to a reference genome, or two genomes to each other. The purpose of the comparison is usually to find where the two sets differ, e.g. to find where a subsequence is repeated or deleted, or where insertions have been introduced. Such comparisons can be done using whole-genome alignments. Several tools for making such alignments exist, but none of them 1) provides detailed information about the types and locations of all differences between the two sets of sequences, 2) enables visualisation of alignment results at different levels of detail, and 3) carefully takes genomic repeats into consideration.
Results: We here present NucDiff, a tool aimed at locating and categorizing differences between two sets of closely related DNA sequences. NucDiff is able to deal with very fragmented genomes, repeated sequences, and various local differences and structural rearrangements. NucDiff determines differences by a rigorous analysis of alignment results obtained by the NUCmer, delta-filter and show-snps programs in the MUMmer sequence alignment package. All differences found are categorized according to a carefully defined classification scheme covering all possible differences between two sequences. Information about the differences is made available as GFF3 files, thus enabling visualisation using genome browsers as well as usage of the results as a component in an analysis pipeline. NucDiff was tested with varying parameters for the alignment step and compared with existing alternatives, called QUAST and dnadiff.
Conclusions: We have developed a whole genome alignment difference classification scheme together with the program NucDiff for finding such differences. The proposed classification scheme is comprehensive and can be used by other tools. NucDiff performs comparably to QUAST and dnadiff but gives much more detailed results that can easily be visualized. NucDiff is freely available on https://github.com/uio-cels/NucDiff under the MPL license.
topic Whole-genome alignment
Comparative analysis
Whole-genome assembly
Annotation of differences
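
The description above says that the differences are reported as GFF3 files so that they can feed into an analysis pipeline. As a rough illustration of that kind of downstream use, the sketch below tallies GFF3 records by difference type. It relies only on the generic GFF3 layout (nine tab-separated columns, with `key=value` attributes separated by `;` in the ninth). The attribute key `Name`, the function names, and the sample records are assumptions made up for this example; they are not taken from NucDiff's documentation or output.

```python
from collections import Counter


def parse_attributes(field):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def count_differences(lines, type_key="Name"):
    """Count GFF3 feature lines grouped by the value of one attribute.

    type_key is an assumption: adjust it to whichever attribute carries
    the difference type in the files being analysed.
    """
    counts = Counter()
    for line in lines:
        # Skip blank lines, directives ('##...') and comments ('#...').
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        attrs = parse_attributes(cols[8])
        counts[attrs.get(type_key, "unknown")] += 1
    return counts


# Made-up records for illustration only; not real tool output.
SAMPLE = [
    "##gff-version 3",
    "contig1\texample\tregion\t120\t125\t.\t.\t.\tID=d1;Name=insertion",
    "contig1\texample\tregion\t300\t300\t.\t.\t.\tID=d2;Name=deletion",
    "contig2\texample\tregion\t42\t48\t.\t.\t.\tID=d3;Name=insertion",
]

if __name__ == "__main__":
    for diff_type, n in count_differences(SAMPLE).most_common():
        print(diff_type, n)
```

To run it on a real file, pass an open file handle instead of `SAMPLE`, for example `count_differences(open(path))` with `path` pointing at a GFF3 file. The sketch does not decode percent-escaped characters in attribute values, which the GFF3 format permits.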
url http://link.springer.com/article/10.1186/s12859-017-1748-z