Equivalent indels--ambiguous functional classes and redundancy in databases.

Bibliographic Details
Main Authors: Jens Assmus, Jürgen Kleffe, Armin O Schmitt, Gudrun A Brockmann
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2013-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3642179?pdf=render
Citation: PLoS ONE 8(5): e62803
DOI: 10.1371/journal.pone.0062803
ISSN: 1932-6203

Abstract

There is considerable interest in studying sequence variations. However, while the positions of substitutions are uniquely identifiable by sequence alignment, the locations of insertions and deletions (indels) still pose problems. Each insertion or deletion changes the sequence, yet in low-complexity or repetitive regions the same indel can sometimes be annotated in different ways. Two indels that differ in allele sequence and position can be one and the same, i.e. the alternative sequence of the whole chromosome is identical in both cases, and the two indels are therefore biologically equivalent. In such a case, the exact position of an indel cannot be identified from sequence alignment alone. Thus, variation entries in a mutation database are not necessarily uniquely defined. We prove the existence of a contiguous region around an indel in which all deletions of the same length are biologically identical. Databases often show only one of several possible locations for a given variation. Furthermore, different database entries can represent equivalent variation events. We identified 1,045,590 such problematic insertion and deletion entries out of 5,860,408 indel entries in the human Ensembl database current at the time of the study. Equivalent indels are found in regions with different functions, such as exons, introns, and 5' and 3' UTRs. One and the same variation can thus be assigned to several different functional classifications, of which only one is correct. We implemented an algorithm that determines, for each indel database entry, its complete set of equivalent indels, which is uniquely characterized by the indel itself and a given interval of the reference sequence.
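The contiguous region of equivalent deletions described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes 0-based, half-open coordinates on a plain reference string, covers deletions only, and all function names are hypothetical. The key observation is that deleting `length` bases at position `p` yields the same alternative sequence as deleting them at `p + 1` exactly when `ref[p] == ref[p + length]`, so shifting left and right until that test fails gives the full interval of equivalent start positions.

```python
"""Sketch: enumerate all deletions equivalent to a given one.

Assumptions (not from the paper): 0-based, half-open coordinates on a plain
reference string; a deletion (start, length) removes ref[start:start + length].
All function names are hypothetical.
"""


def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return the alternative sequence produced by deleting ref[start:start + length]."""
    return ref[:start] + ref[start + length:]


def equivalent_deletion_interval(ref: str, start: int, length: int) -> tuple[int, int]:
    """Return (leftmost, rightmost) start positions of deletions equivalent to (start, length).

    Shifting one base right leaves the result unchanged iff ref[p] == ref[p + length];
    shifting one base left, iff ref[p - 1] == ref[p + length - 1].
    """
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie inside the reference")
    left = start
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right


def equivalent_deletions(ref: str, start: int, length: int) -> list[tuple[int, str]]:
    """List every equivalent deletion as (start, deleted allele)."""
    left, right = equivalent_deletion_interval(ref, start, length)
    return [(p, ref[p:p + length]) for p in range(left, right + 1)]


if __name__ == "__main__":
    ref = "GATCACACAGT"
    # Deleting "CA" at position 3 inside the CA repeat.
    print(equivalent_deletions(ref, 3, 2))
    # prints [(3, 'CA'), (4, 'AC'), (5, 'CA'), (6, 'AC'), (7, 'CA')]
```

In this example, five candidate entries with different positions and different deleted alleles ("CA" vs. "AC") all produce the same alternative sequence `GATCACAGT`, which is the kind of redundancy the abstract describes. Reporting a single canonical representative, for instance the leftmost start (left-alignment), is one common way to collapse such entries; the interval itself records every position the event could be annotated at.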