A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem
Searching for the Multiple Longest Common Subsequences (MLCS) of multiple sequences is a classical NP-hard problem with many applications. One of the most effective exact approaches to the MLCS problem is based on the dominant point graph, which is a kind of directed acyclic graph (D...
Main Authors: | Zhan Peng, Yuping Wang |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2017-08-01 |
Series: | Frontiers in Genetics |
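Subjects: | multiple longest common subsequences; longest common subsequence; dominant point method; directed acyclic graph; biological sequence alignment |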
Online Access: | http://journal.frontiersin.org/article/10.3389/fgene.2017.00104/full |
id |
doaj-ec4a0ba01a974167a049589c1b2a30f9 |
---|---|
record_format |
Article |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Zhan Peng, Yuping Wang |
title |
A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem |
publisher |
Frontiers Media S.A. |
series |
Frontiers in Genetics |
issn |
1664-8021 |
publishDate |
2017-08-01 |
description |
Searching for the Multiple Longest Common Subsequences (MLCS) of multiple sequences is a classical NP-hard problem with many applications. One of the most effective exact approaches to the MLCS problem is based on the dominant point graph, a kind of directed acyclic graph (DAG). However, the time and space efficiency of the leading dominant point graph based approaches is still unsatisfactory: constructing the dominant point graph used by these approaches requires a huge amount of time and space, which hinders their application to large-scale and long sequences. To address this issue, we propose a new time- and space-efficient graph model for the MLCS problem called the Leveled-DAG. During construction, the Leveled-DAG promptly eliminates all nodes that cannot contribute to the construction of the MLCS. At any moment, only the current level and some previously generated nodes need to be kept in memory, which greatly reduces memory consumption. Moreover, the final graph contains only one node, in which all of the wanted MLCS are saved, so no additional search for the MLCS is needed. Experiments are conducted on real biological sequences of different numbers and lengths, and the proposed algorithm is compared with three state-of-the-art algorithms. The experimental results show that the Leveled-DAG approach needs less time and space than the compared algorithms, especially on large-scale and long sequences. |
topic |
multiple longest common subsequences; longest common subsequence; dominant point method; directed acyclic graph; biological sequence alignment |
url |
http://journal.frontiersin.org/article/10.3389/fgene.2017.00104/full |
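
The level-by-level idea summarised in the description can be illustrated with a small Python sketch. This is not the Leveled-DAG algorithm of the article: it performs no dominance pruning and is only practical for short inputs. It assumes a plain breadth-first expansion of match points in which each level stores the common subsequences that end at each point, so that only the current level is held in memory and the last non-empty level already contains every MLCS. The function name `mlcs_by_levels` and the data layout are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[int, ...]  # one index per input sequence


def mlcs_by_levels(seqs: List[str]) -> Set[str]:
    """Return every longest common subsequence of seqs.

    Illustrative brute-force level search (an assumption of this sketch),
    not the pruned Leveled-DAG construction described in the article.
    """
    if not seqs or any(not s for s in seqs):
        return {""}
    # Only characters present in every sequence can appear in a common subsequence.
    alphabet = set(seqs[0]).intersection(*map(set, seqs[1:]))
    # Level 0 holds a single virtual source point placed before every sequence.
    level: Dict[Point, Set[str]] = {tuple(-1 for _ in seqs): {""}}
    while True:
        next_level: Dict[Point, Set[str]] = {}
        for point, prefixes in level.items():
            for ch in alphabet:
                # Successor for ch: the next occurrence of ch in every sequence.
                succ = tuple(s.find(ch, p + 1) for s, p in zip(seqs, point))
                if -1 in succ:
                    continue  # ch is missing from the rest of some sequence
                next_level.setdefault(succ, set()).update(x + ch for x in prefixes)
        if not next_level:
            # No point can be extended: the current level holds all MLCS.
            return set().union(*level.values())
        level = next_level  # the previous level is discarded here


print(sorted(mlcs_by_levels(["ABCBDAB", "BDCABA"])))  # ['BCAB', 'BCBA', 'BDAB']
```

Because a point can be reached by paths of different lengths, the sketch may revisit it on several levels; removing that redundancy is the kind of work dominant-point methods address with pruning, which is deliberately left out here.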