A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem

Searching for the Multiple Longest Common Subsequences (MLCS) of multiple sequences is a classical NP-hard problem with many applications. One of the most effective exact approaches to the MLCS problem is based on the dominant point graph, a kind of directed acyclic graph (D...

Bibliographic Details
Main Authors: Zhan Peng, Yuping Wang
Format: Article
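Language: English
Published: Frontiers Media S.A. 2017-08-01
Series: Frontiers in Genetics
Subjects: multiple longest common subsequences; longest common subsequence; dominant point method; directed acyclic graph; biological sequence alignment
Online Access: http://journal.frontiersin.org/article/10.3389/fgene.2017.00104/full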
id doaj-ec4a0ba01a974167a049589c1b2a30f9
record_format Article
spelling doaj-ec4a0ba01a974167a049589c1b2a30f9 | 2020-11-24T23:15:35Z | eng | Frontiers Media S.A. | Frontiers in Genetics | 1664-8021 | 2017-08-01 | 8 | 10.3389/fgene.2017.00104 | 274361 | A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem | Zhan Peng | Yuping Wang | Searching for the Multiple Longest Common Subsequences (MLCS) of multiple sequences is a classical NP-hard problem with many applications. One of the most effective exact approaches to the MLCS problem is based on the dominant point graph, a kind of directed acyclic graph (DAG). However, the time and space efficiency of the leading dominant point graph based approaches is still unsatisfactory: constructing the dominant point graph used by these approaches requires a huge amount of time and space, which hinders their application to large-scale and long sequences. To address this issue, we propose a new time- and space-efficient graph model for the MLCS problem, called the Leveled-DAG. During construction, the Leveled-DAG promptly eliminates every node that cannot contribute to an MLCS. At any moment, only the current level and some previously generated nodes need to be kept in memory, which greatly reduces memory consumption. In addition, the final graph contains only one node, in which all of the MLCS are saved, so no additional search for the MLCS is needed. Experiments are conducted on real biological sequences of varying number and length, and the proposed algorithm is compared with three state-of-the-art algorithms. The experimental results show that the Leveled-DAG approach needs less time and space than the compared algorithms, especially on large-scale and long sequences. | http://journal.frontiersin.org/article/10.3389/fgene.2017.00104/full | multiple longest common subsequences | longest common subsequence | dominant point method | directed acyclic graph | biological sequence alignment
collection DOAJ
language English
format Article
sources DOAJ
author Zhan Peng
Yuping Wang
spellingShingle Zhan Peng
Yuping Wang
A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem
Frontiers in Genetics
multiple longest common subsequences
longest common subsequence
dominant point method
directed acyclic graph
biological sequence alignment
author_facet Zhan Peng
Yuping Wang
author_sort Zhan Peng
title A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem
title_short A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem
title_full A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem
title_fullStr A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem
title_full_unstemmed A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem
title_sort novel efficient graph model for the multiple longest common subsequences (mlcs) problem
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2017-08-01
description Searching for the Multiple Longest Common Subsequences (MLCS) of multiple sequences is a classical NP-hard problem with many applications. One of the most effective exact approaches to the MLCS problem is based on the dominant point graph, a kind of directed acyclic graph (DAG). However, the time and space efficiency of the leading dominant point graph based approaches is still unsatisfactory: constructing the dominant point graph used by these approaches requires a huge amount of time and space, which hinders their application to large-scale and long sequences. To address this issue, we propose a new time- and space-efficient graph model for the MLCS problem, called the Leveled-DAG. During construction, the Leveled-DAG promptly eliminates every node that cannot contribute to an MLCS. At any moment, only the current level and some previously generated nodes need to be kept in memory, which greatly reduces memory consumption. In addition, the final graph contains only one node, in which all of the MLCS are saved, so no additional search for the MLCS is needed. Experiments are conducted on real biological sequences of varying number and length, and the proposed algorithm is compared with three state-of-the-art algorithms. The experimental results show that the Leveled-DAG approach needs less time and space than the compared algorithms, especially on large-scale and long sequences.
topic multiple longest common subsequences
longest common subsequence
dominant point method
directed acyclic graph
biological sequence alignment
url http://journal.frontiersin.org/article/10.3389/fgene.2017.00104/full
work_keys_str_mv AT zhanpeng anovelefficientgraphmodelforthemultiplelongestcommonsubsequencesmlcsproblem
AT yupingwang anovelefficientgraphmodelforthemultiplelongestcommonsubsequencesmlcsproblem
AT zhanpeng novelefficientgraphmodelforthemultiplelongestcommonsubsequencesmlcsproblem
AT yupingwang novelefficientgraphmodelforthemultiplelongestcommonsubsequencesmlcsproblem
_version_ 1725590420610613248
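
The abstract describes building the match-point graph level by level while keeping only part of it in memory. The following Python sketch illustrates that general idea under stated assumptions; it is not the authors' Leveled-DAG algorithm, whose details are not given in this record. The function names (`build_successor_tables`, `strictly_dominated`, `mlcs_level_by_level`) are hypothetical, the pruning rule (discard a point only if another point on the same level is strictly smaller in every coordinate) is a simplification chosen for this example, and each node stores its partial subsequences directly, which is practical only for small inputs.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[int, ...]  # one 0-based position per input sequence


def build_successor_tables(seqs: List[str]) -> List[Dict[str, List[int]]]:
    """tables[i][c][j] is the first index >= j at which c occurs in seqs[i], or -1."""
    # Only characters present in every sequence can appear in a common subsequence.
    alphabet = set(seqs[0]).intersection(*(set(s) for s in seqs[1:]))
    tables = []
    for s in seqs:
        table = {}
        for c in alphabet:
            nxt = [-1] * (len(s) + 1)
            for j in range(len(s) - 1, -1, -1):
                nxt[j] = j if s[j] == c else nxt[j + 1]
            table[c] = nxt
        tables.append(table)
    return tables


def strictly_dominated(q: Point, points: List[Point]) -> bool:
    """True if some point is smaller than q in every coordinate."""
    return any(all(a < b for a, b in zip(p, q)) for p in points)


def mlcs_level_by_level(seqs: List[str]) -> List[str]:
    """Return all longest common subsequences of seqs, sorted (assumes seqs is non-empty)."""
    tables = build_successor_tables(seqs)
    alphabet = list(tables[0])
    # Level 0 holds a virtual origin placed before the start of every sequence.
    level: Dict[Point, Set[str]] = {tuple(-1 for _ in seqs): {""}}
    while True:
        next_level: Dict[Point, Set[str]] = {}
        for point, prefixes in level.items():
            for c in alphabet:
                # Successor: the next occurrence of c after `point` in every sequence.
                succ = tuple(t[c][p + 1] for t, p in zip(tables, point))
                if -1 in succ:
                    continue
                next_level.setdefault(succ, set()).update(w + c for w in prefixes)
        if not next_level:
            # No level can follow, so the strings on the current level are the answer.
            return sorted(set().union(*level.values()))
        # A strictly dominated point cannot lie on a longest path, so it is dropped.
        candidates = list(next_level)
        level = {q: ws for q, ws in next_level.items()
                 if not strictly_dominated(q, candidates)}


if __name__ == "__main__":
    print(mlcs_level_by_level(["bac", "acbc"]))
```

Only the current level is held in memory at any time, and the last non-empty level already contains every longest common subsequence, so no separate backtracking pass over a stored graph is needed in this sketch.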