A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem

Finding the multiple longest common subsequences (MLCS) of a set of sequences is a classical NP-hard problem with many applications. One of the most effective exact approaches to the MLCS problem is based on the dominant point graph, a kind of directed acyclic graph (D...

Bibliographic Details
Main Authors:	Zhan Peng, Yuping Wang
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2017-08-01
Series:	Frontiers in Genetics
Subjects:	multiple longest common subsequences; longest common subsequence; dominant point method; directed acyclic graph; biological sequence alignment
Online Access:	http://journal.frontiersin.org/article/10.3389/fgene.2017.00104/full

ISSN:	1664-8021
Description:	Finding the multiple longest common subsequences (MLCS) of a set of sequences is a classical NP-hard problem with many applications. One of the most effective exact approaches to the MLCS problem is based on the dominant point graph, a kind of directed acyclic graph (DAG). However, the time and space efficiency of the leading dominant point graph based approaches is still unsatisfactory: constructing the dominant point graph requires a huge amount of time and space, which hinders applying these approaches to large-scale and long sequences. To address this issue, we propose a new time- and space-efficient graph model for the MLCS problem called the Leveled-DAG. During construction, the Leveled-DAG promptly eliminates all nodes that cannot contribute to an MLCS. At any moment, only the current level and some previously generated nodes need to be kept in memory, which greatly reduces memory consumption. In addition, the final graph contains only one node, in which all of the wanted MLCS are saved, so no additional search for the MLCS is needed. Experiments are conducted on real biological sequences of varying number and length, and the proposed algorithm is compared with three state-of-the-art algorithms. The results show that the Leveled-DAG approach needs less time and space than the compared algorithms, especially on large-scale and long sequences.
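
The level-by-level idea in the description can be illustrated with a minimal Python sketch. This is not the authors' Leveled-DAG algorithm: it omits the paper's rules for eliminating non-contributing nodes and makes no claim about their data structures. It only shows, under these assumptions, how match points can be expanded one level at a time while keeping just the current level in memory, so that the last non-empty level holds every MLCS. All function names here (`build_position_tables`, `successors`, `mlcs_levelwise`) are hypothetical.

```python
from bisect import bisect_right


def build_position_tables(seqs):
    """For each sequence, map every character to the sorted list of its positions."""
    tables = []
    for s in seqs:
        positions = {}
        for i, ch in enumerate(s):
            positions.setdefault(ch, []).append(i)
        tables.append(positions)
    return tables


def successors(point, tables, alphabet):
    """Yield (char, next_point) for each character that still occurs,
    strictly after `point`, in every sequence (hypothetical helper)."""
    for ch in alphabet:
        nxt = []
        for p, table in zip(point, tables):
            positions = table.get(ch, [])
            j = bisect_right(positions, p)  # first occurrence after position p
            if j == len(positions):
                break  # ch no longer occurs in this sequence
            nxt.append(positions[j])
        else:
            yield ch, tuple(nxt)


def mlcs_levelwise(seqs):
    """Return the set of all longest common subsequences of `seqs`.

    Simplified sketch: only the current level is kept in memory, but no
    node pruning is performed, so the cost can grow exponentially.
    """
    if not seqs:
        return set()
    tables = build_position_tables(seqs)
    alphabet = sorted(set.intersection(*(set(s) for s in seqs)))
    # Level 0: a virtual source point placed before the start of every sequence.
    level = {tuple(-1 for _ in seqs): {""}}
    while True:
        next_level = {}
        for point, prefixes in level.items():
            for ch, succ in successors(point, tables, alphabet):
                # Points reached by several paths are merged into one node.
                next_level.setdefault(succ, set()).update(p + ch for p in prefixes)
        if not next_level:
            # No point can be extended: the current level holds all MLCS.
            return set().union(*level.values())
        level = next_level


print(sorted(mlcs_levelwise(["ABCBDAB", "BDCABA"])))
```

Each level stores, per match point, the common subsequences that end there, so the answer is read directly from the final level without a separate backtracking pass. The approach described in the abstract goes further by discarding nodes that cannot contribute to an MLCS, which this sketch does not attempt.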

Similar Items