The gene family-free median of three

Abstract Background The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, w...

Full description

Bibliographic Details
Main Authors: Daniel Doerr, Metin Balaban, Pedro Feijão, Cedric Chauve
Format: Article
Language:English
Published: BMC 2017-05-01
Series:Algorithms for Molecular Biology
Subjects:
Online Access:http://link.springer.com/article/10.1186/s13015-017-0106-z
Description
Summary:Abstract Background The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes. Methods We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We study its computational complexity and we describe an integer linear program (ILP) for its exact solution. We further discuss a related problem called family-free adjacencies for k genomes for the special case of $$k \le 3$$ k ≤ 3 and present an ILP for its solution. However, for this problem, the computation of exact solutions remains intractable for sufficiently large instances. We then proceed to describe a heuristic method, FFAdj-AM, which performs well in practice. Results The developed methods compute accurate positional orthologs for genomes comparable in size of bacterial genomes on simulated data and genomic data acquired from the OMA orthology database. In particular, FFAdj-AM performs equally or better when compared to the well-established gene family prediction tool MultiMSOAR. Conclusions We study the computational complexity of a new family-free model and present algorithms for its solution. With FFAdj-AM, we propose an appealing alternative to established tools for identifying higher confidence positional orthologs.
ISSN:1748-7188