StrainFLAIR: strain-level profiling of metagenomic samples using variation graphs

Current studies are shifting from the use of single linear references to representation of multiple genomes organised in pangenome graphs or variation graphs. Meanwhile, in metagenomic samples, resolving strain-level abundances is a major step in microbiome studies, as associations between strain va...


Bibliographic Details
Main Authors: Kévin Da Silva, Nicolas Pons, Magali Berland, Florian Plaza Oñate, Mathieu Almeida, Pierre Peterlongo
Format: Article
Language: English
Published: PeerJ Inc. 2021-08-01
Series: PeerJ
Subjects: Metagenomics, Variation graphs, Strain-level abundances, Read mapping
Online Access: https://peerj.com/articles/11884.pdf
id doaj-372d7ee034e448f196c6893f119d07f3
record_format Article
spelling doaj-372d7ee034e448f196c6893f119d07f3
2021-08-25T15:05:05Z
eng
PeerJ Inc.
PeerJ
2167-8359
2021-08-01
9
e11884
10.7717/peerj.11884
StrainFLAIR: strain-level profiling of metagenomic samples using variation graphs
Kévin Da Silva [0]
Nicolas Pons [1]
Magali Berland [2]
Florian Plaza Oñate [3]
Mathieu Almeida [4]
Pierre Peterlongo [5]
[0] Université Paris-Saclay, INRAE, MGP, Jouy-en-Josas, France
[1] Université Paris-Saclay, INRAE, MGP, Jouy-en-Josas, France
[2] Université Paris-Saclay, INRAE, MGP, Jouy-en-Josas, France
[3] Université Paris-Saclay, INRAE, MGP, Jouy-en-Josas, France
[4] Université Paris-Saclay, INRAE, MGP, Jouy-en-Josas, France
[5] Univ Rennes, Inria, CNRS, IRISA—UMR 6074, Rennes, France
Current studies are shifting from the use of single linear references to representation of multiple genomes organised in pangenome graphs or variation graphs. Meanwhile, in metagenomic samples, resolving strain-level abundances is a major step in microbiome studies, as associations between strain variants and phenotype are of great interest for diagnostic and therapeutic purposes. We developed StrainFLAIR with the aim of showing the feasibility of using variation graphs for indexing highly similar genomic sequences up to the strain level, and for characterizing a set of unknown sequenced genomes by querying this graph. On simulated data composed of mixtures of strains from the same bacterial species Escherichia coli, results show that StrainFLAIR was able to distinguish and estimate the abundances of close strains, as well as to highlight the presence of a new strain close to a referenced one and to estimate its abundance. On a real dataset composed of a mix of several bacterial species and several strains for the same species, results show that in a more complex configuration StrainFLAIR correctly estimates the abundance of each strain. Hence, results demonstrated how graph representation of multiple close genomes can be used as a reference to characterize a sample at the strain level.
https://peerj.com/articles/11884.pdf
Metagenomics
Variation graphs
Strain-level abundances
Read mapping
collection DOAJ
language English
format Article
sources DOAJ
author Kévin Da Silva
Nicolas Pons
Magali Berland
Florian Plaza Oñate
Mathieu Almeida
Pierre Peterlongo
author_sort Kévin Da Silva
title StrainFLAIR: strain-level profiling of metagenomic samples using variation graphs
publisher PeerJ Inc.
series PeerJ
issn 2167-8359
publishDate 2021-08-01
description Current studies are shifting from the use of single linear references to representation of multiple genomes organised in pangenome graphs or variation graphs. Meanwhile, in metagenomic samples, resolving strain-level abundances is a major step in microbiome studies, as associations between strain variants and phenotype are of great interest for diagnostic and therapeutic purposes. We developed StrainFLAIR with the aim of showing the feasibility of using variation graphs for indexing highly similar genomic sequences up to the strain level, and for characterizing a set of unknown sequenced genomes by querying this graph. On simulated data composed of mixtures of strains from the same bacterial species Escherichia coli, results show that StrainFLAIR was able to distinguish and estimate the abundances of close strains, as well as to highlight the presence of a new strain close to a referenced one and to estimate its abundance. On a real dataset composed of a mix of several bacterial species and several strains for the same species, results show that in a more complex configuration StrainFLAIR correctly estimates the abundance of each strain. Hence, results demonstrated how graph representation of multiple close genomes can be used as a reference to characterize a sample at the strain level.
topic Metagenomics
Variation graphs
Strain-level abundances
Read mapping
url https://peerj.com/articles/11884.pdf