ARPEGGIO: Automated Reproducible Polyploid EpiGenetic GuIdance workflOw

Abstract Background Whole genome duplication (WGD) events are common in the evolutionary history of many living organisms. For decades, researchers have been trying to understand the genetic and epigenetic impact of WGD and its underlying molecular mechanisms. Particular attention was given to allop...

Bibliographic Details
Main Authors: Stefan Milosavljevic, Tony Kuo, Samuele Decarli, Lucas Mohn, Jun Sese, Kentaro K. Shimizu, Rie Shimizu-Inatsugi, Mark D. Robinson
Format: Article
Language: English
Published: BMC 2021-07-01
Series: BMC Genomics
Subjects: Snakemake, Epigenetics, Bisulfite-sequencing, Polyploidy, Allopolyploids, Reproducibility
Online Access: https://doi.org/10.1186/s12864-021-07845-2
id doaj-57db0961d7e8412db51573fefefac1bb
record_format Article
spelling doaj-57db0961d7e8412db51573fefefac1bb
2021-07-18T11:30:08Z
eng
BMC
BMC Genomics
1471-2164
2021-07-01
221112
10.1186/s12864-021-07845-2
ARPEGGIO: Automated Reproducible Polyploid EpiGenetic GuIdance workflOw
Stefan Milosavljevic (Department of Evolutionary Biology and Environmental Studies, University of Zurich)
Tony Kuo (Centre for Biodiversity Genomics, University of Guelph)
Samuele Decarli (Department of Computer Science, ETH Zurich)
Lucas Mohn (Department of Evolutionary Biology and Environmental Studies, University of Zurich)
Jun Sese (AIST Artificial Intelligence Research Center)
Kentaro K. Shimizu (Department of Evolutionary Biology and Environmental Studies, University of Zurich)
Rie Shimizu-Inatsugi (Department of Evolutionary Biology and Environmental Studies, University of Zurich)
Mark D. Robinson (SIB Swiss Institute of Bioinformatics, University of Zurich)
https://doi.org/10.1186/s12864-021-07845-2
Snakemake
Epigenetics
Bisulfite-sequencing
Polyploidy
Allopolyploids
Reproducibility
collection DOAJ
language English
format Article
sources DOAJ
author Stefan Milosavljevic
Tony Kuo
Samuele Decarli
Lucas Mohn
Jun Sese
Kentaro K. Shimizu
Rie Shimizu-Inatsugi
Mark D. Robinson
author_sort Stefan Milosavljevic
title ARPEGGIO: Automated Reproducible Polyploid EpiGenetic GuIdance workflOw
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2021-07-01
description Abstract
Background: Whole genome duplication (WGD) events are common in the evolutionary history of many living organisms. For decades, researchers have been trying to understand the genetic and epigenetic impact of WGD and its underlying molecular mechanisms. Particular attention has been given to allopolyploid study systems: species resulting from a hybridization event accompanied by WGD. Investigating the mechanisms behind the survival of newly formed allopolyploids has highlighted the key role of DNA methylation. Improvements in high-throughput methods, such as whole genome bisulfite sequencing (WGBS), have opened an opportunity to study the role of DNA methylation at a larger scale and higher resolution. However, only a few studies have applied WGBS to allopolyploids, which might be due to a lack of genomic resources combined with a burdensome data analysis process. To overcome these problems, we developed the Automated Reproducible Polyploid EpiGenetic GuIdance workflOw (ARPEGGIO): the first workflow for the analysis of epigenetic data in polyploids. This workflow analyzes WGBS data from allopolyploid species via the genome assemblies of the allopolyploid’s parent species. ARPEGGIO uses an updated read classification algorithm (EAGLE-RC) to tackle the challenge of sequence similarity between the parental genomes. ARPEGGIO offers automation and, more importantly, a complete set of analyses, including spot checks, starting from raw WGBS data: quality checks, trimming, alignment, methylation extraction, statistical analyses and downstream analyses. A full run of ARPEGGIO outputs a list of genes showing differential methylation. ARPEGGIO was made simple to set up, run and interpret, and its implementation ensures reproducibility by including both package management and containerization.
Results: We evaluated ARPEGGIO in two ways. First, we tested EAGLE-RC’s performance on publicly available datasets with a known ground truth, and we show that EAGLE-RC reduces the error rate three- to fourfold compared with standard approaches. Second, using the same initial dataset, we show agreement between ARPEGGIO’s output and published results. Among similar workflows, ARPEGGIO is the only one supporting polyploid data.
Conclusions: The goal of ARPEGGIO is to promote, support and improve polyploid research with a reproducible and automated set of analyses in a convenient implementation. ARPEGGIO is available at https://github.com/supermaxiste/ARPEGGIO.
topic Snakemake
Epigenetics
Bisulfite-sequencing
Polyploidy
Allopolyploids
Reproducibility
url https://doi.org/10.1186/s12864-021-07845-2
_version_ 1721296111151349760
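
The abstract's read classification step (assigning each allopolyploid read to one of the two parental genomes despite their sequence similarity) can be illustrated with a toy sketch. This is not the EAGLE-RC algorithm, whose details the record does not give; it swaps in a much simpler rule that compares mismatch counts against each parental assembly and leaves close calls unassigned. The names `ReadAlignment`, `classify_read`, `summarize` and the `min_margin` threshold are all hypothetical, and real WGBS reads would additionally need bisulfite-aware mismatch counting.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ReadAlignment:
    """Hypothetical per-read summary: mismatches against each parental genome.

    None means the read did not align to that parental assembly at all.
    """
    read_id: str
    mismatches_parent1: Optional[int]
    mismatches_parent2: Optional[int]


def classify_read(aln: ReadAlignment, min_margin: int = 2) -> str:
    """Assign a read to a parental genome, or flag it as ambiguous/unmapped.

    Simplified stand-in rule (an assumption, not EAGLE-RC): a read is
    assigned to the parent with fewer mismatches only if the difference
    is at least `min_margin`; otherwise it is left ambiguous.
    """
    m1, m2 = aln.mismatches_parent1, aln.mismatches_parent2
    if m1 is None and m2 is None:
        return "unmapped"
    if m2 is None:
        return "parent1"
    if m1 is None:
        return "parent2"
    if m2 - m1 >= min_margin:
        return "parent1"
    if m1 - m2 >= min_margin:
        return "parent2"
    return "ambiguous"


def summarize(alignments: Iterable[ReadAlignment], min_margin: int = 2) -> Dict[str, int]:
    """Count how many reads fall into each class."""
    counts = {"parent1": 0, "parent2": 0, "ambiguous": 0, "unmapped": 0}
    for aln in alignments:
        counts[classify_read(aln, min_margin)] += 1
    return counts


if __name__ == "__main__":
    demo_reads = [
        ReadAlignment("r1", 0, 5),        # clearly closer to parent 1
        ReadAlignment("r2", 4, 1),        # clearly closer to parent 2
        ReadAlignment("r3", 2, 2),        # equally similar to both parents
        ReadAlignment("r4", None, 3),     # aligned to parent 2 only
        ReadAlignment("r5", None, None),  # aligned to neither parent
    ]
    print(summarize(demo_reads))
```

Leaving near-ties unassigned rather than forcing a call is the design choice worth noting here: in a sketch like this, misassigned reads would propagate into the downstream methylation extraction for the wrong subgenome, whereas ambiguous reads can simply be excluded.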