CONFINED: distinguishing biological from technical sources of variation by leveraging multiple methylation datasets

Abstract Methylation datasets are affected by innumerable sources of variability, both biological (cell-type composition, genetics) and technical (batch effects). Here, we propose a reference-free method based on sparse canonical correlation analysis to separate the biological from technical sources...


Bibliographic Details
Main Authors: Mike Thompson, Zeyuan Johnson Chen, Elior Rahmani, Eran Halperin
Format: Article
Language: English
Published: BMC 2019-07-01
Series: Genome Biology
Online Access: http://link.springer.com/article/10.1186/s13059-019-1743-y
id doaj-1b388c7e19ea4d8abf70257cd24beae6
record_format Article
spelling doaj-1b388c7e19ea4d8abf70257cd24beae6 | 2020-11-25T03:44:42Z | eng | BMC | Genome Biology | 1474-760X | 2019-07-01 | 20 | 1 | 1 | 15 | 10.1186/s13059-019-1743-y | CONFINED: distinguishing biological from technical sources of variation by leveraging multiple methylation datasets | Mike Thompson (0), Zeyuan Johnson Chen (1), Elior Rahmani (2), Eran Halperin (3) | (0)-(3) Department of Computer Science, University of California Los Angeles
collection DOAJ
sources DOAJ
issn 1474-760X
description Abstract: Methylation datasets are affected by innumerable sources of variability, both biological (cell-type composition, genetics) and technical (batch effects). Here, we propose a reference-free method based on sparse canonical correlation analysis to separate the biological from technical sources of variability. We show through simulations and real data that our method, CONFINED, is not only more accurate than the state-of-the-art reference-free methods for capturing known, replicable biological variability, but it is also considerably more robust to dataset-specific technical variability than previous approaches. CONFINED is available as an R package as detailed at https://github.com/cozygene/CONFINED.