Identification of copy number variants in whole-genome data using Reference Coverage Profiles

The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing and analyzing such large files is cumbersome, particu...


Bibliographic Details
Main Authors:	Gustavo Glusman, Alissa Severson, Varsha Dhankani, Max Robinson, Terry Farrah, Denise E. Mauldin, Anna B. Stittrich, Seth A. Ament, Jared C. Roach, Mary E. Brunkow, Dale L. Bodian, Joseph G. Vockley, Ilya Shmulevich, John E. Niederhuber, Leroy Hood
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2015-02-01
Series:	Frontiers in Genetics
Subjects:	Signal processing; structural variation; Whole-genome sequencing; clinical genomics; depth of coverage
Online Access:	http://journal.frontiersin.org/Journal/10.3389/fgene.2015.00045/full

id	doaj-ee5ba32d4beb4615b54ca8b9ce4005a6
record_format	Article
spelling	doaj-ee5ba32d4beb4615b54ca8b9ce4005a6 | 2020-11-24T21:17:56Z | eng | Frontiers Media S.A. | Frontiers in Genetics | 1664-8021 | 2015-02-01 | 6 | 10.3389/fgene.2015.00045 | 128424 | Identification of copy number variants in whole-genome data using Reference Coverage Profiles | Gustavo Glusman (Institute for Systems Biology); Alissa Severson (Institute for Systems Biology); Varsha Dhankani (Institute for Systems Biology); Max Robinson (Institute for Systems Biology); Terry Farrah (Institute for Systems Biology); Denise E. Mauldin (Institute for Systems Biology); Anna B. Stittrich (Institute for Systems Biology); Seth A. Ament (Institute for Systems Biology); Jared C. Roach (Institute for Systems Biology); Mary E. Brunkow (Institute for Systems Biology); Dale L. Bodian (Inova Translational Medicine Institute); Joseph G. Vockley (Inova Translational Medicine Institute); Ilya Shmulevich (Institute for Systems Biology); John E. Niederhuber (Inova Translational Medicine Institute); Leroy Hood (Institute for Systems Biology)
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Gustavo Glusman; Alissa Severson; Varsha Dhankani; Max Robinson; Terry Farrah; Denise E. Mauldin; Anna B. Stittrich; Seth A. Ament; Jared C. Roach; Mary E. Brunkow; Dale L. Bodian; Joseph G. Vockley; Ilya Shmulevich; John E. Niederhuber; Leroy Hood
author_sort	Gustavo Glusman
title	Identification of copy number variants in whole-genome data using Reference Coverage Profiles
publisher	Frontiers Media S.A.
series	Frontiers in Genetics
issn	1664-8021
publishDate	2015-02-01
description	The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150-1000x compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1-100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40x) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation.
topic	Signal processing; structural variation; Whole-genome sequencing; clinical genomics; depth of coverage
url	http://journal.frontiersin.org/Journal/10.3389/fgene.2015.00045/full


Similar Items