High resolution measurement of DUF1220 domain copy number from whole genome sequence data

Abstract. Background: DUF1220 protein domains, found primarily in Neuroblastoma BreakPoint Family (NBPF) genes, show the greatest human lineage-specific increase in copy number of any coding region in the genome. There are 302 haploid copies of DUF1220 in hg38 (~160 of which are human-specific) and the...

Bibliographic Details
Main Authors: David P. Astling, Ilea E. Heft, Kenneth L. Jones, James M. Sikela
Format: Article
Language: English
Published: BMC 2017-08-01
Series: BMC Genomics
Subjects:
CNV
Online Access: http://link.springer.com/article/10.1186/s12864-017-3976-z
id doaj-ab922f62a8b24d3988ac8a7ca0e6c4b1
record_format Article
spelling doaj-ab922f62a8b24d3988ac8a7ca0e6c4b1
2020-11-24T21:11:45Z
eng
BMC
BMC Genomics
1471-2164
2017-08-01
18 1 1 16
10.1186/s12864-017-3976-z
High resolution measurement of DUF1220 domain copy number from whole genome sequence data
David P. Astling, Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine
Ilea E. Heft, Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine
Kenneth L. Jones, Department of Pediatrics, University of Colorado School of Medicine
James M. Sikela, Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine
collection DOAJ
language English
format Article
sources DOAJ
author David P. Astling
Ilea E. Heft
Kenneth L. Jones
James M. Sikela
title High resolution measurement of DUF1220 domain copy number from whole genome sequence data
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2017-08-01
description Abstract
Background: DUF1220 protein domains, found primarily in Neuroblastoma BreakPoint Family (NBPF) genes, show the greatest human lineage-specific increase in copy number of any coding region in the genome. There are 302 haploid copies of DUF1220 in hg38 (~160 of which are human-specific) and the majority of these can be divided into 6 different subtypes (referred to as clades). Copy number changes of specific DUF1220 clades have been associated in a dose-dependent manner with brain size variation (both evolutionarily and within the human population), cognitive aptitude, autism severity, and schizophrenia severity. However, no published methods can directly measure copies of DUF1220 with high accuracy and no method can distinguish between domains within a clade.
Results: Here we describe a novel method for measuring copies of DUF1220 domains and the NBPF genes in which they are found from whole genome sequence data. We have characterized the effect that various sequencing and alignment parameters and strategies have on the accuracy and precision of the method and defined the parameters that lead to optimal DUF1220 copy number measurement and resolution. We show that copy number estimates obtained using our read depth approach are highly correlated with those generated by ddPCR for three representative DUF1220 clades. By simulation, we demonstrate that our method provides sufficient resolution to analyze DUF1220 copy number variation at three levels: (1) DUF1220 clade copy number within individual genes and groups of genes (gene-specific clade groups), (2) genome-wide DUF1220 clade copies, and (3) gene copy number for DUF1220-encoding genes.
Conclusions: To our knowledge, this is the first method to accurately measure copies of all six DUF1220 clades and the first method to provide gene-specific resolution of these clades. This allows one to discriminate among the ~300 haploid human DUF1220 copies to an extent not possible with any other method. The result is a greatly enhanced capability to analyze the role that these sequences play in human variation and disease.
topic Copy number variation
CNV
DUF1220
Genome informatics
Next-generation sequencing
Bioinformatics
url http://link.springer.com/article/10.1186/s12864-017-3976-z