A comprehensive evaluation of collapsing methods using simulated and real data: Excellent annotation of functionality and large sample sizes required

The advent of next generation sequencing (NGS) technologies enabled the investigation of the rare variant-common disease hypothesis in unrelated individuals, even on the genome-wide level. Analysis of this hypothesis requires tailored statistical methods as single marker tests fail on rare variants. An...


Bibliographic Details
Main Authors: Carmen Dering, Inke R König, Laura B Ramsey, Mary V Relling, Wenjian Yang, Andreas Ziegler
Format: Article
Language: English
Published: Frontiers Media S.A. 2014-09-01
Series: Frontiers in Genetics
Subjects: comparison, rare variants, SLCO1B1, simulation study, collapsing, burden test
Online Access: http://journal.frontiersin.org/Journal/10.3389/fgene.2014.00323/full
id doaj-62a8b8ddfac54dd3a43a454cabd6c951
record_format Article
spelling doaj-62a8b8ddfac54dd3a43a454cabd6c951 2020-11-25T01:22:38Z eng Frontiers Media S.A. Frontiers in Genetics 1664-8021 2014-09-01 5 10.3389/fgene.2014.00323 108564
Carmen Dering (Universität zu Lübeck)
Inke R König (Universität zu Lübeck)
Laura B Ramsey (St. Jude Children's Research Hospital)
Mary V Relling (St. Jude Children's Research Hospital)
Wenjian Yang (St. Jude Children's Research Hospital)
Andreas Ziegler (Universität zu Lübeck)
collection DOAJ
language English
format Article
sources DOAJ
author Carmen Dering
Inke R König
Laura B Ramsey
Mary V Relling
Wenjian Yang
Andreas Ziegler
title A comprehensive evaluation of collapsing methods using simulated and real data: Excellent annotation of functionality and large sample sizes required
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2014-09-01
description The advent of next generation sequencing (NGS) technologies enabled the investigation of the rare variant-common disease hypothesis in unrelated individuals, even on the genome-wide level. Analysis of this hypothesis requires tailored statistical methods as single marker tests fail on rare variants. An entire class of statistical methods collapses rare variants from a genomic region of interest (ROI), thereby aggregating rare variants. In an extensive simulation study using data from the Genetic Analysis Workshop 17 we compared the performance of 15 collapsing methods by means of a variety of pre-defined ROIs regarding minor allele frequency thresholds and functionality. Findings of the simulation study were additionally confirmed by a real data set investigating the association between methotrexate clearance and the SLCO1B1 gene in patients with acute lymphoblastic leukemia. Our analyses showed substantially inflated type I error levels for many of the proposed collapsing methods. Only four of 15 approaches yielded valid type I errors in all considered scenarios. None of the statistical tests was able to detect true associations over a substantial proportion of replicates in the simulated data. Detailed annotation of functionality of variants is crucial to detect true associations. These findings were confirmed in the analysis of the real data. Recent theoretical work showed that large power is achieved in gene-based analyses only if large sample sizes are available and a substantial proportion of causal rare variants is present in the gene-based analysis. Many of the investigated statistical approaches use permutation requiring high computational cost. There is a clear need for valid, powerful and fast-to-calculate test statistics for studies investigating rare variants.
topic comparison
rare variants
SLCO1B1
simulation study
collapsing
burden test
url http://journal.frontiersin.org/Journal/10.3389/fgene.2014.00323/full
_version_ 1725126268797583360
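
The description above speaks of collapsing rare variants within a region of interest (ROI) below a minor allele frequency threshold, and of permutation-based tests. The following is a minimal Python sketch of that general idea, not a reproduction of any of the 15 methods the article evaluates. It assumes a simple carrier-indicator form of collapsing and a mean-difference test statistic; the function names, the 0/1/2 genotype coding, and the example threshold are all illustrative assumptions.

```python
import random
from statistics import mean


def minor_allele_frequency(genotypes):
    """MAF of one variant, given 0/1/2 minor-allele counts per individual."""
    return sum(genotypes) / (2 * len(genotypes))


def collapse(genotype_matrix, maf_threshold=0.01):
    """Collapse an ROI to one indicator per individual (assumed scheme).

    genotype_matrix[i][j] is the minor-allele count of individual i at
    variant j. An individual scores 1 if they carry at least one variant
    whose MAF is positive and at most maf_threshold, else 0.
    """
    n_variants = len(genotype_matrix[0])
    rare = [
        j for j in range(n_variants)
        if 0 < minor_allele_frequency([row[j] for row in genotype_matrix])
        <= maf_threshold
    ]
    return [int(any(row[j] > 0 for j in rare)) for row in genotype_matrix]


def permutation_p_value(carrier, phenotype, n_perm=1000, seed=0):
    """Permutation p-value for |mean(carriers) - mean(non-carriers)|."""

    def stat(pheno):
        carriers = [y for c, y in zip(carrier, pheno) if c]
        others = [y for c, y in zip(carrier, pheno) if not c]
        if not carriers or not others:
            return 0.0  # no contrast possible without both groups
        return abs(mean(carriers) - mean(others))

    rng = random.Random(seed)
    observed = stat(phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if stat(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy data: 4 individuals x 3 variants (hypothetical values).
G = [[0, 0, 1], [1, 0, 1], [0, 0, 0], [0, 0, 2]]
carrier = collapse(G, maf_threshold=0.2)  # -> [0, 1, 0, 0]
p = permutation_p_value(carrier, [1.0, 5.0, 1.2, 0.9], n_perm=200)
```

The loop over `n_perm` shuffles illustrates why permutation-based approaches are computationally costly: the test statistic is recomputed once per permutation, for every ROI tested.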