Robust Linear Trend Test for Low-Coverage Next-Generation Sequence Data Controlling for Covariates

Low-coverage next-generation sequencing experiments assisted by statistical methods are popular in genetic association studies. Next-generation sequencing experiments produce genotype data that include allele read counts and read depths. For low sequencing depths, the genotypes tend to be highly unc...

Bibliographic Details
Main Authors:	Jung Yeon Lee, Myeong-Kyu Kim, Wonkuk Kim
Format:	Article
Language:	English
Published:	MDPI AG 2020-02-01
Series:	Mathematics
Subjects:	allele read counts; low-coverage; mixture model; next-generation sequencing; sandwich variance estimator
Online Access:	https://www.mdpi.com/2227-7390/8/2/217

id	doaj-0564f4a42adc4c1ea894e6ee32f9694c
record_format	Article
spelling	doaj-0564f4a42adc4c1ea894e6ee32f9694c; 2020-11-25T02:33:37Z; eng; MDPI AG; Mathematics; 2227-7390; 2020-02-01; 8; 2; 217; 10.3390/math8020217; math8020217; Robust Linear Trend Test for Low-Coverage Next-Generation Sequence Data Controlling for Covariates; Jung Yeon Lee (0); Myeong-Kyu Kim (1); Wonkuk Kim (2); Department of Psychiatry, New York University School of Medicine, New York, NY 10016, USA; Department of Neurology, Chonnam National University Medical School, Gwangju 61469, Korea; Department of Applied Statistics, Chung-Ang University, Seoul 06974, Korea; Low-coverage next-generation sequencing experiments assisted by statistical methods are popular in genetic association studies. Next-generation sequencing experiments produce genotype data that include allele read counts and read depths. For low sequencing depths, the genotypes tend to be highly uncertain; therefore, the uncertain genotypes are usually removed or imputed before performing a statistical analysis. This may result in an inflated type I error rate and a loss of statistical power. In this paper, we propose a mixture-based penalized score association test adjusting for non-genetic covariates. The proposed score test statistic is based on a sandwich variance estimator so that it is robust to model misspecification between the covariates and the latent genotypes. The proposed method has the advantage of requiring neither external imputation nor elimination of uncertain genotypes. The results of our simulation study show that the type I error rates are well controlled and that the proposed association test has reasonable statistical power. As an illustration, we apply our statistic to pharmacogenomics data on drug responsiveness among 400 epilepsy patients.; https://www.mdpi.com/2227-7390/8/2/217; allele read counts; low-coverage; mixture model; next-generation sequencing; sandwich variance estimator
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Jung Yeon Lee, Myeong-Kyu Kim, Wonkuk Kim
author_sort	Jung Yeon Lee
title	Robust Linear Trend Test for Low-Coverage Next-Generation Sequence Data Controlling for Covariates
publisher	MDPI AG
series	Mathematics
issn	2227-7390
publishDate	2020-02-01
description	Low-coverage next-generation sequencing experiments assisted by statistical methods are popular in genetic association studies. Next-generation sequencing experiments produce genotype data that include allele read counts and read depths. For low sequencing depths, the genotypes tend to be highly uncertain; therefore, the uncertain genotypes are usually removed or imputed before performing a statistical analysis. This may result in an inflated type I error rate and a loss of statistical power. In this paper, we propose a mixture-based penalized score association test adjusting for non-genetic covariates. The proposed score test statistic is based on a sandwich variance estimator so that it is robust to model misspecification between the covariates and the latent genotypes. The proposed method has the advantage of requiring neither external imputation nor elimination of uncertain genotypes. The results of our simulation study show that the type I error rates are well controlled and that the proposed association test has reasonable statistical power. As an illustration, we apply our statistic to pharmacogenomics data on drug responsiveness among 400 epilepsy patients.
topic	allele read counts; low-coverage; mixture model; next-generation sequencing; sandwich variance estimator
url	https://www.mdpi.com/2227-7390/8/2/217


Similar Items