knnAUC: an open-source R package for detecting nonlinear dependence between one continuous variable and one binary variable

Abstract Background: Testing the dependence of two variables is one of the fundamental tasks in statistics. In this work, we developed an open-source R package (knnAUC) for detecting nonlinear dependence between one continuous variable X and one binary dependent variable Y (0 or 1). Results: We addre...


Bibliographic Details
Main Authors: Yi Li, Xiaoyu Liu, Yanyun Ma, Yi Wang, Weichen Zhou, Meng Hao, Zhenghong Yuan, Jie Liu, Momiao Xiong, Yin Yao Shugart, Jiucun Wang, Li Jin
Format: Article
Language: English
Published: BMC 2018-11-01
Series: BMC Bioinformatics
Subjects:
AUC
Online Access: http://link.springer.com/article/10.1186/s12859-018-2427-4
id doaj-1002694f0f94455595d5ebdd1fb698e7
record_format Article
spelling doaj-1002694f0f94455595d5ebdd1fb698e7 2020-11-25T01:45:04Z eng BMC BMC Bioinformatics 1471-2105 2018-11-01 191112 10.1186/s12859-018-2427-4
knnAUC: an open-source R package for detecting nonlinear dependence between one continuous variable and one binary variable
Yi Li [0], Xiaoyu Liu [1], Yanyun Ma [2], Yi Wang [3], Weichen Zhou [4], Meng Hao [5], Zhenghong Yuan [6], Jie Liu [7], Momiao Xiong [8], Yin Yao Shugart [9], Jiucun Wang [10], Li Jin [11]
[0] Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University
[1] Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University
[2] Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University
[3] Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University
[4] State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University
[5] Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University
[6] Shanghai Public Health Clinical Center, Fudan University
[7] Key Laboratory of Medical Molecular Virology of MOE/MOH, Shanghai Medical School, Fudan University
[8] Human Genetics Center, School of Public Health, University of Texas Houston Health Sciences Center
[9] Unit on Statistical Genomics, Division of Intramural Division Programs, National Institute of Mental Health, National Institutes of Health
[10] Six Industrial Research Institute, Fudan University
[11] Six Industrial Research Institute, Fudan University
Abstract. Background: Testing the dependence of two variables is one of the fundamental tasks in statistics. In this work, we developed an open-source R package (knnAUC) for detecting nonlinear dependence between one continuous variable X and one binary dependent variable Y (0 or 1). Results: We addressed this problem by using knnAUC (k-nearest neighbors AUC test; the R package is available at https://sourceforge.net/projects/knnauc/). In the knnAUC software framework, we first resampled a dataset to get the training and testing datasets according to the sample ratio (from 0 to 1), and then constructed a k-nearest neighbors classifier to get the yhat estimator (the probability of y = 1) of testy (the true labels of the testing dataset). Finally, we calculated the AUC (area under the receiver operating characteristic curve) estimator and tested whether the AUC estimator is greater than 0.5. To evaluate the advantages of knnAUC compared to seven other popular methods, we performed extensive simulations to explore the relationships between eight different methods and compared the false positive rates and statistical power using both simulated and real datasets (chronic hepatitis B datasets and kidney cancer RNA-seq datasets). Conclusions: We concluded that knnAUC is an efficient R package for testing nonlinear dependence between one continuous variable and one binary dependent variable, especially in computational biology.
http://link.springer.com/article/10.1186/s12859-018-2427-4
Open source
R package
Nonlinear dependence
One continuous variable
One binary dependent variable
AUC
collection DOAJ
language English
format Article
sources DOAJ
author Yi Li
Xiaoyu Liu
Yanyun Ma
Yi Wang
Weichen Zhou
Meng Hao
Zhenghong Yuan
Jie Liu
Momiao Xiong
Yin Yao Shugart
Jiucun Wang
Li Jin
spellingShingle Yi Li
Xiaoyu Liu
Yanyun Ma
Yi Wang
Weichen Zhou
Meng Hao
Zhenghong Yuan
Jie Liu
Momiao Xiong
Yin Yao Shugart
Jiucun Wang
Li Jin
knnAUC: an open-source R package for detecting nonlinear dependence between one continuous variable and one binary variable
BMC Bioinformatics
Open source
R package
Nonlinear dependence
One continuous variable
One binary dependent variable
AUC
author_facet Yi Li
Xiaoyu Liu
Yanyun Ma
Yi Wang
Weichen Zhou
Meng Hao
Zhenghong Yuan
Jie Liu
Momiao Xiong
Yin Yao Shugart
Jiucun Wang
Li Jin
author_sort Yi Li
title knnAUC: an open-source R package for detecting nonlinear dependence between one continuous variable and one binary variable
title_short knnAUC: an open-source R package for detecting nonlinear dependence between one continuous variable and one binary variable
title_full knnAUC: an open-source R package for detecting nonlinear dependence between one continuous variable and one binary variable
title_fullStr knnAUC: an open-source R package for detecting nonlinear dependence between one continuous variable and one binary variable
title_full_unstemmed knnAUC: an open-source R package for detecting nonlinear dependence between one continuous variable and one binary variable
title_sort knnauc: an open-source r package for detecting nonlinear dependence between one continuous variable and one binary variable
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2018-11-01
description Abstract. Background: Testing the dependence of two variables is one of the fundamental tasks in statistics. In this work, we developed an open-source R package (knnAUC) for detecting nonlinear dependence between one continuous variable X and one binary dependent variable Y (0 or 1). Results: We addressed this problem by using knnAUC (k-nearest neighbors AUC test; the R package is available at https://sourceforge.net/projects/knnauc/). In the knnAUC software framework, we first resampled a dataset to get the training and testing datasets according to the sample ratio (from 0 to 1), and then constructed a k-nearest neighbors classifier to get the yhat estimator (the probability of y = 1) of testy (the true labels of the testing dataset). Finally, we calculated the AUC (area under the receiver operating characteristic curve) estimator and tested whether the AUC estimator is greater than 0.5. To evaluate the advantages of knnAUC compared to seven other popular methods, we performed extensive simulations to explore the relationships between eight different methods and compared the false positive rates and statistical power using both simulated and real datasets (chronic hepatitis B datasets and kidney cancer RNA-seq datasets). Conclusions: We concluded that knnAUC is an efficient R package for testing nonlinear dependence between one continuous variable and one binary dependent variable, especially in computational biology.
topic Open source
R package
Nonlinear dependence
One continuous variable
One binary dependent variable
AUC
url http://link.springer.com/article/10.1186/s12859-018-2427-4
_version_ 1725025517598408704
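
Illustrative sketch (not part of the catalog record). The workflow summarized in the description field (split the data by a sample ratio, score the test set with a k-nearest neighbors classifier, compute the AUC, and test whether it exceeds 0.5) can be sketched roughly as follows. This is not the knnAUC package's API or code: the function names (knn_prob, auc, knn_auc_test) and parameters (ratio, k, n_perm, seed) are hypothetical, and the label-permutation test used for the final step is an assumption standing in for whatever test the package actually applies.

```python
import random


def knn_prob(train_x, train_y, x, k):
    """Estimate P(y = 1 | x) as the mean label of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_auc_test(x, y, ratio=0.5, k=10, n_perm=200, seed=0):
    """Return (AUC, one-sided p-value) for the hypothesis AUC > 0.5.

    Assumption: the p-value comes from permuting the test labels, which is
    one simple way to test AUC > 0.5 and may differ from the package's test.
    """
    rng = random.Random(seed)
    idx = list(range(len(x)))
    rng.shuffle(idx)
    n_train = int(round(ratio * len(x)))
    if not 0 < n_train < len(x):
        raise ValueError("ratio must leave points in both train and test sets")
    train, test = idx[:n_train], idx[n_train:]
    train_x = [x[i] for i in train]
    train_y = [y[i] for i in train]
    # yhat: estimated probability of y = 1 for each test point
    yhat = [knn_prob(train_x, train_y, x[i], k) for i in test]
    testy = [y[i] for i in test]
    observed = auc(yhat, testy)
    # Permutation null: shuffling labels breaks any dependence on yhat.
    perm = testy[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if auc(yhat, perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Simulated nonlinear dependence: y depends on |x|, not on x linearly.
    gen = random.Random(42)
    xs = [gen.uniform(-2, 2) for _ in range(300)]
    ys = [1 if abs(v) > 1 else 0 for v in xs]
    est, p = knn_auc_test(xs, ys)
    print("AUC estimate:", est, "permutation p-value:", p)
```

In the simulated example, Y depends on X only through |X|, the kind of non-monotone relationship that a linear correlation would tend to miss but that a neighborhood-based classifier can pick up.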