Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis

Abstract Background Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than tw...

Bibliographic Details
Main Authors:	Iris Eekhout, Mark A. van de Wiel, Martijn W. Heymans
Format:	Article
Language:	English
Published:	BMC 2017-08-01
Series:	BMC Medical Research Methodology
Subjects:	Multiple imputation Pooling Categorical covariates Significance test Logistic regression Simulation study
Online Access:	http://link.springer.com/article/10.1186/s12874-017-0404-7

id	doaj-c842923f1b2a4299a2180d015729decd
record_format	Article
spelling	doaj-c842923f1b2a4299a2180d015729decd | 2020-11-25T01:23:17Z | eng | BMC | BMC Medical Research Methodology | 1471-2288 | 2017-08-01 | 17 | 1 | 1 | 12 | 10.1186/s12874-017-0404-7 | Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis | Iris Eekhout (0), Mark A. van de Wiel (1), Martijn W. Heymans (2) | Department of Epidemiology & Biostatistics, VU University Medical Center | Department of Epidemiology & Biostatistics, VU University Medical Center | Department of Epidemiology & Biostatistics, VU University Medical Center | Abstract Background Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels significantly contributes to the model, different methods are available. For example pooling chi-square tests with multiple degrees of freedom, pooling likelihood ratio test statistics, and pooling based on the covariance matrix of the regression model. These methods are more complex than RR and are not available in all mainstream statistical software packages. In addition, they do not always obtain optimal power levels. We argue that the median of the p-values from the overall significance tests from the analyses on the imputed datasets can be used as an alternative pooling rule for categorical variables. The aim of the current study is to compare different methods to test a categorical variable for significance after multiple imputation on applicability and power. Methods In a large simulation study, we demonstrated the control of the type I error and power levels of different pooling methods for categorical variables. Results This simulation study showed that for non-significant categorical covariates the type I error is controlled and the statistical power of the median pooling rule was at least equal to current multiple parameter tests. An empirical data example showed similar results. Conclusions It can therefore be concluded that using the median of the p-values from the imputed data analyses is an attractive and easy to use alternative method for significance testing of categorical variables. | http://link.springer.com/article/10.1186/s12874-017-0404-7 | Multiple imputation | Pooling | Categorical covariates | Significance test | Logistic regression | Simulation study
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Iris Eekhout Mark A. van de Wiel Martijn W. Heymans
spellingShingle	Iris Eekhout Mark A. van de Wiel Martijn W. Heymans Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis BMC Medical Research Methodology Multiple imputation Pooling Categorical covariates Significance test Logistic regression Simulation study
author_facet	Iris Eekhout Mark A. van de Wiel Martijn W. Heymans
author_sort	Iris Eekhout
title	Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis
title_sort	methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis
publisher	BMC
series	BMC Medical Research Methodology
issn	1471-2288
publishDate	2017-08-01
description	Abstract Background Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels significantly contributes to the model, different methods are available. For example pooling chi-square tests with multiple degrees of freedom, pooling likelihood ratio test statistics, and pooling based on the covariance matrix of the regression model. These methods are more complex than RR and are not available in all mainstream statistical software packages. In addition, they do not always obtain optimal power levels. We argue that the median of the p-values from the overall significance tests from the analyses on the imputed datasets can be used as an alternative pooling rule for categorical variables. The aim of the current study is to compare different methods to test a categorical variable for significance after multiple imputation on applicability and power. Methods In a large simulation study, we demonstrated the control of the type I error and power levels of different pooling methods for categorical variables. Results This simulation study showed that for non-significant categorical covariates the type I error is controlled and the statistical power of the median pooling rule was at least equal to current multiple parameter tests. An empirical data example showed similar results. Conclusions It can therefore be concluded that using the median of the p-values from the imputed data analyses is an attractive and easy to use alternative method for significance testing of categorical variables.
topic	Multiple imputation Pooling Categorical covariates Significance test Logistic regression Simulation study
url	http://link.springer.com/article/10.1186/s12874-017-0404-7
work_keys_str_mv	AT iriseekhout methodsforsignificancetestingofcategoricalcovariatesinlogisticregressionmodelsaftermultipleimputationpowerandapplicabilityanalysis AT markavandewiel methodsforsignificancetestingofcategoricalcovariatesinlogisticregressionmodelsaftermultipleimputationpowerandapplicabilityanalysis AT martijnwheymans methodsforsignificancetestingofcategoricalcovariatesinlogisticregressionmodelsaftermultipleimputationpowerandapplicabilityanalysis
_version_	1725123230123950080
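
The median pooling rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the overall significance test for the categorical covariate has already been run on each imputed dataset, and that the resulting p-values are collected in a list. The function name `median_p_pool` and the example p-values are hypothetical.

```python
from statistics import median


def median_p_pool(p_values):
    """Pool per-imputation p-values by taking their median.

    `p_values` holds one p-value per imputed dataset, each taken from the
    overall (multi-degree-of-freedom) test of the categorical covariate.
    """
    p_values = list(p_values)
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    return median(p_values)


# Hypothetical p-values from five imputed datasets (illustrative only).
example_p_values = [0.031, 0.048, 0.027, 0.062, 0.039]
pooled_p = median_p_pool(example_p_values)
print(f"pooled p-value: {pooled_p}")
```

The pooled value is then compared with the chosen significance level in the usual way. How the per-dataset p-values are obtained (for example from a likelihood ratio test of the model with and without the covariate) is left outside this sketch.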

Similar Items