Zero problems with compositional data of physical behaviors: a comparison of three zero replacement methods

Abstract Background Researchers applying compositional data analysis to time-use data (e.g., time spent in physical behaviors) often face the problem of zeros, that is, recordings of zero time spent in any of the studied behaviors. Zeros hinder the application of compositional data analysis because...


Bibliographic Details
Main Authors: Charlotte Lund Rasmussen, Javier Palarea-Albaladejo, Melker Staffan Johansson, Patrick Crowley, Matthew Leigh Stevens, Nidhi Gupta, Kristina Karstad, Andreas Holtermann
Format: Article
Language: English
Published: BMC 2020-10-01
Series: International Journal of Behavioral Nutrition and Physical Activity
Subjects: Physical activity, Sedentary time, Compositional data analysis, Missing data, Time-use
Online Access: http://link.springer.com/article/10.1186/s12966-020-01029-z
id doaj-3d651efc4dff48ed9751af859603c5f1
record_format Article
spelling doaj-3d651efc4dff48ed9751af859603c5f1 2020-11-25T03:46:44Z eng BMC International Journal of Behavioral Nutrition and Physical Activity 1479-5868 2020-10-01 17 1 1 10 10.1186/s12966-020-01029-z
Zero problems with compositional data of physical behaviors: a comparison of three zero replacement methods
Charlotte Lund Rasmussen (0), National Research Centre for the Working Environment
Javier Palarea-Albaladejo (1), Biomathematics and Statistics Scotland
Melker Staffan Johansson (2), Department of Sports Science and Clinical Biomechanics, University of Southern Denmark
Patrick Crowley (3), National Research Centre for the Working Environment
Matthew Leigh Stevens (4), National Research Centre for the Working Environment
Nidhi Gupta (5), National Research Centre for the Working Environment
Kristina Karstad (6), National Research Centre for the Working Environment
Andreas Holtermann (7), National Research Centre for the Working Environment
http://link.springer.com/article/10.1186/s12966-020-01029-z
Physical activity
Sedentary time
Compositional data analysis
Missing data
Time-use
collection DOAJ
language English
format Article
sources DOAJ
author Charlotte Lund Rasmussen
Javier Palarea-Albaladejo
Melker Staffan Johansson
Patrick Crowley
Matthew Leigh Stevens
Nidhi Gupta
Kristina Karstad
Andreas Holtermann
title Zero problems with compositional data of physical behaviors: a comparison of three zero replacement methods
publisher BMC
series International Journal of Behavioral Nutrition and Physical Activity
issn 1479-5868
publishDate 2020-10-01
description Abstract
Background: Researchers applying compositional data analysis to time-use data (e.g., time spent in physical behaviors) often face the problem of zeros, that is, recordings of zero time spent in any of the studied behaviors. Zeros hinder the application of compositional data analysis because the analysis is based on log-ratios. One way to overcome this challenge is to replace the zeros with sensible small values. The aim of this study was to compare the performance of three existing replacement methods used within physical behavior time-use epidemiology: simple replacement, multiplicative replacement, and the log-ratio expectation-maximization (lrEM) algorithm. Moreover, we assessed the consequence of choosing replacement values higher than the lowest observed value for a given behavior.
Method: Using a complete dataset based on accelerometer data from 1310 Danish adults as reference, multiple datasets were simulated across six scenarios of zeros (5–30% zeros in 5% increments). Moreover, four examples were produced based on real data, in which 10% and 20% zeros were imposed and replaced using a replacement value of 0.5 min, 65% of the observation threshold, or an estimated value below the observation threshold. For the simulation study and the examples, the zeros were replaced using the three replacement methods, and the degree of distortion introduced was assessed by comparison with the complete dataset.
Results: The lrEM method outperformed the other replacement methods, as it had the smallest influence on the structure of relative variation of the datasets. Both the simple and multiplicative replacements introduced higher distortion, particularly in scenarios with more than 10% zeros, although the latter, like lrEM, preserves the ratios between behaviors with no zeros. The examples revealed that replacing zeros with a value higher than the observation threshold severely affected the structure of relative variation.
Conclusions: Given our findings, we encourage the use of replacement methods that preserve the relative structure of physical behavior data, as achieved by the multiplicative and lrEM replacements, and advise against simple replacement. Moreover, we do not recommend replacing zeros with values higher than the lowest observed value for a behavior.
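The ratio-preserving property attributed to multiplicative replacement in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: it uses the standard multiplicative replacement formula for a single composition, and the function name `multiplicative_replacement`, the four behavior categories, and the example minutes are hypothetical. Only the 0.5 min replacement value is taken from the abstract.

```python
from typing import List, Sequence


def multiplicative_replacement(x: Sequence[float], delta: float) -> List[float]:
    """Replace zeros in one composition with `delta` and shrink the
    non-zero parts by a common factor so that the total is unchanged."""
    total = sum(x)
    n_zeros = sum(1 for v in x if v == 0)
    if total <= 0:
        raise ValueError("composition must have a positive total")
    if n_zeros * delta >= total:
        raise ValueError("delta is too large for this composition")
    # Common shrink factor applied to every non-zero part.
    shrink = 1.0 - n_zeros * delta / total
    return [delta if v == 0 else v * shrink for v in x]


# Hypothetical 24-hour day in minutes:
# sedentary, standing, light activity, moderate-to-vigorous activity (zero)
day = [600.0, 300.0, 540.0, 0.0]
replaced = multiplicative_replacement(day, delta=0.5)

print(replaced)
print("total:", sum(replaced))
print("sedentary/standing ratio:", replaced[0] / replaced[1])
```

Because every non-zero part is multiplied by the same factor, the ratio between any two behaviors without zeros is the same before and after replacement, and the total stays at 1440 min. The lrEM algorithm is model-based and is not sketched here.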
topic Physical activity
Sedentary time
Compositional data analysis
Missing data
Time-use
url http://link.springer.com/article/10.1186/s12966-020-01029-z