id doaj-5d0c8a10e0e24f48994bc80dce181704
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Vivian Viallon
Mathilde His
Sabina Rinaldi
Marie Breeur
Audrey Gicquiau
Bertrand Hemon
Kim Overvad
Anne Tjønneland
Agnetha Linn Rostgaard-Hansen
Joseph A. Rothwell
Lucie Lecuyer
Gianluca Severi
Rudolf Kaaks
Theron Johnson
Matthias B. Schulze
Domenico Palli
Claudia Agnoli
Salvatore Panico
Rosario Tumino
Fulvio Ricceri
W. M. Monique Verschuren
Peter Engelfriet
Charlotte Onland-Moret
Roel Vermeulen
Therese Haugdahl Nøst
Ilona Urbarova
Raul Zamora-Ros
Miguel Rodriguez-Barranco
Pilar Amiano
José Maria Huerta
Eva Ardanaz
Olle Melander
Filip Ottoson
Linda Vidman
Matilda Rentoft
Julie A. Schmidt
Ruth C. Travis
Elisabete Weiderpass
Mattias Johansson
Laure Dossus
Mazda Jenab
Marc J. Gunter
Justo Lorenzo Bermejo
Dominique Scherer
Reza M. Salek
Pekka Keski-Rahkonen
Pietro Ferrari
title A New Pipeline for the Normalization and Pooling of Metabolomics Data
publisher MDPI AG
series Metabolites
issn 2218-1989
publishDate 2021-09-01
description Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, this can raise methodological challenges as several preanalytical and analytical factors could introduce differences in measured concentrations and variability between datasets. Specifically, different studies may use variable sample types (e.g., serum versus plasma) collected, treated, and stored according to different protocols, and assayed in different laboratories using different instruments. To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a set of sequential steps: (i) exclusions of the least informative observations and metabolites and removal of outliers; imputation of missing data; (ii) identification of the main sources of variability through principal component partial R-square (PC-PR2) analysis; (iii) application of linear mixed models to remove unwanted variability, including samples’ originating study and batch, and preserve biological variations while accounting for potential differences in the residual variances across studies. This pipeline was applied to targeted metabolomics data acquired using Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of metabolomics measurements indicated that the pipeline improved the comparability of data across the studies. Our pipeline can be adapted to normalize other molecular data, including biomarkers as well as proteomics data, and could be used for pooling molecular datasets, for example in international consortia, to limit biases introduced by inter-study variability. This versatility of the pipeline makes our work of potential interest to molecular epidemiologists.
topic cancer epidemiology
normalization
pooling
technical variability
metabolomics
metabolites
url https://www.mdpi.com/2218-1989/11/9/631
spelling doaj-5d0c8a10e0e24f48994bc80dce181704
2021-09-26T00:41:02Z
eng
MDPI AG
Metabolites
2218-1989
2021-09-01
11
631
631
10.3390/metabo11090631
A New Pipeline for the Normalization and Pooling of Metabolomics Data
Vivian Viallon: Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
Mathilde His: Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
Sabina Rinaldi: Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
Marie Breeur: Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
Audrey Gicquiau: Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
Bertrand Hemon: Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
Kim Overvad: Department of Public Health, Aarhus University, Bartholins Alle 2, DK-8000 Aarhus, Denmark
Anne Tjønneland: Danish Cancer Society Research Center, DK-2100 Copenhagen, Denmark
Agnetha Linn Rostgaard-Hansen: Danish Cancer Society Research Center, DK-2100 Copenhagen, Denmark
Joseph A. Rothwell: UVSQ, Inserm, CESP U1018, “Exposome and Heredity” Team, Université Paris-Saclay, Gustave Roussy, 94800 Villejuif, France
Lucie Lecuyer: UVSQ, Inserm, CESP U1018, “Exposome and Heredity” Team, Université Paris-Saclay, Gustave Roussy, 94800 Villejuif, France
Gianluca Severi: UVSQ, Inserm, CESP U1018, “Exposome and Heredity” Team, Université Paris-Saclay, Gustave Roussy, 94800 Villejuif, France
Rudolf Kaaks: Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
Theron Johnson: Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
Matthias B. Schulze: Department of Molecular Epidemiology, German Institute of Human Nutrition Potsdam Rehbruecke, Arthur-Scheunert-Allee 114-116, 14558 Nuthetal, Germany
Domenico Palli: Cancer Risk Factors and Life-Style Epidemiology Unit, Institute for Cancer Research, Prevention and Clinical Network (ISPRO), 50139 Florence, Italy
Claudia Agnoli: Epidemiology and Prevention Unit Department of Research, Fondazione IRCCS—Istituto Nazionale dei Tumori, 20133 Milan, Italy
Salvatore Panico: Dipartimento di Medicina Clinica e Chirurgia, Federico II University, 80131 Naples, Italy
Rosario Tumino: Cancer Registry and Histopathology Department, Provincial Health Authority (ASP 7), 97100 Ragusa, Italy
Fulvio Ricceri: Department of Clinical and Biological Sciences, University of Turin, 10043 Orbassano, Italy
W. M. Monique Verschuren: National Institute for Public Health and the Environment, Centre for Nutrition, Prevention and Health Services, Antonie van Leeuwenhoeklaan 9, 3721 MA Bilthoven, The Netherlands
Peter Engelfriet: National Institute for Public Health and the Environment, Centre for Nutrition, Prevention and Health Services, Antonie van Leeuwenhoeklaan 9, 3721 MA Bilthoven, The Netherlands
Charlotte Onland-Moret: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3584 CG Utrecht, The Netherlands
Roel Vermeulen: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3584 CG Utrecht, The Netherlands
Therese Haugdahl Nøst: Department of Community Medicine, Faculty of Health Sciences, UiT The Arctic University of Norway, P.O. Box 6050, 9037 Tromsø, Norway
Ilona Urbarova: Department of Community Medicine, Faculty of Health Sciences, UiT The Arctic University of Norway, P.O. Box 6050, 9037 Tromsø, Norway
Raul Zamora-Ros: Unit of Nutrition and Cancer, Cancer Epidemiology Research Programme, Catalan Institute of Oncology, Bellvitge Biomedical Research Institute (IDIBELL), 08908 L’Hospitalet de Llobregat, Spain
Miguel Rodriguez-Barranco: Escuela Andaluza de Salud Pública (EASP), 18011 Granada, Spain
Pilar Amiano: Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain
José Maria Huerta: Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain
Eva Ardanaz: Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain
Olle Melander: Department of Clinical Sciences, Lund University, SE-21 428 Malmö, Sweden
Filip Ottoson: Department of Immunotechnology, Lund University, SE-22 100 Lund, Sweden
Linda Vidman: Department of Radiation Sciences, Oncology, Umeå University, SE-901 87 Umeå, Sweden
Matilda Rentoft: Department of Radiation Sciences, Oncology, Umeå University, SE-901 87 Umeå, Sweden
Julie A. Schmidt: Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
Ruth C. Travis: Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
Elisabete Weiderpass: International Agency for Research on Cancer, World Health Organization, 69008 Lyon, France
Mattias Johansson: Genomic Epidemiology Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
Laure Dossus: Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
Mazda Jenab: Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
Marc J. Gunter: Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
Justo Lorenzo Bermejo: Statistical Genetics Group, Institute of Medical Biometry, University of Heidelberg, 69120 Heidelberg, Germany
Dominique Scherer: Statistical Genetics Group, Institute of Medical Biometry, University of Heidelberg, 69120 Heidelberg, Germany
Reza M. Salek: Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
Pekka Keski-Rahkonen: Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
Pietro Ferrari: Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France