Development of prognostic model for preterm birth using machine learning in a population-based cohort of Western Australia births between 1980 and 2015

Abstract Preterm birth is a global public health problem with a significant burden on the individuals affected. The study aimed to extend current research on preterm birth prognostic model development by developing and internally validating models using machine learning classification algorithms and...

Bibliographic Details
Published in: Scientific Reports
Main Authors: Kingsley Wong, Gizachew A. Tessema, Kevin Chai, Gavin Pereira
Format: Article
Language: English
Published: Nature Portfolio 2022-11-01
Online Access: https://doi.org/10.1038/s41598-022-23782-w
collection DOAJ
description Abstract Preterm birth is a global public health problem with a significant burden on the individuals affected. The study aimed to extend current research on preterm birth prognostic model development by developing and internally validating models using machine learning classification algorithms and population-based, routinely collected data in Western Australia. The longitudinal retrospective cohort study involved all births in Western Australia between 1980 and 2015, and the analytic sample contained 81,974 (8.6%) preterm births (<37 weeks of gestation). Prediction models for preterm birth were developed using regularised logistic regression, decision trees, Random Forests, extreme gradient boosting, and multi-layer perceptron (MLP) classifiers. Predictors included maternal socio-demographics and medical conditions, current and past pregnancy complications, and family history. Class weights were applied to handle the imbalanced outcome, and stratified tenfold cross-validation was used to reduce overfitting. Close to half of the preterm births (49.1% at a 5% false positive rate [FPR]; 95% CI 48.9%, 49.5%) were correctly classified by the best-performing classifier (MLP) for all women when current pregnancy information was available. Sensitivity rose to 52.7% (95% CI 52.1%, 53.3%) after including past obstetric history in a sub-population of births from multiparous women. Around half of preterm births can be identified antenatally at high specificity using population-based, routinely collected maternal and pregnancy data. The performance of the prediction models depends on the available predictor pool, which is individual and time specific.
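The abstract names three evaluation ingredients: class weights for the imbalanced outcome, stratified tenfold cross-validation, and sensitivity reported at a 5% false positive rate. The sketch below illustrates each in plain Python. It is not the authors' code. The record does not say how the class weights were set or which software was used, so the inverse-frequency ("balanced") weighting is an assumption, and the function names `balanced_class_weights`, `stratified_fold_ids`, and `sensitivity_at_fpr` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: the record does not state the weighting scheme used.
    """
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


def stratified_fold_ids(y, n_folds=10, seed=0):
    """Assign each sample a fold id so that every fold keeps roughly
    the same class proportions as the full sample."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    fold_of = [0] * len(y)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled members of each class round-robin across folds.
        for pos, i in enumerate(indices):
            fold_of[i] = pos % n_folds
    return fold_of


def sensitivity_at_fpr(y_true, y_score, max_fpr=0.05):
    """Highest sensitivity reachable by a rule 'score >= t' whose
    false positive rate does not exceed max_fpr. Labels are 0 or 1."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    best, tp, fp, i = 0.0, 0, 0, 0
    while i < len(pairs):
        score = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg > max_fpr:
            break  # lowering the threshold further only adds false positives
        best = tp / n_pos
    return best
```

In a cross-validation loop, one would fit a classifier on nine folds using the class weights, score the held-out fold, and call `sensitivity_at_fpr` on the held-out labels and scores. With an 8.6% positive class, the assumed weighting gives preterm births roughly ten times the weight of term births.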
id doaj-art-48751cc7150f44feb77366e668e52aa4
institution Directory of Open Access Journals
issn 2045-2322
spelling Scientific Reports (Nature Portfolio), ISSN 2045-2322, vol. 12, no. 1, pp. 1-16, 2022-11-01. Record timestamp 2025-08-19T21:33:17Z. Authors: Kingsley Wong, Gizachew A. Tessema, Kevin Chai, Gavin Pereira (all Curtin School of Population Health, Curtin University).