Bayesian Assessment of Diagnostic Strategy for a Thyroid Nodule Involving a Combination of Clinical Synthetic Features and Molecular Data

The use of machine learning has increased over the years, especially in the world of molecular data. Generally, the inference of relationships between features is determined by statistical models. The phenotype (observable clinical characteristics) can result from the expression of the genotype (gen...

Bibliographic Details
Main Authors: Aleksander Placzek, Alicja Pluciennik, Agnieszka Kotecka-Blicharz, Michal Jarzab, Dariusz Mrozek
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9205860/
id doaj-0ee3867318e24c79a90b0750465408a3
record_format Article
spelling doaj-0ee3867318e24c79a90b0750465408a3 2021-03-30T04:00:42Z
doi 10.1109/ACCESS.2020.3026315
volume 8
pages 175125-175139
article_number 9205860
author_details Aleksander Placzek (https://orcid.org/0000-0002-2555-1058), Department of Research and Development, WASKO S. A., Gliwice, Poland
Alicja Pluciennik, Department of Research and Development, WASKO S. A., Gliwice, Poland
Agnieszka Kotecka-Blicharz (https://orcid.org/0000-0002-8086-7346), Department of Nuclear Medicine and Endocrine Oncology, Maria Skłodowska-Curie National Research Institute of Oncology, Gliwice, Poland
Michal Jarzab, Breast Cancer Unit, Maria Skłodowska-Curie National Research Institute of Oncology, Gliwice, Poland
Dariusz Mrozek (https://orcid.org/0000-0001-6764-6656), Department of Applied Informatics, Silesian University of Technology, Gliwice, Poland
collection DOAJ
sources DOAJ
issn 2169-3536
description The use of machine learning has increased over the years, especially in the analysis of molecular data. Generally, relationships between features are inferred with statistical models. The phenotype (observable clinical characteristics) can result from the expression of the genotype (genetic code) or from environmental factors. Molecular datasets carry limited information, while the supporting clinical data is ambiguous. There are no well-established approaches for combining clinical information with genomic repositories. The available genomic tests use only molecular data and give physicians a result that can be integrated clinically. In this article, we present a strategy in which clinical data, regardless of its limitations, is combined with molecular features in one predictive model. We predict the risk of malignancy in thyroid nodules based on the results of fine-needle aspiration biopsy and the expression of selected genes. We utilize a Bayesian network (BN) framework to discover relationships between molecular features and to assess the impact of the quality of added clinical data on the performance of the chosen gene set. A Bayesian network, which offers both prognostic and diagnostic perspectives, is a well-suited non-parametric technique for feature selection, feature extraction, and prediction. We show that certain clinical factors can work as a synthetic feature and provide predictive abilities beyond what genes alone can offer. The experimental results demonstrate higher performance for predictive models based on molecular and clinical data than for models using only molecular data. We also explain why one should consider the source of clinical data while staying aware of the quality of the variables.
topic Bayesian networks
feature integration
synthetic features
Markov blankets
quality of features
thyroid cancer