Bayesian Assessment of Diagnostic Strategy for a Thyroid Nodule Involving a Combination of Clinical Synthetic Features and Molecular Data
The use of machine learning has increased over the years, especially in the world of molecular data. Generally, the inference of relationships between features is determined by statistical models. The phenotype (observable clinical characteristics) can result from the expression of the genotype (gen...
Main Authors: | Aleksander Placzek, Alicja Pluciennik, Agnieszka Kotecka-Blicharz, Michal Jarzab, Dariusz Mrozek |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
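Subjects: | Bayesian networks; feature integration; synthetic features; Markov blankets; quality of features; thyroid cancer |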
Online Access: | https://ieeexplore.ieee.org/document/9205860/ |
id |
doaj-0ee3867318e24c79a90b0750465408a3 |
---|---|
record_format |
Article |
spelling |
doaj-0ee3867318e24c79a90b0750465408a3; 2021-03-30T04:00:42Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8, pp. 175125-175139; DOI 10.1109/ACCESS.2020.3026315; IEEE Xplore document 9205860
Authors and affiliations:
Aleksander Placzek (https://orcid.org/0000-0002-2555-1058), Department of Research and Development, WASKO S. A., Gliwice, Poland
Alicja Pluciennik, Department of Research and Development, WASKO S. A., Gliwice, Poland
Agnieszka Kotecka-Blicharz (https://orcid.org/0000-0002-8086-7346), Department of Nuclear Medicine and Endocrine Oncology, Maria Skłodowska-Curie National Research Institute of Oncology, Gliwice, Poland
Michal Jarzab, Breast Cancer Unit, Maria Skłodowska-Curie National Research Institute of Oncology, Gliwice, Poland
Dariusz Mrozek (https://orcid.org/0000-0001-6764-6656), Department of Applied Informatics, Silesian University of Technology, Gliwice, Poland |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Aleksander Placzek, Alicja Pluciennik, Agnieszka Kotecka-Blicharz, Michal Jarzab, Dariusz Mrozek |
title |
Bayesian Assessment of Diagnostic Strategy for a Thyroid Nodule Involving a Combination of Clinical Synthetic Features and Molecular Data |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
The use of machine learning has increased over the years, especially for molecular data. Relationships between features are generally inferred with statistical models. The phenotype (observable clinical characteristics) can result from the expression of the genotype (genetic code) or from environmental factors. Molecular datasets carry limited information, while the supporting clinical data is ambiguous. There are no well-established approaches for combining clinical information with genomic repositories. The available genomic tests use only molecular data and give physicians a result that can be integrated clinically. In this article, we present a strategy in which clinical data, regardless of its limitations, is combined with molecular features in one predictive model. We predict the risk of malignancy in thyroid nodules based on the results of fine-needle aspiration biopsy and the expression of selected genes. We use a Bayesian network (BN) framework to discover relationships between molecular features and to assess how the quality of the added clinical data affects the performance of the chosen gene set. A Bayesian network, which offers both prognostic and diagnostic perspectives, is a well-suited non-parametric technique for feature selection, feature extraction, and prediction. We show that certain clinical factors can work as a synthetic feature and provide predictive ability beyond what genes alone can offer. The experimental results demonstrate that predictive models based on both molecular and clinical data perform better than models based on molecular data alone. We also explain why one should consider the source of clinical data and remain aware of the quality of the variables. |
topic |
Bayesian networks; feature integration; synthetic features; Markov blankets; quality of features; thyroid cancer |
url |
https://ieeexplore.ieee.org/document/9205860/ |