A Study on Early Risk Assessment Techniques for Chronic Diseases by Mining Large-Scale Clinical Databases

Ph.D. === National Cheng Kung University (國立成功大學) === Department of Computer Science and Information Engineering (資訊工程學系) === 107 === In recent years, the volume of electronic medical records (EMRs) has increased rapidly. Hence, obtaining valuable knowledge from EMRs to support medical decision making has become an important issue. To address this is...

Bibliographic Details
Main Authors: Chu-Yu Chin, 金聚鈺
Other Authors: Sun-Yuan Hsieh
Format: Others
Language: en_US
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/3wqqau
id ndltd-TW-107NCKU5392012
record_format oai_dc
collection NDLTD
language en_US
format Others
sources NDLTD
description Ph.D. === National Cheng Kung University (國立成功大學) === Department of Computer Science and Information Engineering (資訊工程學系) === 107 === In recent years, the volume of electronic medical records (EMRs) has increased rapidly, and extracting valuable knowledge from EMRs to support medical decision making has become an important problem. To address it, this thesis proposes a set of novel early risk assessment methods for different chronic diseases that identify diverse disease risk factors from the National Health Insurance Research Database (NHIRD).

First, we propose a Disease Risk Association Pattern Mining framework (DR-APM) to detect early risk of chronic diseases, with rheumatoid arthritis as a case study. The main strategies of DR-APM are disease risk pattern mining, associative classification, analysis with Risk Pattern Matching in PubMed (RPM-PubMed), and statistical analysis. The RPM-PubMed experiments show that the risk patterns discovered by DR-APM fall into two types: well-known risk patterns and potentially novel risk patterns. The statistical analysis reveals significant differences between the disease group and the control group in how risk patterns are distributed across disease categories. Building on these differences, DR-APM achieves excellent accuracy in early risk assessment.

Second, to handle the large number of disease-coding attributes and the sparse-matrix problem in EMR databases, we propose early Disease Risk Assessment with Matrix factorization (eDRAM), which combines machine learning and matrix factorization to identify latent risk factors from the EMR database. eDRAM uses a non-negative matrix decomposition algorithm to significantly reduce the data dimension and reconstruct novel risk factors for early disease risk assessment. The experiments demonstrate that eDRAM greatly reduces the number of attributes while maintaining better efficiency, stability and effectiveness than other state-of-the-art methods.

Finally, deep learning has recently achieved excellent performance in feature recognition. However, model training is time-consuming and resource-intensive, especially with large-scale attributes and data. To address these problems while assessing different types of diseases and improving accuracy, we propose a method called scalable Deep learning of Temporal generalized EHRs (sDT-EHRs). sDT-EHRs comprises a novel temporal EHR representation model with an extraction algorithm, a random sampling method, and a deep residual convolutional neural network. To evaluate its effectiveness for early risk assessment of multiple diseases, three chronic diseases (chronic obstructive pulmonary disease, systemic lupus erythematosus, and type 2 diabetes mellitus) were assessed in the experiments, and sDT-EHRs was compared with state-of-the-art methods on a large-scale nationwide medical database. Evaluations of performance, scalability and applicability to multiple chronic diseases yielded three major findings. First, the proposed EHR representation model combines generalized disease codes, which increases efficiency during the training phase. Second, sDT-EHRs outperforms other state-of-the-art methods in risk assessment of the three chronic diseases. Third, sDT-EHRs scales well: disease models constructed from relatively small amounts of patient data maintain high performance when evaluating a large number of patients.

This research considers the needs of modern precision medicine and systematically investigates and develops a set of early disease risk assessment frameworks based on data mining, machine learning and deep learning techniques. To apply real-world large-scale medical data to the early risk assessment of different chronic diseases, we design a set of experiments that evaluate the efficiency and effectiveness gains of the proposed methods. The main contribution of this study is the discovery of a variety of novel risk factors and the improvement of early risk assessment methods, which can support further medical validation, analysis and assessment of different diseases to improve medical care.
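
The abstract states that eDRAM applies non-negative matrix decomposition to a sparse patient-by-disease-code matrix to obtain a small number of latent risk factors. The record does not say which algorithm variant the thesis uses, so the following is only a minimal sketch under stated assumptions: it uses the classic multiplicative-update rule for the squared-error objective, and the toy matrix, the function names (`nmf`, `matmul`, `frobenius_error`) and the choice of rank 2 are all hypothetical illustrations rather than anything taken from the thesis.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H (the reconstruction error)."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m).

    Assumption: multiplicative updates for the squared-error objective;
    the thesis may use a different NMF variant.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start keeps every later entry non-negative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy data: rows are patients, columns are disease codes,
# and 1 means the code appears in that patient's record.
patients_by_codes = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 0],
]

W, H = nmf(patients_by_codes, rank=2)
```

In this sketch each row of `W` is a two-value representation of a patient that a downstream classifier could consume in place of the six original code columns, and each row of `H` shows which codes contribute to a latent factor.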
author2 Sun-Yuan Hsieh
author_facet Sun-Yuan Hsieh
Chu-Yu Chin
金聚鈺
author Chu-Yu Chin
金聚鈺
spellingShingle Chu-Yu Chin
金聚鈺
A Study on Early Risk Assessment Techniques for Chronic Diseases by Mining Large-Scale Clinical Databases
author_sort Chu-Yu Chin
title A Study on Early Risk Assessment Techniques for Chronic Diseases by Mining Large-Scale Clinical Databases
publishDate 2019
url http://ndltd.ncl.edu.tw/handle/3wqqau
spelling ndltd-TW-107NCKU5392012 2019-10-25T05:24:18Z http://ndltd.ncl.edu.tw/handle/3wqqau A Study on Early Risk Assessment Techniques for Chronic Diseases by Mining Large-Scale Clinical Databases 基於探勘大規模健康資料庫之慢性病早期風險評估技術研究 Chu-Yu Chin 金聚鈺 Ph.D. National Cheng Kung University (國立成功大學) Department of Computer Science and Information Engineering (資訊工程學系) 107 Sun-Yuan Hsieh Vincent S. Tseng 謝孫源 曾新穆 2019 Degree thesis ; thesis 95 en_US