Summary: | Doctoral dissertation === National Taiwan University === Graduate Institute of Clinical Medicine === 106 === Background and objective: Hepatocellular carcinoma (HCC) is a leading cancer in Taiwan, where the prevalence of hepatitis is high. Even after effective treatment with tumor eradication, the recurrence rate remains high and is an important issue in patient care. Identifying patients at high risk of recurrence may allow more efficacious screening for and detection of tumor recurrence.
The aim of this study was to establish a hospital-based HCC database and to develop recurrence prediction models for HCC patients who received treatment. After the HCC database was established, a case series on a rare duodenal complication of HCC was carried out to test the efficiency and accuracy of the database.
Medical data that are sampled repeatedly at varying frequencies, such as laboratory data, may have the following characteristics: they are longitudinal, a given laboratory item is measured at irregular intervals, some patients are measured at irregular intervals, and multiple parameters are involved. A further aim of this study was to propose a data processing method that handles data with these characteristics and to apply it to the development of a clinical prediction model.
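To make the data characteristics above concrete, the following is a minimal sketch of how irregularly timed laboratory measurements might be grouped into periods and summarized. It is not the method proposed in this study: it assumes fixed-width periods rather than dynamic ones, and the function names, the 30-day period length, and the sample values are all hypothetical.

```python
from statistics import mean


def slice_periods(measurements, period_days, n_periods):
    """Group (day, value) pairs into consecutive fixed-width periods.

    Assumption: fixed-width windows, used here only for illustration.
    """
    periods = [[] for _ in range(n_periods)]
    for day, value in measurements:
        idx = int(day // period_days)
        if 0 <= idx < n_periods:  # drop measurements outside the window
            periods[idx].append(value)
    return periods


def summarize(periods):
    """Mean value per period; None where a period has no measurement."""
    return [mean(p) if p else None for p in periods]


def trend(summary, tolerance=0.0):
    """Label the change between the first and last non-empty periods."""
    observed = [v for v in summary if v is not None]
    if len(observed) < 2:
        return "unknown"
    delta = observed[-1] - observed[0]
    if delta > tolerance:
        return "increasing"
    if delta < -tolerance:
        return "decreasing"
    return "steady"


# Hypothetical values for one laboratory item in one patient:
# (days since treatment, measured value), sampled at irregular intervals.
lab_item = [(3, 12.0), (40, 15.0), (45, 17.0), (130, 30.0)]
periods = slice_periods(lab_item, period_days=30, n_periods=6)
summary = summarize(periods)
label = trend(summary)
```

Each patient and laboratory item then yields a fixed-length feature vector regardless of how often the item was measured, with empty periods left explicit as missing values.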
Methods: An HCC cancer registry database was established at National Taiwan University Hospital. Patients with newly diagnosed early-stage HCC from 2007 to 2009 (first-stage study) and to 2013 (second-stage study) who received radiofrequency ablation (RFA) as their first treatment were enrolled. In the first-stage study, five feature selection methods, namely genetic algorithm (GA), simulated annealing (SA) algorithm, random forests (RF), and two hybrid methods (GA+RF and SA+RF), were used to select an important subset of features from a total of 16 clinical features. These feature selection methods were combined with a support vector machine (SVM) to develop predictive models with better performance. Five-fold cross-validation was used to train and test the SVM models. In the second-stage study, a dynamic period slicing (DPS) method combined with a quantitative temporal abstraction algorithm (the DPSQTA algorithm) was proposed to process longitudinal, irregular, multi-parameter data. DPSQTA and a baseline method were compared on the performance of the resulting predictive models, measured by sensitivity, specificity, balanced accuracy (BAC), accuracy, positive predictive value (PPV), and negative predictive value (NPV).
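For reference, the six performance measures named above can all be derived from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from this study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard performance measures from confusion-matrix counts.

    tp/fn: patients with recurrence predicted correctly/incorrectly.
    tn/fp: patients without recurrence predicted correctly/incorrectly.
    """
    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "bac": (sensitivity + specificity) / 2,  # balanced accuracy
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=20, fp=10, tn=60, fn=10)
```

Balanced accuracy is the mean of sensitivity and specificity, which makes it less sensitive than plain accuracy to an imbalance between patients with and without recurrence.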
Results: The HCC cancer registry database was established and implemented with a high-accuracy query system, which provided a basis for further studies, for example a study of rare duodenal invasion by HCC. A predictive model for post-treatment HCC recurrence could be developed using SVM with hybrid feature selection methods and 5-fold cross-validation. In the early-stage study, the average sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and area under the ROC curve were 67%, 86%, 82%, 69%, 90%, and 0.69, respectively. In the second-stage study, DPSQTA improved the performance of the predictive model over the baseline method in sensitivity, BAC, accuracy, PPV, and NPV, although the differences were not statistically significant.
Conclusions: Based on the established cancer registry database, an effective predictive model for HCC recurrence after RFA was built using SVM-based machine learning. Patients at high risk of recurrence could be identified for close follow-up. With DPSQTA added, longitudinal, irregular, multi-parameter data could be processed, and the accuracy of the predictive model might be improved.
|