Cost-Sensitive Multi-Label Classification with Applications

PhD === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 101 === We study a generalization of traditional multi-label classification, which we refer to as cost-sensitive multi-label classification (CSML). In this problem, the misclassification cost can be different for each instance-label pair. For solving the problem, w...


Bibliographic Details
Main Authors: Hung-Yi Lo, 駱宏毅
Other Authors: 林守德
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/61015886145358618517
id ndltd-TW-101NTU05392009
record_format oai_dc
spelling ndltd-TW-101NTU05392009 2016-03-23T04:13:55Z http://ndltd.ncl.edu.tw/handle/61015886145358618517 Cost-Sensitive Multi-Label Classification with Applications 成本導向多標籤學習演算法與應用 Hung-Yi Lo 駱宏毅 PhD National Taiwan University Graduate Institute of Computer Science and Information Engineering 101 林守德 2013 thesis 91 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description PhD === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 101 === We study a generalization of traditional multi-label classification, which we refer to as cost-sensitive multi-label classification (CSML). In this problem, the misclassification cost can differ for each instance-label pair. To solve the problem, we propose two novel and general strategies based on the problem transformation technique. The proposed strategies transform the CSML problem into several cost-sensitive single-label classification problems. In addition, we propose a basis expansion model for CSML, which we call the Generalized k-Labelsets Ensemble (GLE). In this model, each basis function is a label powerset classifier trained on a random k-labelset, and the expansion coefficients are learned by minimizing the cost-weighted global error between the prediction and the ground truth. GLE can also be used for traditional multi-label classification. Experimental results on both multi-label classification and cost-sensitive multi-label classification demonstrate that our method performs better than other methods.
Cost-sensitive classification assumes that the cost is given according to the application, so “Where does cost come from?” is an important practical issue. We study two real-world prediction tasks, medical image classification and social tag prediction, and link their data distributions to the cost information. In medical image classification, we observe a patient-imbalance phenomenon that seriously hurts the generalization ability of the image classifier. We design several patient-balanced learning algorithms based on cost-sensitive binary classification. The success of our patient-balanced learning methods was demonstrated by winning KDD Cup 2008. For social tag prediction, we propose to treat the tag counts as the misclassification costs and model the social tagging problem as a cost-sensitive multi-label classification problem. The experimental results in audio tag annotation and retrieval demonstrate that the CSML approaches outperform our winning method in the Music Information Retrieval Evaluation eXchange (MIREX) 2009 in terms of both cost-sensitive and cost-insensitive evaluation metrics. The results on social bookmark prediction also demonstrate that our proposed method performs better than other methods.
author2 林守德
author_facet 林守德
Hung-Yi Lo
駱宏毅
author Hung-Yi Lo
駱宏毅
spellingShingle Hung-Yi Lo
駱宏毅
Cost-Sensitive Multi-Label Classification with Applications
author_sort Hung-Yi Lo
title Cost-Sensitive Multi-Label Classification with Applications
title_short Cost-Sensitive Multi-Label Classification with Applications
title_full Cost-Sensitive Multi-Label Classification with Applications
title_fullStr Cost-Sensitive Multi-Label Classification with Applications
title_full_unstemmed Cost-Sensitive Multi-Label Classification with Applications
title_sort cost-sensitive multi-label classification with applications
publishDate 2013
url http://ndltd.ncl.edu.tw/handle/61015886145358618517
work_keys_str_mv AT hungyilo costsensitivemultilabelclassificationwithapplications
AT luòhóngyì costsensitivemultilabelclassificationwithapplications
AT hungyilo chéngběndǎoxiàngduōbiāoqiānxuéxíyǎnsuànfǎyǔyīngyòng
AT luòhóngyì chéngběndǎoxiàngduōbiāoqiānxuéxíyǎnsuànfǎyǔyīngyòng
_version_ 1718211117243695104
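
The abstract describes CSML as assigning a separate misclassification cost to each instance-label pair and, for social tagging, using tag counts as those costs. The Python sketch below illustrates only that idea as a cost-weighted error. It is not the thesis's implementation: the function name, the toy data, and the choice of a cost of 1.0 for tags that were never applied are all assumptions made for illustration.

```python
from typing import Sequence


def cost_weighted_error(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    costs: Sequence[Sequence[float]],
) -> float:
    """Sum the cost of every misclassified instance-label pair."""
    total = 0.0
    for truth_row, pred_row, cost_row in zip(y_true, y_pred, costs):
        for truth, pred, cost in zip(truth_row, pred_row, cost_row):
            if truth != pred:
                total += cost
    return total


# Hypothetical toy data: 2 audio clips (rows) and 3 tags (columns).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 0], [0, 1, 0]]

# Assumed costs: the tag count where a tag was applied, 1.0 otherwise.
tag_costs = [[5.0, 1.0, 2.0], [1.0, 3.0, 1.0]]

# Uniform costs reduce the measure to a plain count of wrong pairs.
uniform_costs = [[1.0] * 3 for _ in range(2)]

weighted = cost_weighted_error(y_true, y_pred, tag_costs)
unweighted = cost_weighted_error(y_true, y_pred, uniform_costs)
print(weighted, unweighted)
```

With uniform costs the measure counts wrong instance-label pairs, which corresponds to the traditional, cost-insensitive setting. With per-pair costs, an error on a frequently applied tag weighs more than an error on a rarely applied one.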