Combining instance selection and self-training to improve data stream quantification

Abstract In recent years, learning from data streams has attracted the attention of researchers and practitioners due to its large number of applications. These applications have motivated the research community to propose a significant number of methods to solve problems in diverse tasks, more pr...

Bibliographic Details
Main Authors:	André G. Maletzke, Denis M. dos Reis, Gustavo E. A. P. A. Batista
Format:	Article
Language:	English
Published:	SpringerOpen 2018-10-01
Series:	Journal of the Brazilian Computer Society
Subjects:	Data stream; Quantification; Concept drift
Online Access:	http://link.springer.com/article/10.1186/s13173-018-0076-0

id	doaj-86924abaa2d5434d801744f39e62c93a
record_format	Article
spelling	doaj-86924abaa2d5434d801744f39e62c93a; 2021-03-02T10:41:41Z; eng; SpringerOpen; Journal of the Brazilian Computer Society; 0104-6500; 1678-4804; 2018-10-01; 241117; 10.1186/s13173-018-0076-0; Combining instance selection and self-training to improve data stream quantification; André G. Maletzke; Denis M. dos Reis; Gustavo E. A. P. A. Batista; Laboratório de Inteligência Computacional (LABIC), Instituto de Ciências Matemáticas e de Computação (ICMC), Universidade de São Paulo; http://link.springer.com/article/10.1186/s13173-018-0076-0; Data stream; Quantification; Concept drift
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	André G. Maletzke; Denis M. dos Reis; Gustavo E. A. P. A. Batista
author_sort	André G. Maletzke
title	Combining instance selection and self-training to improve data stream quantification
title_sort	combining instance selection and self-training to improve data stream quantification
publisher	SpringerOpen
series	Journal of the Brazilian Computer Society
issn	0104-6500 1678-4804
publishDate	2018-10-01
description	Abstract In recent years, learning from data streams has attracted the attention of researchers and practitioners due to its large number of applications. These applications have motivated the research community to propose a significant number of methods to solve problems in diverse tasks, most prominently classification, clustering, and anomaly detection. However, a relevant task known as quantification has remained mostly unexplored. The goal of quantification is to estimate the class prevalence in an unlabeled set. Recently, we proposed the SQSI algorithm to quantify data streams with concept drifts. SQSI uses a statistical test to identify concept drifts and retrain the classifiers. However, retraining requires the labels of all newly arrived instances. In this paper, we extend the SQSI algorithm by exploring instance selection techniques combined with semi-supervised learning. The idea is to request the classes of a smaller subset of recent examples. Our experiments demonstrate that SQSI’s extension significantly reduces the dependency on actual labels while maintaining or improving the quantification accuracy.
topic	Data stream; Quantification; Concept drift
url	http://link.springer.com/article/10.1186/s13173-018-0076-0
_version_	1724236373498527744

Similar Items