Combining instance selection and self-training to improve data stream quantification

Abstract In recent years, learning from data streams has attracted the attention of researchers and practitioners due to its large number of applications. These applications have motivated the research community to propose a significant number of methods to solve problems in diverse tasks, more pr...

Bibliographic Details
Main Authors: André G. Maletzke, Denis M. dos Reis, Gustavo E. A. P. A. Batista
Format: Article
Language: English
Published: SpringerOpen 2018-10-01
Series: Journal of the Brazilian Computer Society
Subjects: Data stream, Quantification, Concept drift
Online Access: http://link.springer.com/article/10.1186/s13173-018-0076-0
id doaj-86924abaa2d5434d801744f39e62c93a
record_format Article
spelling doaj-86924abaa2d5434d801744f39e62c93a
2021-03-02T10:41:41Z
eng
SpringerOpen
Journal of the Brazilian Computer Society
0104-6500
1678-4804
2018-10-01
241117
10.1186/s13173-018-0076-0
Combining instance selection and self-training to improve data stream quantification
André G. Maletzke 0
Denis M. dos Reis 1
Gustavo E. A. P. A. Batista 2
Laboratório de Inteligência Computacional (LABIC), Instituto de Ciências Matemáticas e de Computação (ICMC), Universidade de São Paulo
Laboratório de Inteligência Computacional (LABIC), Instituto de Ciências Matemáticas e de Computação (ICMC), Universidade de São Paulo
Laboratório de Inteligência Computacional (LABIC), Instituto de Ciências Matemáticas e de Computação (ICMC), Universidade de São Paulo
Abstract In recent years, learning from data streams has attracted the attention of researchers and practitioners due to its large number of applications. These applications have motivated the research community to propose a significant number of methods to solve problems in diverse tasks, more prominently in classification, clustering, and anomaly detection. However, a relevant task known as quantification has remained mostly unexplored. The goal of quantification is to estimate the class prevalence in an unlabeled set. Recently, we proposed the SQSI algorithm to quantify data streams with concept drifts. SQSI uses a statistical test to identify concept drifts and retrain the classifiers. However, retraining requires the labels of all newly arrived instances. In this paper, we extend the SQSI algorithm by exploring instance selection techniques combined with semi-supervised learning. The idea is to request the classes of a smaller subset of recent examples. Our experiments demonstrate that SQSI’s extension significantly reduces the dependency on actual labels while maintaining or improving the quantification accuracy.
http://link.springer.com/article/10.1186/s13173-018-0076-0
Data stream
Quantification
Concept drift
collection DOAJ
language English
format Article
sources DOAJ
author André G. Maletzke
Denis M. dos Reis
Gustavo E. A. P. A. Batista
spellingShingle André G. Maletzke
Denis M. dos Reis
Gustavo E. A. P. A. Batista
Combining instance selection and self-training to improve data stream quantification
Journal of the Brazilian Computer Society
Data stream
Quantification
Concept drift
author_facet André G. Maletzke
Denis M. dos Reis
Gustavo E. A. P. A. Batista
author_sort André G. Maletzke
title Combining instance selection and self-training to improve data stream quantification
publisher SpringerOpen
series Journal of the Brazilian Computer Society
issn 0104-6500
1678-4804
publishDate 2018-10-01
description Abstract In recent years, learning from data streams has attracted the attention of researchers and practitioners due to its large number of applications. These applications have motivated the research community to propose a significant number of methods to solve problems in diverse tasks, more prominently in classification, clustering, and anomaly detection. However, a relevant task known as quantification has remained mostly unexplored. The goal of quantification is to estimate the class prevalence in an unlabeled set. Recently, we proposed the SQSI algorithm to quantify data streams with concept drifts. SQSI uses a statistical test to identify concept drifts and retrain the classifiers. However, retraining requires the labels of all newly arrived instances. In this paper, we extend the SQSI algorithm by exploring instance selection techniques combined with semi-supervised learning. The idea is to request the classes of a smaller subset of recent examples. Our experiments demonstrate that SQSI’s extension significantly reduces the dependency on actual labels while maintaining or improving the quantification accuracy.
topic Data stream
Quantification
Concept drift
url http://link.springer.com/article/10.1186/s13173-018-0076-0