Reduced Data Sets and Entropy-Based Discretization

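Results of experiments on numerical data sets discretized using two methods, global versions of Equal Frequency per Interval and Equal Interval Width, are presented. Globalization of both methods is based on entropy. For the discretized data sets, left and right reducts were computed. For each discretized data set and for the two data sets based, respectively, on its left and right reducts, we applied ten-fold cross-validation using the C4.5 decision tree generation system. Our main objective was to compare the quality of all three types of data sets in terms of error rate. Additionally, we compared the complexity of the generated decision trees. We show that reducing data sets may only increase the error rate and that decision trees generated from reduced data sets are not simpler than those generated from non-reduced data sets.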
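
The abstract names the steps of the pipeline but not their details. The Python sketch below shows one plausible way those steps fit together. It is not the authors' code, and several details are assumptions:

- Every attribute starts with two intervals.
- In each round, the "worst" attribute gains one interval. The worst attribute is taken to be the one with the largest average block entropy M(a): the weighted entropy of the decision over the blocks of the discretized attribute, divided by the number of blocks.
- The loop stops once the discretized table is consistent. The `max_k` termination guard is our own addition.
- A left (right) reduct is read as the result of trying to delete attributes starting from the leftmost (rightmost) one, keeping each deletion that leaves the table consistent.

All function names are hypothetical. The C4.5 runs and the ten-fold cross-validation are not reproduced.

```python
# Sketch under stated assumptions; not the authors' implementation. Names are hypothetical.
import math
from bisect import bisect_right
from collections import Counter, defaultdict

def equal_width_cuts(values, k):
    """Cut-points splitting [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def equal_frequency_cuts(values, k):
    """Cut-points putting roughly len(values) / k cases into each interval."""
    s = sorted(values)
    n = len(s)
    cuts = set()
    for i in range(1, k):
        j = i * n // k
        if 0 < j < n and s[j - 1] < s[j]:  # never cut between equal values
            cuts.add((s[j - 1] + s[j]) / 2)
    return sorted(cuts)

def discretize(table, k, cut_fn):
    """Replace each numeric value by the index (0, 1, ...) of its interval."""
    cols = []
    for a in range(len(table[0])):
        values = [row[a] for row in table]
        cuts = cut_fn(values, k[a])
        cols.append([bisect_right(cuts, v) for v in values])
    return [list(row) for row in zip(*cols)]

def blocks(rows, attrs):
    """Blocks of the partition {attrs}*: cases with equal values on attrs."""
    part = defaultdict(list)
    for i, row in enumerate(rows):
        part[tuple(row[a] for a in attrs)].append(i)
    return list(part.values())

def entropy(labels):
    """Shannon entropy (base 2) of a list of decision values."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def is_consistent(rows, attrs, decision):
    """True if no block of {attrs}* mixes two decision values."""
    return all(len({decision[i] for i in b}) == 1 for b in blocks(rows, attrs))

def avg_block_entropy(column, decision):
    """Assumed M(a): weighted decision entropy over a's blocks / number of blocks."""
    part = blocks([[v] for v in column], [0])
    n = len(decision)
    total = sum(len(b) / n * entropy([decision[i] for i in b]) for b in part)
    return total / len(part)

def globalize(table, decision, cut_fn, max_k=None):
    """Give the worst attribute one more interval until the table is consistent."""
    attrs = list(range(len(table[0])))
    max_k = max_k or max(2, len(table))  # termination guard (our assumption)
    k = [2] * len(attrs)
    while True:
        disc = discretize(table, k, cut_fn)
        if is_consistent(disc, attrs, decision):
            return disc, k
        candidates = [a for a in attrs if k[a] < max_k]
        if not candidates:
            return disc, k  # guard reached; table may still be inconsistent
        worst = max(candidates,
                    key=lambda a: avg_block_entropy([r[a] for r in disc], decision))
        k[worst] += 1

def reduct(rows, decision, order):
    """Try deleting attributes in the given order; keep deletions preserving consistency."""
    order = list(order)
    kept = list(order)
    for a in order:
        trial = [b for b in kept if b != a]
        if is_consistent(rows, trial, decision):
            kept = trial
    return sorted(kept)

def left_reduct(rows, decision):
    return reduct(rows, decision, range(len(rows[0])))

def right_reduct(rows, decision):
    return reduct(rows, decision, reversed(range(len(rows[0]))))

# Toy data (hypothetical): attributes 0 and 2 each determine the decision alone.
table = [[1.0, 10.0, 5.0], [2.0, 20.0, 6.0], [3.0, 10.0, 7.0], [4.0, 20.0, 8.0]]
decision = ["a", "a", "b", "b"]
disc, k = globalize(table, decision, equal_width_cuts)
print(k, left_reduct(disc, decision), right_reduct(disc, decision))  # prints [2, 2, 2] [2] [0]
```

On this toy table, both reducts keep a single attribute, but a different one. Deleting from the left keeps attribute 2, and deleting from the right keeps attribute 0. This order dependence is why the left-reduct and right-reduct data sets can differ and are evaluated separately. Passing `equal_frequency_cuts` instead of `equal_width_cuts` to `globalize` gives the other discretization method.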


Bibliographic Details
Main Authors: Jerzy W. Grzymala-Busse, Zdzislaw S. Hippe, Teresa Mroczek
Format: Article
Language: English
Published: MDPI AG 2019-10-01
Series: Entropy
Subjects: data mining; numerical attributes; discretization; entropy
Online Access: https://www.mdpi.com/1099-4300/21/11/1051
ISSN: 1099-4300
Volume: 21, Issue: 11, Article: 1051
DOI: 10.3390/e21111051
Affiliations:
Jerzy W. Grzymala-Busse: Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA
Zdzislaw S. Hippe, Teresa Mroczek: Department of Artificial Intelligence, University of Information Technology and Management, 35-225 Rzeszow, Poland