Reduced Data Sets and Entropy-Based Discretization
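Results of experiments on numerical data sets discretized using two methods, global versions of Equal Frequency per Interval and Equal Interval Width, are presented. Globalization of both methods is based on entropy. For discretized data sets left and right reducts were computed. For each discretized data set and two data sets, based, respectively, on left and right reducts, we applied ten-fold cross validation using the C4.5 decision tree generation system. Our main objective was to compare the quality of all three types of data sets in terms of an error rate. Additionally, we compared complexity of generated decision trees. We show that reduction of data sets may only increase the error rate and that the decision trees generated from reduced decision sets are not simpler than the decision trees generated from non-reduced data sets.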
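
As a rough illustration of the two binning schemes named in the abstract, the sketch below computes Equal Interval Width and Equal Frequency per Interval cut points for a single numerical attribute, then scores each discretization by the conditional entropy of the decision given the discretized attribute. This record does not describe how the entropy-based globalization works, so the entropy score here is only an assumed, generic quality measure and not the authors' procedure. All function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def equal_width_cuts(values, k):
    """Cut points that split [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Cut points that place roughly len(values) / k cases in each interval."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, k):
        pos = i * n // k
        # Use the midpoint between neighbouring sorted values as the cut.
        cut = (ordered[pos - 1] + ordered[pos]) / 2
        if not cuts or cut > cuts[-1]:  # skip repeated cuts caused by ties
            cuts.append(cut)
    return cuts


def discretize(values, cuts):
    """Map each value to the index of the interval it falls into."""
    return [sum(v >= c for c in cuts) for v in values]


def conditional_entropy(blocks, decisions):
    """H(decision | discretized attribute) in bits (assumed quality measure)."""
    n = len(decisions)
    by_block = {}
    for b, d in zip(blocks, decisions):
        by_block.setdefault(b, []).append(d)
    total = 0.0
    for members in by_block.values():
        size = len(members)
        counts = Counter(members)
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        total += size / n * h
    return total


if __name__ == "__main__":
    # Hypothetical attribute values and decision labels.
    values = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0]
    decisions = ["a", "a", "a", "b", "b", "b"]
    for name, make_cuts in [("equal width", equal_width_cuts),
                            ("equal frequency", equal_frequency_cuts)]:
        cuts = make_cuts(values, 2)
        score = conditional_entropy(discretize(values, cuts), decisions)
        print(name, cuts, round(score, 3))
```

A lower conditional entropy means the intervals separate the decision classes more cleanly. On skewed data such as the sample above, the two schemes can produce quite different cut points.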
Main Authors: | Jerzy W. Grzymala-Busse, Zdzislaw S. Hippe, Teresa Mroczek |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-10-01 |
Series: | Entropy |
Subjects: | data mining, numerical attributes, discretization, entropy |
Online Access: | https://www.mdpi.com/1099-4300/21/11/1051 |
id |
doaj-29dad1f9e8d24111b53762c30f10b3db |
---|---|
record_format |
Article |
spelling |
doaj-29dad1f9e8d24111b53762c30f10b3db; 2020-11-25T01:35:03Z; eng; MDPI AG; Entropy; 1099-4300; 2019-10-01; 21; 11; 1051; 10.3390/e21111051; e21111051; Reduced Data Sets and Entropy-Based Discretization; Jerzy W. Grzymala-Busse (0); Zdzislaw S. Hippe (1); Teresa Mroczek (2); Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA; Department of Artificial Intelligence, University of Information Technology and Management, 35-225 Rzeszow, Poland; Department of Artificial Intelligence, University of Information Technology and Management, 35-225 Rzeszow, Poland; Results of experiments on numerical data sets discretized using two methods, global versions of Equal Frequency per Interval and Equal Interval Width, are presented. Globalization of both methods is based on entropy. For discretized data sets left and right reducts were computed. For each discretized data set and two data sets, based, respectively, on left and right reducts, we applied ten-fold cross validation using the C4.5 decision tree generation system. Our main objective was to compare the quality of all three types of data sets in terms of an error rate. Additionally, we compared complexity of generated decision trees. We show that reduction of data sets may only increase the error rate and that the decision trees generated from reduced decision sets are not simpler than the decision trees generated from non-reduced data sets.; https://www.mdpi.com/1099-4300/21/11/1051; data mining; numerical attributes; discretization; entropy |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Jerzy W. Grzymala-Busse Zdzislaw S. Hippe Teresa Mroczek |
title |
Reduced Data Sets and Entropy-Based Discretization |
publisher |
MDPI AG |
series |
Entropy |
issn |
1099-4300 |
publishDate |
2019-10-01 |
description |
Results of experiments on numerical data sets discretized using two methods, global versions of Equal Frequency per Interval and Equal Interval Width, are presented. Globalization of both methods is based on entropy. For discretized data sets left and right reducts were computed. For each discretized data set and two data sets, based, respectively, on left and right reducts, we applied ten-fold cross validation using the C4.5 decision tree generation system. Our main objective was to compare the quality of all three types of data sets in terms of an error rate. Additionally, we compared complexity of generated decision trees. We show that reduction of data sets may only increase the error rate and that the decision trees generated from reduced decision sets are not simpler than the decision trees generated from non-reduced data sets. |
topic |
data mining numerical attributes discretization entropy |
url |
https://www.mdpi.com/1099-4300/21/11/1051 |