Optimal Feature Set Size in Random Forest Regression
One of the most important hyper-parameters in the Random Forest (RF) algorithm is the feature set size used to search for the best partitioning rule at each node of trees. Most existing research on feature set size has been done primarily with a focus on classification problems. We studied the effec...
Main Authors: | Sunwoo Han, Hyunjoong Kim |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-04-01 |
Series: | Applied Sciences |
Subjects: | random forest, feature set size, grid search, regression |
Online Access: | https://www.mdpi.com/2076-3417/11/8/3428 |
id | doaj-c0159a35ad5841e7a3c8cc10a9b05586 |
---|---|
record_format | Article |
spelling | doaj-c0159a35ad5841e7a3c8cc10a9b05586; 2021-04-12T23:00:25Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2021-04-01; 11; 3428; 3428; 10.3390/app11083428; Optimal Feature Set Size in Random Forest Regression; Sunwoo Han (0); Hyunjoong Kim (1); Fred Hutchinson Cancer Research Center, Vaccine and Infectious Disease Division, Seattle, WA 98006, USA; Department of Applied Statistics, Yonsei University, Seoul 03722, Korea; One of the most important hyper-parameters in the Random Forest (RF) algorithm is the feature set size used to search for the best partitioning rule at each node of trees. Most existing research on feature set size has been done primarily with a focus on classification problems. We studied the effect of feature set size in the context of regression. Through experimental studies using many datasets, we first investigated whether the RF regression predictions are affected by the feature set size. Then, we found a rule associated with the optimal size based on the characteristics of each data. Lastly, we developed a search algorithm for estimating the best feature set size in RF regression. We showed that the proposed search algorithm can provide improvements over other choices, such as using the default size specified in the randomForest R package and using the common grid search method.; https://www.mdpi.com/2076-3417/11/8/3428; random forest; feature set size; grid search; regression |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Sunwoo Han, Hyunjoong Kim |
spellingShingle | Sunwoo Han Hyunjoong Kim Optimal Feature Set Size in Random Forest Regression Applied Sciences random forest feature set size grid search regression |
author_facet | Sunwoo Han, Hyunjoong Kim |
author_sort | Sunwoo Han |
title | Optimal Feature Set Size in Random Forest Regression |
title_short | Optimal Feature Set Size in Random Forest Regression |
title_full | Optimal Feature Set Size in Random Forest Regression |
title_fullStr | Optimal Feature Set Size in Random Forest Regression |
title_full_unstemmed | Optimal Feature Set Size in Random Forest Regression |
title_sort | optimal feature set size in random forest regression |
publisher | MDPI AG |
series | Applied Sciences |
issn | 2076-3417 |
publishDate | 2021-04-01 |
description | One of the most important hyper-parameters in the Random Forest (RF) algorithm is the feature set size used to search for the best partitioning rule at each node of trees. Most existing research on feature set size has been done primarily with a focus on classification problems. We studied the effect of feature set size in the context of regression. Through experimental studies using many datasets, we first investigated whether the RF regression predictions are affected by the feature set size. Then, we found a rule associated with the optimal size based on the characteristics of each data. Lastly, we developed a search algorithm for estimating the best feature set size in RF regression. We showed that the proposed search algorithm can provide improvements over other choices, such as using the default size specified in the randomForest R package and using the common grid search method. |
topic | random forest; feature set size; grid search; regression |
url | https://www.mdpi.com/2076-3417/11/8/3428 |
work_keys_str_mv | AT sunwoohan optimalfeaturesetsizeinrandomforestregression AT hyunjoongkim optimalfeaturesetsizeinrandomforestregression |
_version_ | 1721529631817859072 |