Optimal Feature Set Size in Random Forest Regression

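One of the most important hyper-parameters in the Random Forest (RF) algorithm is the feature set size used to search for the best partitioning rule at each node of the trees. Most existing research on feature set size has focused on classification problems. We studied the effect of feature set size in the context of regression. Through experimental studies using many datasets, we first investigated whether RF regression predictions are affected by the feature set size. Then, we found a rule associated with the optimal size based on the characteristics of each dataset. Lastly, we developed a search algorithm for estimating the best feature set size in RF regression. We showed that the proposed search algorithm can provide improvements over other choices, such as using the default size specified in the randomForest R package and using the common grid search method.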
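The abstract does not describe the authors' proposed search algorithm, so the sketch below is not that algorithm. It is a minimal pure-Python illustration of the two baselines the abstract compares against. `default_mtry` reproduces the regression default documented for the randomForest R package, max(floor(p/3), 1). `grid_search_mtry` scores each candidate feature set size by the out-of-bag (OOB) mean squared error of a small from-scratch forest. The following are all illustrative assumptions, not taken from the paper:

- the function names;
- OOB error as the grid-search criterion (cross-validation is an equally common choice);
- the tree count, minimum leaf size and depth cap.

```python
import random
import statistics


def default_mtry(p):
    """Regression default documented for the randomForest R package: max(floor(p/3), 1)."""
    return max(p // 3, 1)


def build_tree(X, y, idx, mtry, rng, min_leaf=5, depth=0, max_depth=8):
    """Grow a regression tree on rows `idx`, trying `mtry` randomly chosen features per node.

    Illustrative sketch: min_leaf and max_depth are assumed settings for speed.
    """
    if len(idx) < 2 * min_leaf or depth >= max_depth:
        return statistics.fmean([y[i] for i in idx])  # leaf: mean response
    best = None  # (sse, feature, threshold)
    for f in rng.sample(range(len(X[0])), mtry):  # random feature subset of size mtry
        order = sorted(idx, key=lambda i: X[i][f])
        n = len(order)
        tot_s = sum(y[i] for i in order)
        tot_q = sum(y[i] ** 2 for i in order)
        s = q = 0.0
        for k in range(1, n):  # candidate split: left = order[:k], right = order[k:]
            yi = y[order[k - 1]]
            s += yi
            q += yi * yi
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if k < min_leaf or n - k < min_leaf or lo == hi:
                continue
            # sum of squared errors of the two children, from running sums
            sse = (q - s * s / k) + ((tot_q - q) - (tot_s - s) ** 2 / (n - k))
            if best is None or sse < best[0]:
                best = (sse, f, lo)  # "x <= lo" separates exactly order[:k]
    if best is None:
        return statistics.fmean([y[i] for i in idx])
    _, f, t = best
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return (f, t,
            build_tree(X, y, left, mtry, rng, min_leaf, depth + 1, max_depth),
            build_tree(X, y, right, mtry, rng, min_leaf, depth + 1, max_depth))


def predict(node, x):
    """Follow splits (tuples) down to a leaf value (float)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def oob_mse(X, y, mtry, n_trees=30, seed=0):
    """Out-of-bag MSE of a bootstrap forest grown with feature set size `mtry`."""
    rng = random.Random(seed)
    n = len(y)
    sums, counts = [0.0] * n, [0] * n
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = build_tree(X, y, boot, mtry, rng)
        in_bag = set(boot)
        for i in range(n):
            if i not in in_bag:  # predict only rows this tree did not see
                sums[i] += predict(tree, X[i])
                counts[i] += 1
    return statistics.fmean([(sums[i] / counts[i] - y[i]) ** 2
                             for i in range(n) if counts[i]])


def grid_search_mtry(X, y, candidates, **kw):
    """Plain grid search: return the candidate with the lowest OOB MSE, plus all scores."""
    scores = {m: oob_mse(X, y, m, **kw) for m in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(42)
    X = [[rng.random() for _ in range(6)] for _ in range(150)]
    y = [10 * x[0] + 5 * x[1] ** 2 + rng.gauss(0, 0.5) for x in X]  # synthetic data
    p = len(X[0])
    best, scores = grid_search_mtry(X, y, range(1, p + 1))
    print("randomForest default mtry:", default_mtry(p))
    for m, s in scores.items():
        print(f"mtry={m}: OOB MSE={s:.3f}")
    print("grid-search choice:", best)
```

The search uses OOB error because each tree's unused bootstrap rows give a validation estimate without refitting. The paper's point is that neither the fixed default nor a plain grid is guaranteed to pick the best size.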


Bibliographic Details
Main Authors: Sunwoo Han, Hyunjoong Kim
Format: Article
Language: English
Published: MDPI AG 2021-04-01
Series: Applied Sciences
Subjects: random forest; feature set size; grid search; regression
Online Access: https://www.mdpi.com/2076-3417/11/8/3428
ISSN: 2076-3417
DOI: 10.3390/app11083428
Affiliations: Sunwoo Han (Fred Hutchinson Cancer Research Center, Vaccine and Infectious Disease Division, Seattle, WA 98006, USA); Hyunjoong Kim (Department of Applied Statistics, Yonsei University, Seoul 03722, Korea)