Experimental Study on 164 Algorithms Available in Software Tools for Solving Standard Non-Linear Regression Problems

In the specialized literature, researchers can find a large number of proposals for solving regression problems that come from different research areas. However, researchers tend to use only proposals from the area in which they are experts. This paper analyses the performance of a large number of t...

Bibliographic Details
Main Authors: Maria Jose Gacto, Jose Manuel Soto-Hidalgo, Jesus Alcala-Fdez, Rafael Alcala
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Data mining; supervised learning; regression algorithms; experimental study
Online Access: https://ieeexplore.ieee.org/document/8788533/
id doaj-129b8f5fc92448c085187731466c235d
record_format Article
spelling doaj-129b8f5fc92448c085187731466c235d
2021-04-05T17:07:26Z
eng
IEEE
IEEE Access
2169-3536
2019-01-01
vol. 7, pp. 108916-108939
DOI 10.1109/ACCESS.2019.2933261
8788533
Experimental Study on 164 Algorithms Available in Software Tools for Solving Standard Non-Linear Regression Problems
Maria Jose Gacto (Department of Computer Science, University of Jaén, Jaén, Spain)
Jose Manuel Soto-Hidalgo (https://orcid.org/0000-0003-4412-5449; Department of Electronics and Computer Engineering, University of Córdoba, Córdoba, Spain)
Jesus Alcala-Fdez (Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain)
Rafael Alcala (Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain)
https://ieeexplore.ieee.org/document/8788533/
Data mining
supervised learning
regression algorithms
experimental study
collection DOAJ
language English
format Article
sources DOAJ
author Maria Jose Gacto
Jose Manuel Soto-Hidalgo
Jesus Alcala-Fdez
Rafael Alcala
author_sort Maria Jose Gacto
title Experimental Study on 164 Algorithms Available in Software Tools for Solving Standard Non-Linear Regression Problems
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description In the specialized literature, researchers can find a large number of proposals for solving regression problems that come from different research areas. However, researchers tend to use only proposals from the area in which they are experts. This paper analyses the performance of a large number of the regression algorithms available in some of the best-known and most widely used software tools. The aim is twofold: to help non-expert users from other areas solve their own regression problems properly, and to help specialized researchers develop well-founded future proposals by properly comparing and identifying algorithms, so that they can focus on significant further developments. In total, we have analyzed 164 algorithms from 14 main families (Neural Networks, Support Vector Machines, Regression Trees, Rule-Based Methods, Stacking, Random Forests, Model Trees, Generalized Linear Models, Nearest Neighbor methods, Partial Least Squares and Principal Component Regression, Multivariate Adaptive Regression Splines, Bagging, Boosting, and other methods), available in 6 software tools, over 52 datasets. A new measure has also been proposed to show the goodness of each algorithm with respect to the others. Finally, a statistical analysis using non-parametric tests has been carried out over all the algorithms and over the best 30 algorithms, both with and without bagging. Results show that the algorithms from the Random Forest, Model Tree and Support Vector Machine families obtain the best positions in the rankings produced by the statistical tests when bagging is not considered. In addition, the use of bagging techniques significantly improves the performance of the algorithms without an excessive increase in computational time.
topic Data mining
supervised learning
regression algorithms
experimental study
url https://ieeexplore.ieee.org/document/8788533/
_version_ 1721540213815115776