Empirical evaluation of feature selection and machine learning techniques to recommend clones for software refactoring

Bibliographic Details
Published in: Радіоелектронні і комп'ютерні системи (Radioelectronic and Computer Systems)
Main Authors: Manpreet Kaur, Dhavleesh Rattan, Madan Lal
Format: Article
Language: English
Published: National Aerospace University «Kharkiv Aviation Institute», 2025-09-01
Online Access: https://nti.khai.edu/ojs/index.php/reks/article/view/3145

Abstract

The article’s subject matter is the management of software clones. Software clones are duplicate code fragments that can exist in the same software file or in different files. Software clone detection and management has become a well-established research area. Clones should be managed to minimize their ill effects, because their presence can increase a software system’s maintenance cost and resource requirements. Refactoring is a commonly used technique for managing clones. A clone detection tool can detect many clones in a software system, but not all detected clones are suitable for refactoring; a developer needs the subset of detected clones that can be easily refactored. This study aims to suggest software clones for refactoring using machine learning techniques. It evaluates the performance of fourteen machine learning algorithms and investigates the influence of three feature selection methods on clone recommendation accuracy. The tasks to be solved are as follows: selecting appropriate features from the datasets, developing machine learning-based models that can suggest suitable clones for refactoring, and selecting an effective combination of machine learning and feature selection algorithms for recommending clones for refactoring. The feature selection methods used are correlation, InfoGain, and ReliefF. The study is conducted on datasets from six open-source software systems written in Java. The experimental results show that the Decision Tree and LogitBoost classifiers achieve the highest accuracy, 94.44%, on the Lucene dataset. ReliefF yields the best performance among the feature selection methods, particularly when used with the Decision Tree algorithm. The study concludes that Random Committee, Random Forest, and Decision Tree perform best when paired with correlation, InfoGain, and ReliefF, respectively. Overall, the Decision Tree classifier combined with the ReliefF feature selection method delivers the highest average precision, recall, and F-score across datasets.
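
The abstract names the three feature selection methods but not how they score features. As a rough illustration only, and not the authors' implementation, the stdlib-only Python sketch below ranks features with the textbook forms of absolute Pearson correlation, information gain (InfoGain), and a simplified Relief that uses a single nearest hit and miss per instance (full ReliefF averages over k neighbors and handles more than two classes). The feature names, the toy rows, and the "1 = refactorable" labelling are hypothetical and are not taken from the study's datasets. InfoGain here assumes discrete (or pre-discretized) feature values.

```python
import math
from collections import Counter

# Hypothetical clone metrics and labels (1 = "refactorable"), invented for illustration.
FEATURES = ["same_file", "size_bucket", "noise"]
ROWS = [
    [1, 1, 3],
    [1, 1, 1],
    [1, 2, 2],
    [0, 2, 3],
    [0, 3, 1],
    [0, 3, 2],
]
LABELS = [1, 1, 1, 0, 0, 0]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """InfoGain = H(class) - H(class | feature), treating each value as a category."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def abs_correlation(column, labels):
    """Absolute Pearson correlation between one feature and the 0/1 class labels."""
    n = len(labels)
    mx, my = sum(column) / n, sum(labels) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    sx = math.sqrt(sum((x - mx) ** 2 for x in column))
    sy = math.sqrt(sum((y - my) ** 2 for y in labels))
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def relief(rows, labels):
    """Simplified Relief: one nearest hit and one nearest miss per instance.

    Feature differences are scaled by each feature's range. Every class needs
    at least two instances, otherwise no nearest hit exists.
    """
    m, d = len(rows), len(rows[0])
    ranges = [max(r[j] for r in rows) - min(r[j] for r in rows) for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / ranges[j] if ranges[j] else 0.0

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i, row in enumerate(rows):
        others = [k for k in range(m) if k != i]
        hit = min((k for k in others if labels[k] == labels[i]),
                  key=lambda k: distance(row, rows[k]))
        miss = min((k for k in others if labels[k] != labels[i]),
                   key=lambda k: distance(row, rows[k]))
        for j in range(d):
            # Reward features that differ on the nearest miss and agree on the nearest hit.
            weights[j] += (diff(j, row, rows[miss]) - diff(j, row, rows[hit])) / m
    return weights


def rank(scores, names):
    """Feature names sorted from highest to lowest score."""
    return [name for _, name in sorted(zip(scores, names), reverse=True)]


columns = [[r[j] for r in ROWS] for j in range(len(FEATURES))]
ig_scores = [info_gain(c, LABELS) for c in columns]
corr_scores = [abs_correlation(c, LABELS) for c in columns]
relief_scores = relief(ROWS, LABELS)

for method, scores in (("InfoGain", ig_scores), ("Correlation", corr_scores),
                       ("Relief", relief_scores)):
    print(method, rank(scores, FEATURES))
```

On this toy data all three filters rank `same_file` first and `noise` last; in a pipeline of the kind the abstract describes, a classifier such as a decision tree would then be trained on the top-ranked features. Using a single neighbor keeps the Relief loop short, at the cost of more sensitivity to noisy neighbors than ReliefF.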

ISSN: 1814-4225, 2663-2012
Indexed in: Directory of Open Access Journals (DOAJ)
Affiliations: Manpreet Kaur, Baba Banda Singh Bahadur Engineering College, Fatehgarh Sahib, Punjab; Dhavleesh Rattan and Madan Lal, Punjabi University, Patiala, Punjab
DOI: 10.32620/reks.2025.3.04
Keywords: software clones; clone management; clone recommendation; clone refactoring; feature selection; machine learning