Empirical evaluation of feature selection and machine learning techniques to recommend clones for software refactoring
| Published in: | Радіоелектронні і комп'ютерні системи (Radioelectronic and Computer Systems) |
|---|---|
| Main Authors: | Manpreet Kaur, Dhavleesh Rattan, Madan Lal |
| Format: | Article |
| Language: | English |
| Published: | National Aerospace University «Kharkiv Aviation Institute», 2025-09-01 |
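| Subjects: | software clones, clone management, clone recommendation, clone refactoring, feature selection, machine learning |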
| Online Access: | https://nti.khai.edu/ojs/index.php/reks/article/view/3145 |
| author | Manpreet Kaur, Dhavleesh Rattan, Madan Lal |
|---|---|
| collection | DOAJ |
| container_title | Радіоелектронні і комп'ютерні системи (Radioelectronic and Computer Systems) |
| description | The article’s subject matter is the management of software clones. Software clones are duplicate code fragments that can exist in the same or different files. Software clone detection and management has become a well-established research area. Clones should be managed to minimize their ill effects, as their presence can increase the software’s maintenance cost and resource requirements. Refactoring is a commonly used technique for managing clones. A clone detection tool can detect many clones in a software system, but not all detected clones are suitable for refactoring; a developer needs the subset of detected clones that can be easily refactored. This study aims to recommend software clones for refactoring using machine learning techniques. It evaluates the performance of fourteen machine learning algorithms and investigates the influence of three feature selection methods on clone recommendation accuracy. The tasks to be solved are as follows: selecting appropriate features from the datasets, developing machine learning-based models that can suggest suitable clones for refactoring, and selecting an effective combination of machine learning and feature selection algorithms for recommending clones for refactoring. The feature selection methods used are correlation, InfoGain, and ReliefF. The study is conducted on datasets from six open-source software systems written in Java. The experimental results show that the Decision Tree and LogitBoost classifiers achieve the highest accuracy, 94.44 %, on the Lucene dataset. ReliefF yields the best performance among the feature selection methods, particularly when used with the Decision Tree algorithm. The study concludes that Random Committee, Random Forest, and Decision Tree perform best when paired with correlation, InfoGain, and ReliefF, respectively. Overall, the Decision Tree classifier combined with the ReliefF feature selection method delivers the highest average precision, recall, and F-score across datasets. |
| format | Article |
| id | doaj-art-c0ba75d9e34e4051a620fffc3d2abe4e |
| institution | Directory of Open Access Journals |
| issn | 1814-4225, 2663-2012 |
| language | English |
| publishDate | 2025-09-01 |
| publisher | National Aerospace University «Kharkiv Aviation Institute» |
| record_format | Article |
| spelling | doaj-art-c0ba75d9e34e4051a620fffc3d2abe4e; 2025-10-25T17:01:58Z; eng; National Aerospace University «Kharkiv Aviation Institute»; Радіоелектронні і комп'ютерні системи; 1814-4225; 2663-2012; 2025-09-01; 2025; 3; 53; 67; 10.32620/reks.2025.3.04; 2692; Empirical evaluation of feature selection and machine learning techniques to recommend clones for software refactoring; Manpreet Kaur; Dhavleesh Rattan; Madan Lal; Baba Banda Singh Bahadur Engineering College, Fatehgarh Sahib, Punjab; Punjabi University, Patiala, Punjab; Punjabi University, Patiala, Punjab; https://nti.khai.edu/ojs/index.php/reks/article/view/3145; software clones; clone management; clone recommendation, clone refactoring, feature selection, machine learning |
| title | Empirical evaluation of feature selection and machine learning techniques to recommend clones for software refactoring |
| topic | software clones, clone management, clone recommendation, clone refactoring, feature selection, machine learning |
| url | https://nti.khai.edu/ojs/index.php/reks/article/view/3145 |
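The record's description lists InfoGain as one of the three feature selection methods compared in the study. As an illustration only, the following minimal Python sketch ranks discrete features by information gain, IG(Y; X) = H(Y) - H(Y | X), where H is Shannon entropy in bits. The feature names (`same_file`, `size`, `has_return`), their values, and the labels are hypothetical and are not taken from the study's datasets; numeric clone metrics would have to be discretised first, and this is not presented as the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def info_gain(feature_values, labels):
    """Information gain IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        # Labels of the rows where the feature takes this value.
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(rows, labels):
    """Return (feature_name, gain) pairs, most informative first.

    Assumes every row is a dict with the same keys as the first row.
    """
    names = rows[0].keys()
    scores = {name: info_gain([r[name] for r in rows], labels) for name in names}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical, already discretised clone-pair features (not from the study).
    rows = [
        {"same_file": "yes", "size": "small", "has_return": "no"},
        {"same_file": "yes", "size": "small", "has_return": "yes"},
        {"same_file": "no", "size": "large", "has_return": "no"},
        {"same_file": "no", "size": "large", "has_return": "yes"},
        {"same_file": "yes", "size": "large", "has_return": "no"},
        {"same_file": "no", "size": "small", "has_return": "yes"},
    ]
    # Hypothetical labels: 1 = clone pair judged refactorable, 0 = not.
    labels = [1, 1, 0, 0, 1, 0]
    for name, gain in rank_features(rows, labels):
        print(f"{name}: {gain:.3f}")
```

In this toy data `same_file` alone separates the two classes, so it receives a gain of 1 bit, while `size` and `has_return` tie at a much lower gain (about 0.082 bits). A filter of this kind keeps the top-ranked features before a classifier such as a decision tree is trained on them.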
