Plagiarism Detection in Students' Theses Using The Cosine Similarity Method

Bibliographic Details
Main Authors: Oppi Anda Resta, Addin Aditya, Febry Eka Purwiantono
Format: Article
Language: English
Published: Politeknik Ganesha Medan 2021-05-01
Series: Sinkron
Online Access: https://jurnal.polgan.ac.id/index.php/sinkron/article/view/10909
Description
Summary: Students must write a final scientific paper in order to graduate. One factor that determines the quality of a student's scientific work is its uniqueness and innovation. This research applies data mining methods to detect similarities in the titles, abstracts, or topics of students' final scientific papers in order to prevent plagiarism. Cosine similarity is combined with text preprocessing and TF-IDF to calculate the level of similarity between the title and abstract of a student's final paper and those in the existing final project repository. The result is compared against a threshold value to decide whether the work is accepted or rejected. With TF-IDF applied to the training and test data, the similarity between the training data document and the test data document was 8%, which indicates that the student thesis is still classified as unique and does not contain plagiarized content. The findings can help the university manage the administration of student theses so that plagiarism does not occur. Further study is needed on additional methods to improve the system's accuracy and performance so that it runs faster and more optimally.
ISSN: 2541-044X, 2541-2019
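
The pipeline described in the summary (preprocessing, TF-IDF weighting, cosine similarity, threshold decision) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the smoothed IDF formula, the function names, and the 0.5 threshold are all assumptions chosen for illustration, since the record does not specify them.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the record does not say which one was used.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "for", "on", "using"}


def preprocess(text):
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per tokenized doc."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption) so shared terms keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def check_submission(new_text, repository, threshold=0.5):
    """Compare a new title/abstract with repository texts.

    Returns (highest similarity, decision). The threshold is an assumed value.
    """
    docs = [preprocess(new_text)] + [preprocess(r) for r in repository]
    vectors = tfidf_vectors(docs)
    scores = [cosine_similarity(vectors[0], v) for v in vectors[1:]]
    highest = max(scores, default=0.0)
    return highest, ("rejected" if highest >= threshold else "accepted")
```

Calling `check_submission(new_abstract, repository_abstracts)` returns the highest similarity found and an accept/reject decision. Under this sketch, a low score such as the 8% reported in the summary would fall below the assumed threshold and be accepted.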