Mapping Bug Reports to Relevant Source Code Files Based on the Vector Space Model and Word Embedding

Bibliographic Details
Main Authors: Guangliang Liu, Yang Lu, Ke Shi, Jingfei Chang, Xing Wei
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8736209/
Description
Summary: Although bug localization during software maintenance and evolution is cumbersome and time-consuming, it is also very important, especially for large-scale software projects. To lighten developers' workload, researchers have developed various information retrieval (IR)-based bug localization models for automated software support. In this paper, we propose a new method that reduces the time required for bug localization. First, the surface lexical similarity between a bug report and a source code file is calculated with the vector space model. Second, to bridge the lexical gap between programming language and natural language, word vectors are used to calculate the semantic similarity between the bug report and the source code file. Then, we combine the surface lexical and semantic similarities into a total similarity for detecting buggy source code files. The word vectors used in our experiments are trained with the Skip-gram and GloVe models. By evaluating on four open-source software projects, we select 100 as the optimal word vector dimension for bug localization. Finally, our experimental results show that our method outperforms classical IR-based methods in locating relevant source code files on several evaluation metrics.
ISSN: 2169-3536
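The summary describes ranking source files by combining a vector space model (lexical) score with a word-embedding (semantic) score. The sketch below illustrates that general idea; it is not the authors' implementation. Assumptions, all hypothetical: camelCase-splitting tokenization, log-scaled tf with smoothed idf as the VSM weighting, a document vector formed by averaging word vectors, a linear combination `alpha * lexical + (1 - alpha) * semantic` with an illustrative weight `alpha`, and a toy 3-dimensional embedding table standing in for trained 100-dimensional Skip-gram/GloVe vectors. The function names (`tokenize`, `tfidf_vectors`, `rank_files`, and so on) are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split identifiers on camelCase and non-alphanumerics, then lowercase (assumed preprocessing)."""
    parts = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", text)
    return [p.lower() for p in parts]


def tfidf_vectors(docs):
    """Sparse tf-idf vectors (term -> weight); log-scaled tf and smoothed idf are an assumed variant."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (1 + math.log(c)) * math.log(1 + n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine_sparse(a, b):
    """Cosine similarity of two dict-based sparse vectors (the VSM lexical score)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cosine_dense(a, b):
    """Cosine similarity of two equal-length dense vectors (the semantic score)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(tokens, embeddings, dim):
    """Average the word vectors of known tokens; return a zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def rank_files(report, files, embeddings, alpha=0.5):
    """Rank files by alpha * lexical + (1 - alpha) * semantic similarity (alpha is hypothetical)."""
    names = list(files)
    docs = [tokenize(report)] + [tokenize(files[name]) for name in names]
    vecs = tfidf_vectors(docs)  # the report and all files share one idf table
    dim = len(next(iter(embeddings.values())))
    report_emb = mean_vector(docs[0], embeddings, dim)
    scored = []
    for name, doc, vec in zip(names, docs[1:], vecs[1:]):
        lexical = cosine_sparse(vecs[0], vec)
        semantic = cosine_dense(report_emb, mean_vector(doc, embeddings, dim))
        scored.append((name, alpha * lexical + (1 - alpha) * semantic))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 3-d vectors standing in for trained 100-d Skip-gram/GloVe vectors (hypothetical values).
EMBEDDINGS = {
    "file": [1.0, 0.0, 0.0], "save": [0.9, 0.1, 0.0], "saving": [0.9, 0.1, 0.0],
    "path": [0.8, 0.0, 0.2], "crash": [0.0, 0.0, 1.0],
    "color": [0.0, 1.0, 0.0], "pick": [0.1, 0.9, 0.0],
}

# Hypothetical source files used only for the demonstration below.
FILES = {
    "io/FileSaver.java": "public class FileSaver { void saveFile(String path) { writeBytes(path); } }",
    "ui/ColorPicker.java": "public class ColorPicker { void pickColor(int rgb) { repaint(); } }",
}

if __name__ == "__main__":
    report = "Crash when saving a file: saveFile throws on invalid path"
    for name, score in rank_files(report, FILES, EMBEDDINGS):
        print(f"{name}: {score:.3f}")
```

In this toy run, `io/FileSaver.java` ranks first: it shares the terms `file`, `save`, and `path` with the report (lexical score above zero), and its averaged vector also points in a similar direction (high semantic score). `ui/ColorPicker.java` shares no terms with the report, so its score comes only from the small semantic term. The semantic component is what lets a file score above zero even without exact word overlap, which is the lexical gap the summary refers to.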