Enabling Repository Recommendation on GitHub by Machine Learning

Master's thesis === Chung Yuan Christian University === Graduate Institute of Information and Computer Engineering === 107 === GitHub is one of the most popular platforms for open-source communities in the world. People use its repositories to store their code or projects. These repositories can be designed for different purposes. If we can use one repository as a query to find the other...

Bibliographic Details
Main Authors: Ting-Hsiang Huang, 黃鼎翔
Other Authors: Yi-Hung Wu
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/v75xh9
id ndltd-TW-107CYCU5392001
record_format oai_dc
spelling ndltd-TW-107CYCU5392001 2019-05-30T03:57:34Z http://ndltd.ncl.edu.tw/handle/v75xh9 Enabling Repository Recommendation on GitHub by Machine Learning 以機器學習實現GitHub倉庫之推薦 Ting-Hsiang Huang 黃鼎翔 Master's thesis Chung Yuan Christian University Graduate Institute of Information and Computer Engineering 107 Yi-Hung Wu 吳宜鴻 2019 thesis 71 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === Chung Yuan Christian University === Graduate Institute of Information and Computer Engineering === 107 === GitHub is one of the most popular platforms for open-source communities in the world. People use its repositories to store their code or projects, and these repositories can serve different purposes. If one repository could be used as a query to find others with the same purpose, it would help knowledge sharing, code reuse, and plagiarism detection. However, the keyword-based search engine built into GitHub does not provide satisfactory results. To find related repositories that are worth recommending, our study divides the method into two levels: filtering users and searching repositories. The former analyzes GitHub events recorded on GitHub Archive and selects potential users who own repositories worth recommending, while the latter analyzes the textual data of these repositories to find the ones with the same purpose. In the user-filtering experiment, our model-based approach yields the highest proportion of potential users and finds the largest number of potential users across different top-k settings. In addition, in the comparison of feature extraction methods, the model that fuses user behavior and event weights achieves better results. In the repository-search experiment, we compare different textual models and find that 75% accuracy can be achieved. We also observe how parameter settings in model training influence performance.
author2 Yi-Hung Wu
author_facet Yi-Hung Wu
Ting-Hsiang Huang
黃鼎翔
author Ting-Hsiang Huang
黃鼎翔
title Enabling Repository Recommendation on GitHub by Machine Learning
publishDate 2019
url http://ndltd.ncl.edu.tw/handle/v75xh9