Enabling Repository Recommendation on GitHub by Machine Learning
Master's === Chung Yuan Christian University === Graduate Institute of Information and Computer Engineering === 107 === GitHub is one of the most popular platforms for open source communities in the world. People use its repositories to store their code and projects, and these repositories serve many different purposes. If we can use one repository as a query to find the other...
Main Authors: | Ting-Hsiang Huang, 黃鼎翔 |
---|---|
Other Authors: | Yi-Hung Wu |
Format: | Others |
Language: | zh-TW |
Published: | 2019 |
Online Access: | http://ndltd.ncl.edu.tw/handle/v75xh9 |
id |
ndltd-TW-107CYCU5392001 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-107CYCU5392001 2019-05-30T03:57:34Z http://ndltd.ncl.edu.tw/handle/v75xh9 Enabling Repository Recommendation on GitHub by Machine Learning 以機器學習實現GitHub倉庫之推薦 Ting-Hsiang Huang 黃鼎翔 Master's Chung Yuan Christian University Graduate Institute of Information and Computer Engineering 107 GitHub is one of the most popular platforms for open source communities in the world. People use its repositories to store their code and projects, and these repositories serve many different purposes. If one repository could be used as a query to find others with the same purpose, it would help knowledge sharing, code reuse, and plagiarism detection. However, the keyword-based search engine built into GitHub does not provide satisfactory results. To find related repositories that are worth recommending, we divide our method into two levels: filtering users and searching repositories. The former analyzes GitHub events recorded on GitHub Archive and selects potential users who own repositories worth recommending, while the latter analyzes the textual data of these repositories to find the ones with the same purpose. In the user-filtering experiment, our model-based approach yields the highest proportion of potential users and finds the largest number of potential users across different top-k settings. In addition, in a comparison of feature extraction methods, the model that fuses user behavior and event weights achieves better results. In the repository-search experiment, we compare different textual models and find that 75% accuracy can be achieved. We also observe how parameter settings in model training influence performance. Yi-Hung Wu 吳宜鴻 2019 thesis 71 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === Chung Yuan Christian University === Graduate Institute of Information and Computer Engineering === 107 === GitHub is one of the most popular platforms for open source communities in the world. People use its repositories to store their code and projects, and these repositories serve many different purposes. If one repository could be used as a query to find others with the same purpose, it would help knowledge sharing, code reuse, and plagiarism detection. However, the keyword-based search engine built into GitHub does not provide satisfactory results. To find related repositories that are worth recommending, we divide our method into two levels: filtering users and searching repositories. The former analyzes GitHub events recorded on GitHub Archive and selects potential users who own repositories worth recommending, while the latter analyzes the textual data of these repositories to find the ones with the same purpose. In the user-filtering experiment, our model-based approach yields the highest proportion of potential users and finds the largest number of potential users across different top-k settings. In addition, in a comparison of feature extraction methods, the model that fuses user behavior and event weights achieves better results. In the repository-search experiment, we compare different textual models and find that 75% accuracy can be achieved. We also observe how parameter settings in model training influence performance. |
author2 |
Yi-Hung Wu |
author_facet |
Yi-Hung Wu Ting-Hsiang Huang 黃鼎翔 |
author |
Ting-Hsiang Huang 黃鼎翔 |
spellingShingle |
Ting-Hsiang Huang 黃鼎翔 Enabling Repository Recommendation on GitHub by Machine Learning |
author_sort |
Ting-Hsiang Huang |
title |
Enabling Repository Recommendation on GitHub by Machine Learning |
title_short |
Enabling Repository Recommendation on GitHub by Machine Learning |
title_full |
Enabling Repository Recommendation on GitHub by Machine Learning |
title_fullStr |
Enabling Repository Recommendation on GitHub by Machine Learning |
title_full_unstemmed |
Enabling Repository Recommendation on GitHub by Machine Learning |
title_sort |
enabling repository recommendation on github by machine learning |
publishDate |
2019 |
url |
http://ndltd.ncl.edu.tw/handle/v75xh9 |
work_keys_str_mv |
AT tinghsianghuang enablingrepositoryrecommendationongithubbymachinelearning AT huángdǐngxiáng enablingrepositoryrecommendationongithubbymachinelearning AT tinghsianghuang yǐjīqìxuéxíshíxiàngithubcāngkùzhītuījiàn AT huángdǐngxiáng yǐjīqìxuéxíshíxiàngithubcāngkùzhītuījiàn |
_version_ |
1719196968798912512 |