TextRank-based Keyword Extraction Method Integrating Semantic Features


Bibliographic Details
Published in: Jisuanji gongcheng
Main Authors: YANG Yanjiao, ZHAO Guotao, YUAN Zhenqiang, HAN Jiachen
Format: Article
Language: English
Published: Editorial Office of Computer Engineering 2021-10-01
Subjects:
Online Access: https://www.ecice06.com/fileup/1000-3428/PDF/20211010.pdf
Other Bibliographic Details
Summary: TextRank uses a co-occurrence window, instead of the Web hyperlinks used by PageRank, to determine the relationships between words. However, the word graph built under the co-occurrence window mechanism is undirected, and in most cases words in actual Chinese texts have no cognitive directional link to the other words in their co-occurrence window. Under this mechanism, the relationships between words therefore differ sharply from the hyperlink relationships of PageRank. To address this problem, a keyword extraction method that integrates semantic features, S-TextRank, is proposed. Building on TextRank, S-TextRank uses dependency relationships instead of co-occurrence windows to determine the relationships between words, simulating the directional hyperlinks of PageRank. In addition, words of different parts of speech are assigned corresponding weight coefficients to simulate the importance of different types of Web pages. Finally, a non-keyword list is constructed using the IDF method and Chinese grammar rules to exclude the influence of irrelevant words on the extraction results. Experimental results show that S-TextRank achieves an accuracy of 74% on the test set, 19.4 percentage points higher than TextRank.
ISSN: 1000-3428
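
The three ideas named in the summary (directed dependency edges, part-of-speech weights, and a non-keyword list) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `rank_keywords`, the `POS_WEIGHT` values, the dependent-to-head edge direction, and the choice to apply the part-of-speech weight to each word's base score are all assumptions made for illustration; the summary does not specify any of them. The input is assumed to come from an external tagger and dependency parser.

```python
from collections import defaultdict

# Hypothetical part-of-speech weights. The summary says such coefficients
# exist but does not give their values.
POS_WEIGHT = {"NOUN": 1.0, "VERB": 0.6, "ADJ": 0.4}


def rank_keywords(tokens, dependencies, stopwords=frozenset(),
                  damping=0.85, iterations=50):
    """Rank words with a PageRank-style iteration over a directed graph.

    tokens:       iterable of (word, pos_tag) pairs
    dependencies: iterable of (dependent_word, head_word) pairs
    stopwords:    words to exclude (stands in for the non-keyword list)
    Returns a list of (word, score) pairs, highest score first.
    """
    # Keep only candidate words: not in the non-keyword list and with a
    # part of speech that has a weight.
    pos_of = {word: pos for word, pos in tokens
              if word not in stopwords and pos in POS_WEIGHT}

    # Assumption: each dependency becomes a directed edge dependent -> head,
    # so a dependent "votes" for its head, like a page linking to another.
    out_edges = defaultdict(lambda: defaultdict(int))
    for dependent, head in dependencies:
        if dependent in pos_of and head in pos_of and dependent != head:
            out_edges[dependent][head] += 1

    scores = {word: 1.0 for word in pos_of}
    for _ in range(iterations):
        # Assumption: the base (teleport) term is scaled by the POS weight.
        new_scores = {word: (1.0 - damping) * POS_WEIGHT[pos]
                      for word, pos in pos_of.items()}
        for source, targets in out_edges.items():
            total = sum(targets.values())
            for target, count in targets.items():
                # Each source splits its score among its heads.
                new_scores[target] += damping * scores[source] * count / total
        scores = new_scores

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    example_tokens = [("the", "DET"), ("method", "NOUN"), ("extracts", "VERB"),
                      ("semantic", "ADJ"), ("keywords", "NOUN")]
    example_dependencies = [("the", "method"), ("method", "extracts"),
                            ("semantic", "keywords"), ("keywords", "extracts")]
    for word, score in rank_keywords(example_tokens, example_dependencies):
        print(f"{word}\t{score:.4f}")
```

Passing a set of words as `stopwords` removes them from the graph before ranking, which is the role the summary assigns to the non-keyword list; how that list is derived from IDF and grammar rules is not sketched here.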