Research on Information Extraction of Technical Documents and Construction of Domain Knowledge Graph

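With the rapid development of knowledge graph related technologies, domain knowledge graph has become a research hotspot in academia and industry. However, the domain knowledge graph for technical documents is not mature enough, and the semantic information implicit in unstructured technical documents has not been fully tapped. Combining the characteristics of technical documents, the paper proposes a TextCNN-based topic information extraction model and constructs a domain knowledge graph for technical documents. It uses the graph database Neo4j for knowledge storage and visualization. The information extraction model based on TextCNN can automatically extract the subject information of the document and the summary information such as title, ID, status, meeting, organization, etc. Experiments show that the model has high accuracy on the technical document dataset, which can effectively reduce the cost of manual annotation and data collation. At the same time, knowledge graph visualization can facilitate scientific researchers to search, track and update technical documents, which can show the evolution of technology more clearly.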

Bibliographic Details
Main Authors:	Huaxuan Zhao, Yueling Pan, Feng Yang
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Domain knowledge graph; information extraction; graph database; TextCNN; Neo4j; resource retrieval
Online Access:	https://ieeexplore.ieee.org/document/9195862/

id	doaj-ac3d7c1ead5c45aaa972fac62335f524
record_format	Article
spelling	doaj-ac3d7c1ead5c45aaa972fac62335f524 | 2021-03-30T03:48:29Z | eng | IEEE | IEEE Access | 2169-3536 | 2020-01-01 | 8 | 168087 | 168098 | 10.1109/ACCESS.2020.3024070 | 9195862 | Research on Information Extraction of Technical Documents and Construction of Domain Knowledge Graph | Huaxuan Zhao | 0 | https://orcid.org/0000-0002-9962-597X | Yueling Pan | 1 | Feng Yang | 2 | https://orcid.org/0000-0002-4854-6331 | School of Information Science and Engineering, Shandong Normal University, Jinan, China | School of Information Science and Engineering, Shandong Normal University, Jinan, China | School of Information Science and Engineering, Shandong Normal University, Jinan, China | With the rapid development of knowledge graph related technologies, domain knowledge graph has become a research hotspot in academia and industry. However, the domain knowledge graph for technical documents is not mature enough, and the semantic information implicit in unstructured technical documents has not been fully tapped. Combining the characteristics of technical documents, the paper proposes a TextCNN-based topic information extraction model and constructs a domain knowledge graph for technical documents. It uses the graph database Neo4j for knowledge storage and visualization. The information extraction model based on TextCNN can automatically extract the subject information of the document and the summary information such as title, ID, status, meeting, organization, etc. Experiments show that the model has high accuracy on the technical document dataset, which can effectively reduce the cost of manual annotation and data collation. At the same time, knowledge graph visualization can facilitate scientific researchers to search, track and update technical documents, which can show the evolution of technology more clearly. | https://ieeexplore.ieee.org/document/9195862/ | Domain knowledge graph | information extraction | graph database | TextCNN | Neo4j | resource retrieval
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Huaxuan Zhao, Yueling Pan, Feng Yang
spellingShingle	Huaxuan Zhao | Yueling Pan | Feng Yang | Research on Information Extraction of Technical Documents and Construction of Domain Knowledge Graph | IEEE Access | Domain knowledge graph | information extraction | graph database | TextCNN | Neo4j | resource retrieval
author_facet	Huaxuan Zhao, Yueling Pan, Feng Yang
author_sort	Huaxuan Zhao
title	Research on Information Extraction of Technical Documents and Construction of Domain Knowledge Graph
title_short	Research on Information Extraction of Technical Documents and Construction of Domain Knowledge Graph
title_full	Research on Information Extraction of Technical Documents and Construction of Domain Knowledge Graph
title_fullStr	Research on Information Extraction of Technical Documents and Construction of Domain Knowledge Graph
title_full_unstemmed	Research on Information Extraction of Technical Documents and Construction of Domain Knowledge Graph
title_sort	research on information extraction of technical documents and construction of domain knowledge graph
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2020-01-01
description	With the rapid development of knowledge graph related technologies, domain knowledge graph has become a research hotspot in academia and industry. However, the domain knowledge graph for technical documents is not mature enough, and the semantic information implicit in unstructured technical documents has not been fully tapped. Combining the characteristics of technical documents, the paper proposes a TextCNN-based topic information extraction model and constructs a domain knowledge graph for technical documents. It uses the graph database Neo4j for knowledge storage and visualization. The information extraction model based on TextCNN can automatically extract the subject information of the document and the summary information such as title, ID, status, meeting, organization, etc. Experiments show that the model has high accuracy on the technical document dataset, which can effectively reduce the cost of manual annotation and data collation. At the same time, knowledge graph visualization can facilitate scientific researchers to search, track and update technical documents, which can show the evolution of technology more clearly.
topic	Domain knowledge graph; information extraction; graph database; TextCNN; Neo4j; resource retrieval
url	https://ieeexplore.ieee.org/document/9195862/
work_keys_str_mv	AT huaxuanzhao researchoninformationextractionoftechnicaldocumentsandconstructionofdomainknowledgegraph AT yuelingpan researchoninformationextractionoftechnicaldocumentsandconstructionofdomainknowledgegraph AT fengyang researchoninformationextractionoftechnicaldocumentsandconstructionofdomainknowledgegraph
_version_	1724182875767570432

Similar Items