Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review

Bibliographic Details
Main Authors:	Hee-Jo Nam, Ryota Yamada, Hyun-Seok Park
Format:	Article
Language:	English
Published:	Korea Genome Organization 2020-06-01
Series:	Genomics & Informatics
Subjects:	named entity recognition, natural language processing, text mining
Online Access:	http://genominfo.org/upload/pdf/gi-2020-18-2-e13.pdf

Description
Summary:	The prototype version of the full-text corpus of Genomics & Informatics has recently been archived in a GitHub repository. The full-text publications of volumes 10 through 17 are also directly downloadable from PubMed Central (PMC) as XML files. During the Biomedical Linked Annotation Hackathon 6 (BLAH6), we experimented with converting, annotating, and updating 301 PMC full-text articles of Genomics & Informatics using PubAnnotation, a system that provides a convenient way to add PMC publications based on PMCID. Thus, this review aims to provide a tutorial overview of practicing the iterative task of named entity recognition with the PubAnnotation/PubDictionaries/TextAE ecosystem. We also describe developing a conversion tool between the Genia tagger output and the JSON format of PubAnnotation during the hackathon.
ISSN:	2234-0742

Similar Items