Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review

The prototype version of the full-text corpus of Genomics & Informatics has recently been archived in a GitHub repository. The full-text publications of volumes 10 through 17 are also directly downloadable from PubMed Central (PMC) as XML files. During the Biomedical Linked Annotation Hackathon...

Bibliographic Details
Main Authors: Hee-Jo Nam, Ryota Yamada, Hyun-Seok Park
Format: Article
Language: English
Published: Korea Genome Organization 2020-06-01
Series: Genomics & Informatics
Subjects: named entity recognition, natural language processing, text mining
Online Access: http://genominfo.org/upload/pdf/gi-2020-18-2-e13.pdf
id doaj-b997294662494318b7b485753da6df1f
record_format Article
spelling doaj-b997294662494318b7b485753da6df1f; 2020-11-25T02:45:44Z; eng; Korea Genome Organization; Genomics & Informatics; 2234-0742; 2020-06-01; 18; 2; e13; 10.5808/GI.2020.18.2.e13; 610; Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review; Hee-Jo Nam (0), Ryota Yamada (1), Hyun-Seok Park (2); Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea; Fuku Corporation, Tokyo 113-0033, Japan; Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea; The prototype version of the full-text corpus of Genomics & Informatics has recently been archived in a GitHub repository. The full-text publications of volumes 10 through 17 are also directly downloadable from PubMed Central (PMC) as XML files. During the Biomedical Linked Annotation Hackathon 6 (BLAH6), we experimented with converting, annotating, and updating 301 PMC full-text articles of Genomics & Informatics using PubAnnotation, a system that provides a convenient way to add PMC publications based on PMCID. Thus, this review aims to provide a tutorial overview of practicing the iterative task of named entity recognition with the PubAnnotation/PubDictionaries/TextAE ecosystem. We also describe developing a conversion tool between the Genia tagger output and the JSON format of PubAnnotation during the hackathon.; http://genominfo.org/upload/pdf/gi-2020-18-2-e13.pdf; named entity recognition; natural language processing; text mining
collection DOAJ
language English
format Article
sources DOAJ
author Hee-Jo Nam
Ryota Yamada
Hyun-Seok Park
spellingShingle Hee-Jo Nam
Ryota Yamada
Hyun-Seok Park
Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review
Genomics & Informatics
named entity recognition
natural language processing
text mining
author_facet Hee-Jo Nam
Ryota Yamada
Hyun-Seok Park
author_sort Hee-Jo Nam
title Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review
title_short Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review
title_full Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review
title_fullStr Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review
title_full_unstemmed Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review
title_sort using the pubannotation ecosystem to perform agile text mining on genomics & informatics: a tutorial review
publisher Korea Genome Organization
series Genomics & Informatics
issn 2234-0742
publishDate 2020-06-01
description The prototype version of the full-text corpus of Genomics & Informatics has recently been archived in a GitHub repository. The full-text publications of volumes 10 through 17 are also directly downloadable from PubMed Central (PMC) as XML files. During the Biomedical Linked Annotation Hackathon 6 (BLAH6), we experimented with converting, annotating, and updating 301 PMC full-text articles of Genomics & Informatics using PubAnnotation, a system that provides a convenient way to add PMC publications based on PMCID. Thus, this review aims to provide a tutorial overview of practicing the iterative task of named entity recognition with the PubAnnotation/PubDictionaries/TextAE ecosystem. We also describe developing a conversion tool between the Genia tagger output and the JSON format of PubAnnotation during the hackathon.
topic named entity recognition
natural language processing
text mining
url http://genominfo.org/upload/pdf/gi-2020-18-2-e13.pdf
work_keys_str_mv AT heejonam usingthepubannotationecosystemtoperformagiletextminingonatutorialreview
AT ryotayamada usingthepubannotationecosystemtoperformagiletextminingonatutorialreview
AT hyunseokpark usingthepubannotationecosystemtoperformagiletextminingonatutorialreview
_version_ 1724760637626646528