Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review
The prototype version of the full-text corpus of Genomics & Informatics has recently been archived in a GitHub repository. The full-text publications of volumes 10 through 17 are also directly downloadable from PubMed Central (PMC) as XML files. During the Biomedical Linked Annotation Hackathon...
Main Authors: | Hee-Jo Nam, Ryota Yamada, Hyun-Seok Park |
---|---|
Format: | Article |
Language: | English |
Published: | Korea Genome Organization, 2020-06-01 |
Series: | Genomics & Informatics |
Subjects: | named entity recognition, natural language processing, text mining |
Online Access: | http://genominfo.org/upload/pdf/gi-2020-18-2-e13.pdf |
id |
doaj-b997294662494318b7b485753da6df1f |
---|---|
record_format |
Article |
spelling |
doaj-b997294662494318b7b485753da6df1f; 2020-11-25T02:45:44Z; eng; Korea Genome Organization; Genomics & Informatics; 2234-0742; 2020-06-01; 18; 2; e13; 10.5808/GI.2020.18.2.e13; 610; Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review; Hee-Jo Nam (0), Ryota Yamada (1), Hyun-Seok Park (2); (0) Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea; (1) Fuku Corporation, Tokyo 113-0033, Japan; (2) Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea; The prototype version of the full-text corpus of Genomics & Informatics has recently been archived in a GitHub repository. The full-text publications of volumes 10 through 17 are also directly downloadable from PubMed Central (PMC) as XML files. During the Biomedical Linked Annotation Hackathon 6 (BLAH6), we experimented with converting, annotating, and updating 301 PMC full-text articles of Genomics & Informatics using PubAnnotation, a system that provides a convenient way to add PMC publications based on PMCID. Thus, this review aims to provide a tutorial overview of practicing the iterative task of named entity recognition with the PubAnnotation/PubDictionaries/TextAE ecosystem. We also describe developing a conversion tool between the Genia tagger output and the JSON format of PubAnnotation during the hackathon.; http://genominfo.org/upload/pdf/gi-2020-18-2-e13.pdf; named entity recognition; natural language processing; text mining |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Hee-Jo Nam Ryota Yamada Hyun-Seok Park |
spellingShingle |
Hee-Jo Nam Ryota Yamada Hyun-Seok Park Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review Genomics & Informatics named entity recognition natural language processing text mining |
author_facet |
Hee-Jo Nam Ryota Yamada Hyun-Seok Park |
author_sort |
Hee-Jo Nam |
title |
Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review |
title_sort |
using the pubannotation ecosystem to perform agile text mining on genomics & informatics: a tutorial review |
publisher |
Korea Genome Organization |
series |
Genomics & Informatics |
issn |
2234-0742 |
publishDate |
2020-06-01 |
description |
The prototype version of the full-text corpus of Genomics & Informatics has recently been archived in a GitHub repository. The full-text publications of volumes 10 through 17 are also directly downloadable from PubMed Central (PMC) as XML files. During the Biomedical Linked Annotation Hackathon 6 (BLAH6), we experimented with converting, annotating, and updating 301 PMC full-text articles of Genomics & Informatics using PubAnnotation, a system that provides a convenient way to add PMC publications based on PMCID. Thus, this review aims to provide a tutorial overview of practicing the iterative task of named entity recognition with the PubAnnotation/PubDictionaries/TextAE ecosystem. We also describe developing a conversion tool between the Genia tagger output and the JSON format of PubAnnotation during the hackathon. |
topic |
named entity recognition natural language processing text mining |
url |
http://genominfo.org/upload/pdf/gi-2020-18-2-e13.pdf |
work_keys_str_mv |
AT heejonam usingthepubannotationecosystemtoperformagiletextminingonatutorialreview AT ryotayamada usingthepubannotationecosystemtoperformagiletextminingonatutorialreview AT hyunseokpark usingthepubannotationecosystemtoperformagiletextminingonatutorialreview |
_version_ |
1724760637626646528 |