Filtered BERT: Similarity Filter-Based Augmentation with Bidirectional Transfer Learning for Protected Health Information Prediction in Clinical Documents
For the secondary use of clinical documents, it is necessary to de-identify protected health information (PHI) in documents. However, the difficulty lies in the fact that there are few publicly annotated PHI documents. To solve this problem, in this study, we propose a filtered bidirectional encoder...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2021-04-01
|
Series: | Applied Sciences |
Subjects: | |
Online Access: | https://www.mdpi.com/2076-3417/11/8/3668 |
Summary: | For the secondary use of clinical documents, it is necessary to de-identify protected health information (PHI) in documents. However, the difficulty lies in the fact that there are few publicly annotated PHI documents. To solve this problem, in this study, we propose a filtered bidirectional encoder representation from transformers (BERT)-based method that predicts a masked word and validates the word again through a similarity filter to construct augmented sentences. The proposed method effectively performs data augmentation. The results show that the augmentation method based on filtered BERT improved the performance of the model. This suggests that our method can effectively improve the performance of the model in the limited data environment. |
---|---|
ISSN: | 2076-3417 |