LEADER 03411nam a2200697Ia 4500
001    10.1109-JBHI.2021.3062322
008    220427s2021 CNT 000 0 und d
022    |a 2168-2194 (ISSN)
245 10 |a Limitations of Transformers on Clinical Text Classification
260  0 |b Institute of Electrical and Electronics Engineers Inc. |c 2021
856    |z View Fulltext in Publisher |u https://doi.org/10.1109/JBHI.2021.3062322
520 3  |a Bidirectional Encoder Representations from Transformers (BERT) and BERT-based approaches are the current state-of-the-art in many natural language processing (NLP) tasks; however, their application to document classification on long clinical texts is limited. In this work, we introduce four methods to scale BERT, which by default can only handle input sequences up to approximately 400 words long, to perform document classification on clinical texts several thousand words long. We compare these methods against two much simpler architectures - a word-level convolutional neural network and a hierarchical self-attention network - and show that BERT often cannot beat these simpler baselines when classifying MIMIC-III discharge summaries and SEER cancer pathology reports. In our analysis, we show that two key components of BERT - pretraining and WordPiece tokenization - may actually be inhibiting BERT's performance on clinical text classification tasks where the input document is several thousand words long and where correctly identifying labels may depend more on identifying a few key words or phrases rather than understanding the contextual meaning of sequences of text. © 2013 IEEE.
650 04 |a Article
650 04 |a artificial neural network
650 04 |a attention network
650 04 |a BERT
650 04 |a bidirectional encoder representations from transformer
650 04 |a Classification (of information)
650 04 |a clinical text
650 04 |a Clinical text classifications
650 04 |a convolutional neural network
650 04 |a Convolutional neural networks
650 04 |a deep learning
650 04 |a Discharge summary
650 04 |a Document Classification
650 04 |a histology
650 04 |a human
650 04 |a Humans
650 04 |a ICD-9
650 04 |a Information retrieval systems
650 04 |a Input sequence
650 04 |a learning algorithm
650 04 |a machine learning
650 04 |a mathematical model
650 04 |a natural language processing
650 04 |a Natural Language Processing
650 04 |a Natural language processing systems
650 04 |a neural networks
650 04 |a Neural Networks, Computer
650 04 |a Pre-training
650 04 |a signal noise ratio
650 04 |a State of the art
650 04 |a text classification
650 04 |a Text processing
650 04 |a Tokenization
700 1  |a Alawad, M. |e author
700 1  |a Coyle, L. |e author
700 1  |a Doherty, J. |e author
700 1  |a Durbin, E.B. |e author
700 1  |a Gao, S. |e author
700 1  |a Gounley, J. |e author
700 1  |a Schaefferkoetter, N. |e author
700 1  |a Stroup, A. |e author
700 1  |a Tourassi, G. |e author
700 1  |a Wu, X.-C. |e author
700 1  |a Yoon, H.J. |e author
700 1  |a Young, M.T. |e author
773    |t IEEE Journal of Biomedical and Health Informatics