On the Softmax Bottleneck of Word-Level Recurrent Language Models
For different input contexts (sequences of previous words), a neural word-level language model predicts the next word by outputting a probability distribution over all the words in the vocabulary using a softmax function. When the log-probability outputs for all such contexts are stacked together, th...
Main Author:
Other Authors:
Format: Others
Language: en
Published: Université d'Ottawa / University of Ottawa, 2020
Subjects:
Online Access: http://hdl.handle.net/10393/41412 , http://dx.doi.org/10.20381/ruor-25636
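The abstract describes stacking the log-probability outputs of a softmax language model over many contexts. A minimal NumPy sketch of that construction is below, assuming the standard parameterization in which logits are inner products of context vectors and output word embeddings; all sizes and variable names here are hypothetical illustrations, not taken from the thesis. It shows the "softmax bottleneck": the stacked log-probability matrix has rank at most d + 1, where d is the hidden/embedding dimension, regardless of how many contexts or vocabulary words there are.

```python
import numpy as np

# Hypothetical sizes for illustration only (not from the thesis).
rng = np.random.default_rng(0)
n_contexts, vocab_size, d = 50, 200, 10  # d << vocab_size

H = rng.standard_normal((n_contexts, d))   # one context (hidden) vector per row
E = rng.standard_normal((vocab_size, d))   # output word embeddings

# Logits and row-wise log-softmax: log P(w | context) for every context/word pair.
logits = H @ E.T
log_probs = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)

# Stacking log-probabilities over all contexts yields a matrix whose rank is
# bounded by d + 1 (HE^T has rank <= d; the per-row normalizer adds at most 1),
# far below min(n_contexts, vocab_size).
rank = np.linalg.matrix_rank(log_probs)
print(f"rank of stacked log-probability matrix: {rank} (bound: d + 1 = {d + 1})")
```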