The Subword‐Character Multi‐Scale Transformer With Learnable Positional Encoding for Machine Translation

Bibliographic Details
Published in: Engineering Reports
Main Authors: Wenjing Yao, Wei Zhou
Format: Article
Language: English
Published: Wiley 2025-07-01
Online Access: https://doi.org/10.1002/eng2.70287
Description
Summary: The transformer addresses the efficiency bottleneck caused by sequential computation in traditional recurrent neural networks (RNNs) by using self-attention to capture global dependencies in parallel. However, the subword-level modeling units and fixed positional encodings adopted by mainstream methods struggle to capture fine-grained features in morphologically rich languages and limit the model's ability to learn target-side word-order patterns flexibly. To address these challenges, this study constructs a subword-character multi-scale transformer architecture integrated with a learnable positional encoding mechanism. The model abandons fixed positional encodings; instead, the positional representation spaces of the source and target languages are optimized autonomously through end-to-end training, improving the model's adaptability to cross-linguistic positional mappings. While preserving the global semantic modeling advantages of subword units, the framework adds a lightweight character-level branch that supplies fine-grained features. The subword and character branches are fused through context-aware cross-attention, which dynamically integrates linguistic information at different granularities. Our model achieves notable BLEU improvements on the WMT'14 English-German (En-De), WMT'17 Chinese-English (Zh-En), and WMT'16 English-Romanian (En-Ro) benchmark tasks. These results demonstrate the synergy between fine-grained multi-scale modeling and learnable positional encoding in improving translation quality and linguistic adaptability.
ISSN:2577-8196
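
The abstract describes two mechanisms in enough detail to sketch: a positional encoding learned end to end rather than fixed, and a cross-attention fusion in which subword representations attend to a character-level branch. The PyTorch snippet below is a minimal illustration of those two ideas only; the module names, dimensions, single-layer fusion, and residual/LayerNorm wiring are assumptions made for the sketch, not the authors' implementation (see the paper at https://doi.org/10.1002/eng2.70287 for details).

import torch
import torch.nn as nn

class LearnablePositionalEncoding(nn.Module):
    # Position embeddings trained end to end instead of fixed sinusoids.
    def __init__(self, max_len, d_model):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model); add a learned vector per position.
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.pos_emb(positions)

class SubwordCharFusion(nn.Module):
    # Subword states (queries) attend to character states (keys/values).
    def __init__(self, d_model, n_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, subword, char):
        # subword: (batch, n_subwords, d_model); char: (batch, n_chars, d_model)
        fused, _ = self.cross_attn(query=subword, key=char, value=char)
        # Residual connection keeps the subword branch's global semantics.
        return self.norm(subword + fused)

# Toy usage with random tensors standing in for encoder states.
batch, n_sub, n_char, d_model = 2, 16, 64, 512
pos_enc = LearnablePositionalEncoding(max_len=1024, d_model=d_model)
fusion = SubwordCharFusion(d_model)
subword_states = pos_enc(torch.randn(batch, n_sub, d_model))
char_states = pos_enc(torch.randn(batch, n_char, d_model))
print(fusion(subword_states, char_states).shape)  # torch.Size([2, 16, 512])

One trade-off worth noting: a learned embedding table caps the sequence length at max_len, whereas fixed sinusoidal encodings extrapolate to arbitrary lengths; the abstract's claim is that the learned variant adapts better to cross-linguistic word-order differences within that budget.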