QPLaBSE: Quantized and Pruned Language-Agnostic BERT Sentence Embedding Model: Production-ready compression for multilingual transformers

Transformer models perform well on Natural Language Processing (NLP) and Natural Language Understanding (NLU) tasks, but training and fine-tuning them consumes large amounts of data and computing resources, and fast inference in user-facing products requires high-end hardware. While distillation, quantization, …
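The two compression techniques named in the title can be illustrated in a few lines. The following is a minimal sketch, assuming PyTorch and the Hugging Face "sentence-transformers/LaBSE" checkpoint (both assumptions, not stated in this record): unstructured magnitude pruning of the float model, followed by post-training dynamic quantization of its Linear layers. The thesis's actual pipeline may differ.

    # Minimal sketch (assumed checkpoint and sparsity level, not the
    # thesis's exact pipeline): prune, then dynamically quantize LaBSE.
    import torch
    import torch.nn.utils.prune as prune
    from transformers import AutoModel, AutoTokenizer

    model_name = "sentence-transformers/LaBSE"  # assumed HF checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    # Prune first, while weights are still float: zero out the 30% of
    # weights with the smallest L1 magnitude in every Linear layer
    # (30% sparsity is an illustrative choice).
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.3)
            prune.remove(module, "weight")  # bake the zeros into the tensor

    # Then quantize: Linear weights are stored as int8 and activations
    # are quantized dynamically at inference time.
    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    # Embed a sentence; mean pooling over tokens is illustrative only.
    inputs = tokenizer("A multilingual example sentence.", return_tensors="pt")
    with torch.no_grad():
        embedding = quantized(**inputs).last_hidden_state.mean(dim=1)
    print(embedding.shape)  # torch.Size([1, 768])

Pruning is applied before quantization because PyTorch's pruning utilities operate on float weight tensors; once Linear layers are replaced by their dynamically quantized counterparts, their packed int8 weights can no longer be pruned in place.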

Bibliographic Details
Main Author: Langde, Sarthak
Format: Others
Language: English
Published: KTH, School of Electrical Engineering and Computer Science (EECS), 2021
Online Access:http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-305172