Duplicate Detection and Text Classification on Simplified Technical English

This thesis investigates the most effective way to classify text labels and cluster duplicate texts in technical documentation written in Simplified Technical English. Pre-trained transformer language models (BERT) were tested against traditional methods such as tf-id...

Bibliographic Details
Main Author: Lund, Max
Format: Others
Language: English
Published: Linköpings universitet, Institutionen för datavetenskap 2019
Subjects:
NLP
CNL
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-158714