Duplicate Detection and Text Classification on Simplified Technical English

This thesis investigates the most effective way of performing classification of text labels and clustering of duplicate texts in technical documentation written in Simplified Technical English. Pre-trained language models from transformers (BERT) were tested against traditional methods such as tf-id...

Bibliographic Details
Main Author:	Lund, Max
Format:	Others
Language:	English
Published:	Linköpings universitet, Institutionen för datavetenskap 2019
Subjects:	NLP; CNL; transformer models; LSTM; BERT; document embeddings; word embeddings; text classification; text clustering; transfer learning; machine learning; Language Technology (Computational Linguistics); Språkteknologi (språkvetenskaplig databehandling); Computer Sciences; Datavetenskap (datalogi)
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-158714

Similar Items