Duplicate Detection and Text Classification on Simplified Technical English
This thesis investigates the most effective way of performing classification of text labels and clustering of duplicate texts in technical documentation written in Simplified Technical English. Pre-trained language models from transformers (BERT) were tested against traditional methods such as tf-id...
Main Author: | |
---|---|
Format: | Others |
Language: | English |
Published: |
Linköpings universitet, Institutionen för datavetenskap
2019
|
Subjects: | |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-158714 |