Dokument-klynging (document clustering)
As document searching becomes more and more important with the rapid growth of document bases, document clustering becomes more important as well. Some of the most commonly used document clustering algorithms today are purely statistical in nature. Other algorithms have emerged, addressing some of...
Main Author: | Galåen, Magnus |
---|---|
Format: | Others |
Language: | English |
Published: | Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2008 |
Subjects: | ntnudaim MIT informatikk Kunstig intelligens og læring |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8868 |
id | ndltd-UPSALLA1-oai-DiVA.org-ntnu-8868 |
---|---|
record_format | oai_dc |
spelling | ndltd-UPSALLA1-oai-DiVA.org-ntnu-8868 2013-01-08T13:26:25Z Dokument-klynging (document clustering) eng Galåen, Magnus Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap Institutt for datateknikk og informasjonsvitenskap 2008 ntnudaim MIT informatikk Kunstig intelligens og læring As document searching becomes more and more important with the rapid growth of document bases, document clustering becomes more important as well. Some of the most commonly used document clustering algorithms today are purely statistical in nature. Other algorithms have emerged that address some of the issues with numerical algorithms and claim to be better. This thesis compares two well-known algorithms: Elliptic K-Means and Suffix Tree Clustering. They are compared in speed and quality, and it is shown that Elliptic K-Means performs better in speed, while Suffix Tree Clustering (STC) performs better in quality. It is further shown that STC performs better on real web data when using small portions of relevant text (snippets) than when using the full document. It is also shown that a threshold value for base cluster merging is unnecessary. As STC is shown to perform adequately in speed when running on snippets only, it is concluded that STC is the better algorithm for the purpose of search results clustering. Student thesis info:eu-repo/semantics/bachelorThesis text http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8868 Local ntnudaim:1505 application/pdf info:eu-repo/semantics/openAccess |
collection | NDLTD |
language | English |
format | Others |
sources | NDLTD |
topic | ntnudaim MIT informatikk Kunstig intelligens og læring |
spellingShingle | ntnudaim MIT informatikk Kunstig intelligens og læring Galåen, Magnus Dokument-klynging (document clustering) |
description | As document searching becomes more and more important with the rapid growth of document bases, document clustering becomes more important as well. Some of the most commonly used document clustering algorithms today are purely statistical in nature. Other algorithms have emerged that address some of the issues with numerical algorithms and claim to be better. This thesis compares two well-known algorithms: Elliptic K-Means and Suffix Tree Clustering. They are compared in speed and quality, and it is shown that Elliptic K-Means performs better in speed, while Suffix Tree Clustering (STC) performs better in quality. It is further shown that STC performs better on real web data when using small portions of relevant text (snippets) than when using the full document. It is also shown that a threshold value for base cluster merging is unnecessary. As STC is shown to perform adequately in speed when running on snippets only, it is concluded that STC is the better algorithm for the purpose of search results clustering. |
author | Galåen, Magnus |
author_facet | Galåen, Magnus |
author_sort | Galåen, Magnus |
title | Dokument-klynging (document clustering) |
title_short | Dokument-klynging (document clustering) |
title_full | Dokument-klynging (document clustering) |
title_fullStr | Dokument-klynging (document clustering) |
title_full_unstemmed | Dokument-klynging (document clustering) |
title_sort | dokument-klynging (document clustering) |
publisher | Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap |
publishDate | 2008 |
url | http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8868 |
work_keys_str_mv | AT galaenmagnus dokumentklyngingdocumentclustering |
_version_ | 1716520076662800384 |