Dokument-klynging (document clustering)

As document collections grow rapidly and document search becomes more important, document clustering becomes more important as well. Some of the most commonly used document clustering algorithms today are purely statistical in nature. Other algorithms have emerged, addressing some of...

Bibliographic Details
Main Author: Galåen, Magnus
Format: Others
Language: English
Published: Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap 2008
Subjects: ntnudaim; MIT informatikk; Kunstig intelligens og læring (Artificial intelligence and learning)
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8868
id ndltd-UPSALLA1-oai-DiVA.org-ntnu-8868
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-ntnu-8868; 2013-01-08T13:26:25Z; Dokument-klynging (document clustering); eng; Galåen, Magnus; Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap; 2008; ntnudaim; MIT informatikk; Kunstig intelligens og læring; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8868; Local ntnudaim:1505; application/pdf; info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic ntnudaim
MIT informatikk
Kunstig intelligens og læring
description As document collections grow rapidly and document search becomes more important, document clustering becomes more important as well. Some of the most commonly used document clustering algorithms today are purely statistical in nature. Other algorithms have emerged that address some of the issues with numerical algorithms and claim to be better. This thesis compares two well-known algorithms: Elliptic K-Means and Suffix Tree Clustering (STC). They are compared on speed and quality, and it is shown that Elliptic K-Means is faster, while STC produces higher-quality clusters. It is further shown that, on real web data, STC performs better on small portions of relevant text (snippets) than on full documents. It is also shown that a threshold value for base cluster merging is unnecessary. Because STC is shown to be adequately fast when running on snippets only, it is concluded that STC is the better algorithm for search results clustering.
author Galåen, Magnus
title Dokument-klynging (document clustering)
publisher Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap
publishDate 2008
url http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8868