Tag Generalization For Facet-Based Search



Bibliographic Details
Main Author: Niewiarowski, Tomasz
Language: en_US
Published: 2013
Online Access: http://hdl.handle.net/10222/36235
Description
Summary: In this project we address over-specification of tags, a common problem in modern tag-based document management systems. In such systems, tags are essential for document retrieval, and retrieval accuracy depends mainly on the “human factor”, i.e., the quality of the tags that users assign. When tagging, users tend to pick only very specific tags that describe the content of a resource and omit the general concepts the resource represents. To deal with this problem, we propose an automatic tag generalization algorithm that assigns general tags to newly tagged resources. The objective of the algorithm is to provide a layer of tags consisting of general concepts and thereby give good support to system users. The method automatically tags resources with tags that are more general than, and similar to, the user-assigned tags; it is unsupervised and domain independent. It consists of three stages: (1) disambiguation and concept mapping, which maps specific tags to Wikipedia articles representing the same concepts; (2) link-based tag generalization, which finds similar and more general articles using the Wikipedia link structure; and (3) concept unification, in which the system assigns tags based on the list of general articles. Evaluation on four real-life tag data sets demonstrates that the proposed method is domain independent and outperforms supervised tag recommendation systems for practical training data set sizes.
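The three stages can be illustrated with a small, self-contained Python sketch. It is not the thesis's implementation. The toy link graph (`LINKS`) stands in for Wikipedia, and the tag-to-article index (`CANDIDATES`) is hypothetical. The scoring rules are assumptions chosen for illustration: context-overlap disambiguation, counting how many mapped concepts link to a candidate, a threshold of two shared links, and title-to-tag normalization.

```python
from collections import Counter

# Hypothetical toy stand-in for the Wikipedia link graph: article -> outgoing links.
LINKS = {
    "Python (programming language)": ["Programming language", "Scripting language", "Guido van Rossum"],
    "Python (genus)": ["Snake", "Reptile"],
    "Django (web framework)": ["Web framework", "Python (programming language)", "Programming language"],
    "Flask (web framework)": ["Web framework", "Python (programming language)"],
    "Laboratory flask": ["Laboratory equipment"],
}

# Hypothetical surface-form index: tag -> candidate articles (ambiguous tags have several).
CANDIDATES = {
    "python": ["Python (programming language)", "Python (genus)"],
    "django": ["Django (web framework)"],
    "flask": ["Flask (web framework)", "Laboratory flask"],
}


def disambiguate(tags):
    """Stage 1 (assumed heuristic): map each tag to the candidate article whose
    links overlap most with the candidates of the resource's other tags."""
    mapping = {}
    for tag in tags:
        cands = CANDIDATES.get(tag, [])
        if not cands:
            continue  # unknown tag: no concept mapping
        context = set()
        for other in tags:
            if other == tag:
                continue
            for cand in CANDIDATES.get(other, []):
                context.add(cand)
                context.update(LINKS.get(cand, []))

        def overlap(article):
            neighbours = set(LINKS.get(article, [])) | {article}
            return len(neighbours & context)

        mapping[tag] = max(cands, key=overlap)  # ties keep the first candidate
    return mapping


def generalize(articles, top_k=2, min_shared=2):
    """Stage 2 (assumed scoring): rank link targets by how many mapped articles
    point to them; targets shared by several concepts count as more general."""
    mapped = set(articles)
    counts = Counter()
    for article in articles:
        for target in set(LINKS.get(article, [])):
            if target not in mapped:
                counts[target] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [title for title, n in ranked if n >= min_shared][:top_k]


def unify(general_articles, user_tags):
    """Stage 3 (assumed normalization): turn article titles into tags and drop
    duplicates and tags the user already assigned."""
    tags = []
    for title in general_articles:
        tag = title.split(" (")[0].lower().replace(" ", "-")
        if tag not in user_tags and tag not in tags:
            tags.append(tag)
    return tags


def generalize_tags(user_tags, top_k=2):
    """Run the three stages on one resource's user-assigned tags."""
    mapping = disambiguate(user_tags)
    articles = list(dict.fromkeys(mapping.values()))  # dedupe, keep order
    return unify(generalize(articles, top_k), user_tags)


print(generalize_tags(["python", "django", "flask"]))
# → ['programming-language', 'web-framework']
```

In this toy run, the context of the other tags resolves the ambiguous tags "python" and "flask" to their software senses. The two targets linked from more than one mapped concept then become the added general tags. A single isolated tag such as `["python"]` yields no general tags, because no target is shared by two concepts under the assumed threshold.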