Vyhledávání v českých strukturovaných datech pomocí stemmingu

This work describes and implements a component for full-text searching with Czech diacritics restoration and stemming support. Diacritics restoration is based on statistical principles and is context dependent. This work presents five stemmers ready for immediate use (two algorithmic stemmers and three...

Bibliographic Details
Main Author:	Tattermusch, Jan
Other Authors:	Hlaváčová, Jaroslava
Format:	Dissertation
Language:	Czech
Published:	2010
Online Access:	http://www.nusl.cz/ntk/nusl-298466

id	ndltd-nusl.cz-oai-invenio.nusl.cz-298466
record_format	oai_dc
spelling	ndltd-nusl.cz-oai-invenio.nusl.cz-2984662017-06-27T04:42:44Z Vyhledávání v českých strukturovaných datech pomocí stemmingu Searching Czech Structured Data using Stemming Tattermusch, Jan Hlaváčová, Jaroslava Kuboň, Vladislav This work describes and implements a component for full-text searching with Czech diacritics restoration and stemming support. Diacritics restoration is based on statistical principles and is context dependent. This work presents five stemmers ready for immediate use (two algorithmic stemmers and three hybrid stemmers) and discusses their properties. The component is implemented using the Apache Lucene library and provides a simple interface for querying and for insertions, deletions, and updates of indexed documents. Stored documents consist of named fields with predefined data types. Besides regular full-text queries, the component also supports non-trivial queries with additional constraints and provides a way to customize how the query result score is computed. The component's performance is sufficient for medium-load applications: approximately 50 queries per second with a repository that contains 2.7 million documents. The contribution of stemming and diacritics restoration to the quality of full-text searching was measured using mean average precision (MAP) and is significant. 2010 info:eu-repo/semantics/masterThesis http://www.nusl.cz/ntk/nusl-298466 cze info:eu-repo/semantics/restrictedAccess
collection	NDLTD
language	Czech
format	Dissertation
sources	NDLTD
description	This work describes and implements a component for full-text searching with Czech diacritics restoration and stemming support. Diacritics restoration is based on statistical principles and is context dependent. This work presents five stemmers ready for immediate use (two algorithmic stemmers and three hybrid stemmers) and discusses their properties. The component is implemented using the Apache Lucene library and provides a simple interface for querying and for insertions, deletions, and updates of indexed documents. Stored documents consist of named fields with predefined data types. Besides regular full-text queries, the component also supports non-trivial queries with additional constraints and provides a way to customize how the query result score is computed. The component's performance is sufficient for medium-load applications: approximately 50 queries per second with a repository that contains 2.7 million documents. The contribution of stemming and diacritics restoration to the quality of full-text searching was measured using mean average precision (MAP) and is significant.
author2	Hlaváčová, Jaroslava
author_facet	Hlaváčová, Jaroslava Tattermusch, Jan
author	Tattermusch, Jan
spellingShingle	Tattermusch, Jan Vyhledávání v českých strukturovaných datech pomocí stemmingu
author_sort	Tattermusch, Jan
title	Vyhledávání v českých strukturovaných datech pomocí stemmingu
title_short	Vyhledávání v českých strukturovaných datech pomocí stemmingu
title_full	Vyhledávání v českých strukturovaných datech pomocí stemmingu
title_fullStr	Vyhledávání v českých strukturovaných datech pomocí stemmingu
title_full_unstemmed	Vyhledávání v českých strukturovaných datech pomocí stemmingu
title_sort	vyhledávání v českých strukturovaných datech pomocí stemmingu
publishDate	2010
url	http://www.nusl.cz/ntk/nusl-298466
work_keys_str_mv	AT tattermuschjan vyhledavanivceskychstrukturovanychdatechpomocistemmingu AT tattermuschjan searchingczechstructureddatausingstemming
_version_	1718471373415776256

Similar Items