Building High-Quality Datasets for Information Retrieval Evaluation at a Reduced Cost

Information Retrieval is no longer exclusively about document ranking. New tasks are continuously proposed in this and sibling fields. With this proliferation of tasks, it becomes crucial to have a cheap way of constructing test collections to evaluate new developments. Building test collecti...

Bibliographic Details
Main Authors: David Otero, Daniel Valcarce, Javier Parapar, Álvaro Barreiro
Format: Article
Language: English
Published: MDPI AG 2019-08-01
Series: Proceedings
Subjects: information retrieval, evaluation, datasets, cost
Online Access: https://www.mdpi.com/2504-3900/21/1/33
id doaj-038130998ecf4debbf707f0e9150ceb1
record_format Article
spelling doaj-038130998ecf4debbf707f0e9150ceb1
2020-11-24T21:38:51Z
eng
MDPI AG
Proceedings
2504-3900
2019-08-01
volume 21, issue 1, article 33
DOI 10.3390/proceedings2019021033
proceedings2019021033
Building High-Quality Datasets for Information Retrieval Evaluation at a Reduced Cost
David Otero
Daniel Valcarce
Javier Parapar
Álvaro Barreiro
Affiliation (all four authors): Information Retrieval Lab, Centro de Investigación en Tecnoloxías da Información e as Comunicacións (CITIC), Universidade da Coruña, 15071 A Coruña, Spain
https://www.mdpi.com/2504-3900/21/1/33
information retrieval
evaluation
datasets
cost
collection DOAJ
language English
format Article
sources DOAJ
author David Otero
Daniel Valcarce
Javier Parapar
Álvaro Barreiro
title Building High-Quality Datasets for Information Retrieval Evaluation at a Reduced Cost
publisher MDPI AG
series Proceedings
issn 2504-3900
publishDate 2019-08-01
description Information Retrieval is no longer exclusively about document ranking. New tasks are continuously proposed in this and sibling fields. With this proliferation of tasks, it becomes crucial to have a cheap way of constructing test collections to evaluate new developments. Building test collections is time- and resource-consuming: it takes time to obtain the documents and to define the user needs, and it requires assessors to judge a large number of documents. To reduce the latter, pooling strategies aim to decrease the assessment effort by presenting to the assessors a sample of documents from the corpus that contains the maximum number of relevant documents. In this paper, we propose the preliminary design of different techniques to easily and cheaply build high-quality test collections without the need for participant systems.
topic information retrieval
evaluation
datasets
cost
url https://www.mdpi.com/2504-3900/21/1/33