Proposed Framework for the Evaluation of Standalone Corpora Processing Systems: An Application to Arabic Corpora

Despite the accessibility of numerous online corpora, students and researchers engaged in the fields of Natural Language Processing (NLP), corpus linguistics, and language learning and teaching may encounter situations in which they need to develop their own corpora. Several commercial and free stan...

Bibliographic Details
Main Authors: Abdulmohsen Al-Thubaity, Hend Al-Khalifa, Reem Alqifari, Manal Almazrua
Format: Article
Language: English
Published: Hindawi Limited 2014-01-01
Series: The Scientific World Journal
Online Access: http://dx.doi.org/10.1155/2014/602745
id doaj-c716a1905eb54ff09abf467a9b113f3f
record_format Article
spelling doaj-c716a1905eb54ff09abf467a9b113f3f 2020-11-25T02:15:24Z eng Hindawi Limited The Scientific World Journal 2356-6140 1537-744X 2014-01-01 2014 10.1155/2014/602745 602745 Proposed Framework for the Evaluation of Standalone Corpora Processing Systems: An Application to Arabic Corpora Abdulmohsen Al-Thubaity 0 Hend Al-Khalifa 1 Reem Alqifari 2 Manal Almazrua 3 King AbdulAziz City for Science and Technology, Riyadh 11442, Saudi Arabia; King Saud University, Riyadh 12372, Saudi Arabia; King Saud University, Riyadh 12372, Saudi Arabia; King AbdulAziz City for Science and Technology, Riyadh 11442, Saudi Arabia Despite the accessibility of numerous online corpora, students and researchers engaged in the fields of Natural Language Processing (NLP), corpus linguistics, and language learning and teaching may encounter situations in which they need to develop their own corpora. Several commercial and free standalone corpora processing systems are available to process such corpora. In this study, we first propose a framework for the evaluation of standalone corpora processing systems and then use it to evaluate seven freely available systems. The proposed framework considers the usability, functionality, and performance of the evaluated systems while taking into consideration their suitability for Arabic corpora. While the results show that most of the evaluated systems exhibited comparable usability scores, the scores for functionality and performance were substantially different with respect to support for the Arabic language and N-grams profile generation. The results of our evaluation will help potential users of the evaluated systems to choose the system that best meets their needs. More importantly, the results will help the developers of the evaluated systems to enhance their systems and developers of new corpora processing systems by providing them with a reference framework. http://dx.doi.org/10.1155/2014/602745
collection DOAJ
language English
format Article
sources DOAJ
author Abdulmohsen Al-Thubaity
Hend Al-Khalifa
Reem Alqifari
Manal Almazrua
author_sort Abdulmohsen Al-Thubaity
title Proposed Framework for the Evaluation of Standalone Corpora Processing Systems: An Application to Arabic Corpora
publisher Hindawi Limited
series The Scientific World Journal
issn 2356-6140
1537-744X
publishDate 2014-01-01
description Despite the accessibility of numerous online corpora, students and researchers engaged in the fields of Natural Language Processing (NLP), corpus linguistics, and language learning and teaching may encounter situations in which they need to develop their own corpora. Several commercial and free standalone corpora processing systems are available to process such corpora. In this study, we first propose a framework for the evaluation of standalone corpora processing systems and then use it to evaluate seven freely available systems. The proposed framework considers the usability, functionality, and performance of the evaluated systems while taking into consideration their suitability for Arabic corpora. While the results show that most of the evaluated systems exhibited comparable usability scores, the scores for functionality and performance were substantially different with respect to support for the Arabic language and N-grams profile generation. The results of our evaluation will help potential users of the evaluated systems to choose the system that best meets their needs. More importantly, the results will help the developers of the evaluated systems to enhance their systems and developers of new corpora processing systems by providing them with a reference framework.
url http://dx.doi.org/10.1155/2014/602745