Random Forest Based Searching Approach for RDF


Bibliographic Details
Main Author: Hatem Soliman
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects:
RDF
Online Access: https://ieeexplore.ieee.org/document/9032149/
Description
Summary: The blend of digital and physical worlds has changed the Internet significantly. Accordingly, the ways in which information is collected, accessed, and delivered over the Web have changed. Such changes have raised new problems for information retrieval. Search engines retrieve requested information based on the provided keywords, which is not an efficient way to retrieve rich information. Consequently, fetching the required information is difficult without understanding the syntax and semantics of the content. Multiple existing approaches attempt to resolve this problem by exploiting linked data and Semantic Web techniques. Such approaches serialize the content using the Resource Description Framework (RDF) and process queries using SPARQL. However, an exact match between the RDF content and the query structure is required. Although this improves on keyword-based search, it does not provide probabilistic reasoning to find the relationship accuracy between the query and the results. From this perspective, this paper proposes a machine learning (random forest) based approach that predicts the fetching status of RDF by treating RDF requests as a classification problem. First, we preprocess the RDF documents to convert them into N-Triples format. Then, a feature vector is constructed for each preprocessed RDF document. After that, a random forest classifier is trained to predict the fetching status of the RDF documents. The proposed approach is evaluated on the open-source DBpedia dataset. The 10-fold cross-validation results indicate that the proposed approach is accurate and surpasses the state of the art.
ISSN: 2169-3536
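
The summary describes converting RDF to N-Triples and building a feature vector for each document before training a classifier. The record does not state which features the paper uses, so the following is only a minimal sketch of what that step might look like. The function names, the four features, and the example.org sample data are all hypothetical, and the regular expression covers only simple N-Triples lines rather than the full grammar.

```python
import re

# Simplified N-Triples line pattern: subject, predicate, object, final dot.
# Assumption: this handles plain IRIs, blank nodes, and literals only; it is
# not a complete implementation of the W3C N-Triples grammar.
TRIPLE_RE = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+'   # subject: IRI or blank node
    r'(<[^>]*>)\s+'             # predicate: IRI
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'  # object
    r'\s*\.\s*$'
)


def parse_ntriples(text):
    """Return (subject, predicate, object) tuples, skipping comments and blank lines."""
    triples = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = TRIPLE_RE.match(stripped)
        if match:
            triples.append(match.groups())
    return triples


def feature_vector(triples):
    """Hypothetical features (not taken from the paper):
    [triple count, distinct subjects, distinct predicates, literal-object fraction].
    """
    n = len(triples)
    if n == 0:
        return [0, 0, 0, 0.0]
    subjects = {s for s, _, _ in triples}
    predicates = {p for _, p, _ in triples}
    literals = sum(1 for _, _, o in triples if o.startswith('"'))
    return [n, len(subjects), len(predicates), literals / n]


# Hypothetical sample document with two statements about one resource.
SAMPLE = '''
# two statements about one hypothetical resource
<http://example.org/a> <http://example.org/name> "Alice"@en .
<http://example.org/a> <http://example.org/knows> <http://example.org/b> .
'''

triples = parse_ntriples(SAMPLE)
vector = feature_vector(triples)
```

Vectors of this kind, paired with a fetching-status label for each document, could then be passed to a random forest implementation such as scikit-learn's RandomForestClassifier and scored with cross_val_score using cv=10. That pairing is an assumption about tooling, not something the record specifies.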