Scalable Preservation, Reconstruction, and Querying of Databases in terms of Semantic Web Representations

This Thesis addresses how Semantic Web representations, in particular RDF, can enable flexible and scalable preservation, recreation, and querying of databases. An approach has been developed for selective scalable long-term archival of relational databases (RDBs) as RDF, implemented in the SAQ (Sem...

Bibliographic Details
Main Author: Stefanova, Silvia
Format: Doctoral Thesis
Language: English
Published: Uppsala universitet, Avdelningen för datalogi 2013
Subjects:
RDF
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-199573
http://nbn-resolving.de/urn:isbn:978-91-554-8690-7
id ndltd-UPSALLA1-oai-DiVA.org-uu-199573
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-uu-199573
2014-07-22T05:19:22Z
eng
Uppsala universitet, Avdelningen för datalogi
Uppsala universitet, Datalogi
Uppsala
2013
eSSENCE
Doctoral thesis, comprehensive summary
info:eu-repo/semantics/doctoralThesis
text
urn:isbn:978-91-554-8690-7
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, 1651-6214 ; 1052
application/pdf
info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Doctoral Thesis
sources NDLTD
topic RDF
RDFS
RDF view
SPARQL
SPARQL query processing
rewrite optimization
Topic Maps
querying of RDF views
archive relational databases
reconstruct archived databases
description This Thesis addresses how Semantic Web representations, in particular RDF, can enable flexible and scalable preservation, recreation, and querying of databases. An approach has been developed for selective, scalable, long-term archival of relational databases (RDBs) as RDF, implemented in the SAQ (Semantic Archive and Query) system. The user specifies which parts of an RDB to archive in A-SPARQL, an extension of SPARQL. SAQ automatically generates an RDF view of the RDB, the RD-view. The result of an archival query is a set of RDF triples stored in: i) a data archive file containing the preserved RDB content, and ii) a schema archive file containing sufficient metadata to reconstruct the archived database. To achieve scalable data preservation and recreation, SAQ applies special query rewriting optimizations to the archival queries. Experiments show that these optimizations improve query execution and archival time compared with naïve processing. The performance of SAQ was also compared with that of other systems supporting SPARQL queries to views of existing RDBs.
When an archived RDB is to be recreated, the reloader module of SAQ first reads the schema archive file and executes a schema reconstruction algorithm to automatically construct the RDB schema. The resulting RDB is then populated by reading the data archive and converting its contents into relational attribute values. For scalable recreation of RDF-archived data, the Triple Bulk Load (TBL) approach was developed, in which the relational data is reconstructed using the bulk load facility of the RDBMS. Experiments show that the TBL approach is substantially faster than the naïve Insert Attribute Value (IAV) approach, despite the added sorting and post-processing.
To view and query semi-structured Topic Maps data as RDF, the prototype system TM-Viewer was implemented. TM-Viewer automatically generates a declarative RDF view of Topic Maps, the TM-view, using a conceptual schema developed for the Topic Maps data model. To process SPARQL queries to the TM-view efficiently, query rewrite transformations were developed and evaluated; they were shown to significantly improve query execution time.
eSSENCE
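The archive-and-reload workflow summarized in the abstract can be illustrated with a small Python sketch. It is an illustration under stated assumptions, not SAQ's implementation: the namespace `BASE`, the `hasKey`/`hasColumn` vocabulary, the triple layout, and all function names are hypothetical, SQLite stands in for the RDBMS, and `executemany` stands in for a real bulk load facility. The two loaders only mirror the general idea of inserting one attribute value at a time versus sorting triples into complete rows and loading them together.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

BASE = "http://example.org/archive/"  # hypothetical namespace, not SAQ's


def archive_table(conn, table, key):
    """Export one table as (schema_triples, data_triples)."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    cols = [(r[1], r[2]) for r in conn.execute(f"PRAGMA table_info({table})")]
    names = [n for n, _ in cols]
    # Schema archive: enough metadata to recreate the table (made-up vocabulary).
    schema = [(BASE + table, "hasKey", key)]
    schema += [(BASE + table, "hasColumn", f"{n}:{t}") for n, t in cols]
    # Data archive: one triple per non-NULL attribute value.
    data = []
    for row in conn.execute(f"SELECT {', '.join(names)} FROM {table}"):
        rec = dict(zip(names, row))
        subject = f"{BASE}{table}/{rec[key]}"
        data += [(subject, f"{BASE}{table}#{n}", rec[n])
                 for n in names if rec[n] is not None]
    return schema, data


def rebuild_schema(conn, schema):
    """Recreate the table from the schema triples alone."""
    table = schema[0][0].rsplit("/", 1)[1]
    key = next(o for _, p, o in schema if p == "hasKey")
    cols = [o.split(":") for _, p, o in schema if p == "hasColumn"]
    ddl = ", ".join(f"{n} {t}" + (" PRIMARY KEY" if n == key else "")
                    for n, t in cols)
    conn.execute(f"CREATE TABLE {table} ({ddl})")
    return table, key, [n for n, _ in cols]


def reload_per_attribute(conn, table, key, data):
    """Naive loader: create each row, then set one attribute per statement."""
    key_pred = f"{BASE}{table}#{key}"
    keys = {s: o for s, p, o in data if p == key_pred}
    for k in keys.values():
        conn.execute(f"INSERT INTO {table} ({key}) VALUES (?)", (k,))
    for s, p, o in data:
        if p != key_pred:
            col = p.rsplit("#", 1)[1]
            conn.execute(f"UPDATE {table} SET {col} = ? WHERE {key} = ?",
                         (o, keys[s]))


def reload_bulk(conn, table, columns, data):
    """Sort triples by subject, assemble full rows, load them in one call."""
    rows = []
    by_subject = sorted(data, key=itemgetter(0))
    for _, group in groupby(by_subject, key=itemgetter(0)):
        rec = {p.rsplit("#", 1)[1]: o for _, p, o in group}
        rows.append(tuple(rec.get(c) for c in columns))  # missing value -> NULL
    marks = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)


def demo():
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    src.executemany("INSERT INTO person VALUES (?, ?, ?)",
                    [(1, "Ada", "London"), (2, "Linus", None)])
    schema, data = archive_table(src, "person", "id")
    results = {}
    for mode in ("per_attribute", "bulk"):
        dst = sqlite3.connect(":memory:")
        table, key, columns = rebuild_schema(dst, schema)
        if mode == "per_attribute":
            reload_per_attribute(dst, table, key, data)
        else:
            reload_bulk(dst, table, columns, data)
        results[mode] = dst.execute(
            f"SELECT * FROM {table} ORDER BY {key}").fetchall()
    return results
```

Calling `demo()` archives a two-row table, rebuilds it twice from the triples, and returns the reloaded rows for each loader so they can be compared. The per-attribute loader issues one statement per triple, whereas the bulk loader pays for a sort up front and then hands complete rows to the database in a single call.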
author Stefanova, Silvia
title Scalable Preservation, Reconstruction, and Querying of Databases in terms of Semantic Web Representations
publisher Uppsala universitet, Avdelningen för datalogi
publishDate 2013
url http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-199573
http://nbn-resolving.de/urn:isbn:978-91-554-8690-7