Compressed representation of XML documents with rapid navigation

XML (Extensible Markup Language) is a language used for the representation, storage, transmission and manipulation of data. Excessive memory consumption is an important challenge when representing XML documents in main memory. Document Object Model (DOM) APIs are used in a processing level th...


Bibliographic Details
Main Author: Kharabsheh, Mohammad Kamel Ahmad
Other Authors: Raman, Rajeev; Thomas, Richard
Published: University of Leicester 2014
Subjects:
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.617664
id ndltd-bl.uk-oai-ethos.bl.uk-617664
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-617664 2016-08-04T04:00:38Z Compressed representation of XML documents with rapid navigation Kharabsheh, Mohammad Kamel Ahmad Raman, Rajeev; Thomas, Richard 2014 XML (Extensible Markup Language) is a language used for the representation, storage, transmission and manipulation of data. Excessive memory consumption is an important challenge when representing XML documents in main memory. Document Object Model (DOM) APIs operate at a processing level that provides access to all parts of an XML document through navigation operations. Although DOM serves as a general-purpose tool that can be used in different applications, it has a high memory cost, particularly in naïve implementations. The space usage of DOM has been reduced significantly, while keeping fast processing speeds, by the use of succinct data structures in SiXDOM [1]. However, SiXDOM does not explore XML data compression principles in depth to improve in-memory space usage. Such XML data compression techniques have proven very effective in the on-disk compression of XML documents. In this thesis we propose a new approach to representing XML documents in memory, using XML data compression ideas to further reduce space usage while rapidly supporting operations of the kind supported by DOM. Our approach is based upon a compression method [2] which represents an XML document as a directed acyclic graph (DAG) by sharing common subtrees. However, this approach does not permit the representation of attributes and textual data, and furthermore, a naive implementation of this idea gives very poor space usage relative to other space-efficient DOM implementations [1]. In order to realise the potential of this compression method as an in-memory representation, a number of optimisations are made by applying succinct data structures and variable-length encoding. Furthermore, a framework for supporting attribute and textual data nodes is introduced.
Finally, we propose a novel approach to representing the textual data using Minimal Perfect Hashing (MPH). We have implemented our ideas in a software library called DAGDOM and performed an extensive experimental evaluation on a number of standard XML files. DAGDOM yields good results: we obtain significant space reductions over existing space-efficient DOM implementations (typically a 2- to 5-fold reduction), with very modest degradation in CPU time for navigational operations. 006.7 University of Leicester http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.617664 http://hdl.handle.net/2381/29062 Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 006.7
spellingShingle 006.7
Kharabsheh, Mohammad Kamel Ahmad
Compressed representation of XML documents with rapid navigation
description XML (Extensible Markup Language) is a language used for the representation, storage, transmission and manipulation of data. Excessive memory consumption is an important challenge when representing XML documents in main memory. Document Object Model (DOM) APIs operate at a processing level that provides access to all parts of an XML document through navigation operations. Although DOM serves as a general-purpose tool that can be used in different applications, it has a high memory cost, particularly in naïve implementations. The space usage of DOM has been reduced significantly, while keeping fast processing speeds, by the use of succinct data structures in SiXDOM [1]. However, SiXDOM does not explore XML data compression principles in depth to improve in-memory space usage. Such XML data compression techniques have proven very effective in the on-disk compression of XML documents. In this thesis we propose a new approach to representing XML documents in memory, using XML data compression ideas to further reduce space usage while rapidly supporting operations of the kind supported by DOM. Our approach is based upon a compression method [2] which represents an XML document as a directed acyclic graph (DAG) by sharing common subtrees. However, this approach does not permit the representation of attributes and textual data, and furthermore, a naive implementation of this idea gives very poor space usage relative to other space-efficient DOM implementations [1]. In order to realise the potential of this compression method as an in-memory representation, a number of optimisations are made by applying succinct data structures and variable-length encoding. Furthermore, a framework for supporting attribute and textual data nodes is introduced. Finally, we propose a novel approach to representing the textual data using Minimal Perfect Hashing (MPH).
We have implemented our ideas in a software library called DAGDOM and performed an extensive experimental evaluation on a number of standard XML files. DAGDOM yields good results: we obtain significant space reductions over existing space-efficient DOM implementations (typically a 2- to 5-fold reduction), with very modest degradation in CPU time for navigational operations.
author2 Raman, Rajeev; Thomas, Richard
author_facet Raman, Rajeev; Thomas, Richard
Kharabsheh, Mohammad Kamel Ahmad
author Kharabsheh, Mohammad Kamel Ahmad
author_sort Kharabsheh, Mohammad Kamel Ahmad
title Compressed representation of XML documents with rapid navigation
title_short Compressed representation of XML documents with rapid navigation
title_full Compressed representation of XML documents with rapid navigation
title_fullStr Compressed representation of XML documents with rapid navigation
title_full_unstemmed Compressed representation of XML documents with rapid navigation
title_sort compressed representation of xml documents with rapid navigation
publisher University of Leicester
publishDate 2014
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.617664
work_keys_str_mv AT kharabshehmohammadkamelahmad compressedrepresentationofxmldocumentswithrapidnavigation
_version_ 1718372272122626048
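
Illustrative note on the description above: the abstract mentions representing an XML document as a directed acyclic graph (DAG) by sharing common subtrees. The following is only a minimal sketch of that general idea, not the DAGDOM library or the method of [2]. It assumes an element-only tree (no attributes or text), and the names Tree, build_dag and count_tree_nodes are hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    """Hypothetical element-only XML tree node: a tag plus ordered children."""
    tag: str
    children: tuple = ()


def build_dag(root):
    """Collapse identical subtrees so each distinct subtree is stored once.

    Returns (root_id, nodes), where nodes[i] is (tag, tuple of child ids).
    """
    ids = {}    # (tag, child ids) -> node id
    nodes = []  # node id -> (tag, child ids)

    def visit(t):
        # Children are numbered first, so two subtrees receive the same key
        # exactly when they have the same tag and the same shared children.
        key = (t.tag, tuple(visit(c) for c in t.children))
        if key not in ids:
            ids[key] = len(nodes)
            nodes.append(key)
        return ids[key]

    return visit(root), nodes


def count_tree_nodes(t):
    """Number of nodes in the original, uncompressed tree."""
    return 1 + sum(count_tree_nodes(c) for c in t.children)


# Three structurally identical <book> elements under one <library>.
book = Tree("book", (Tree("title"), Tree("author")))
library = Tree("library", (book, book, book))

root_id, nodes = build_dag(library)
print(count_tree_nodes(library), "tree nodes ->", len(nodes), "DAG nodes")
```

In this toy input the three repeated book subtrees collapse into a single shared node, which is the effect that makes regular, repetitive XML structure compress well. Navigation over such a DAG needs extra care (a shared node has several parents), which this sketch does not address.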