Structural patterns for document engineering: from an empirical bottom-up analysis to an ontological theory

This thesis aims at investigating a new approach to document analysis based on the idea of structural patterns in XML vocabularies. My work is founded on the belief that authors do naturally converge to a reasonable use of markup languages and that extreme, yet valid instances are rare and limited....


Bibliographic Details
Main Author: Poggi, Francesco <1982>
Other Authors: Ciancarini, Paolo
Format: Doctoral Thesis
Language: en
Published: Alma Mater Studiorum - Università di Bologna 2015
Subjects: INF/01 Informatica
Online Access: http://amsdottorato.unibo.it/7123/
id ndltd-unibo.it-oai-amsdottorato.cib.unibo.it-7123
record_format oai_dc
spelling ndltd-unibo.it-oai-amsdottorato.cib.unibo.it-7123 2015-08-05T05:05:56Z Structural patterns for document engineering: from an empirical bottom-up analysis to an ontological theory Poggi, Francesco <1982> INF/01 Informatica Alma Mater Studiorum - Università di Bologna Ciancarini, Paolo 2015-06-04 Doctoral Thesis PeerReviewed application/pdf en http://amsdottorato.unibo.it/7123/ info:eu-repo/semantics/openAccess
collection NDLTD
language en
format Doctoral Thesis
sources NDLTD
topic INF/01 Informatica
description This thesis aims at investigating a new approach to document analysis based on the idea of structural patterns in XML vocabularies. My work is founded on the belief that authors do naturally converge to a reasonable use of markup languages and that extreme, yet valid instances are rare and limited. Actual documents, therefore, may be used to derive classes of elements (patterns) persisting across documents and distilling the conceptualization of the documents and their components, and may provide the basis for automatic tools and services that rely on no background information (such as schemas) at all. The central part of my work consists of introducing, from the ground up, a formal theory of eight structural patterns (with three sub-patterns) that are able to express the logical organization of any XML document, and of verifying their identifiability in a number of different vocabularies. This model is characterized by and validated against three main dimensions: terseness (i.e., the ability to represent the structure of a document with a small number of objects and composition rules), coverage (i.e., the ability to capture any possible situation in any document) and expressiveness (i.e., the ability to make explicit the semantics of structures, relations and dependencies). An algorithm for the automatic recognition of structural patterns is then presented, together with an evaluation of the results of a test performed on a set of more than 1100 documents from eight very different vocabularies. This language-independent analysis confirms the ability of patterns to capture and summarize the guidelines used by the authors in their everyday practice. Finally, I present some systems that work directly on the pattern-based representation of documents. The ability of these tools to cover very different situations and contexts confirms the effectiveness of the model.
author2 Ciancarini, Paolo
author_facet Ciancarini, Paolo
Poggi, Francesco <1982>
author Poggi, Francesco <1982>
author_sort Poggi, Francesco <1982>
title Structural patterns for document engineering: from an empirical bottom-up analysis to an ontological theory
publisher Alma Mater Studiorum - Università di Bologna
publishDate 2015
url http://amsdottorato.unibo.it/7123/