Structural patterns for document engineering: from an empirical bottom-up analysis to an ontological theory
This thesis aims to investigate a new approach to document analysis based on the idea of structural patterns in XML vocabularies. My work is founded on the belief that authors naturally converge on a reasonable use of markup languages and that extreme yet valid instances are rare and limited....
Main Author: | Poggi, Francesco <1982> |
---|---|
Other Authors: | Ciancarini, Paolo |
Format: | Doctoral Thesis |
Language: | en |
Published: | Alma Mater Studiorum - Università di Bologna, 2015 |
Subjects: | INF/01 Informatica |
Online Access: | http://amsdottorato.unibo.it/7123/ |
id | ndltd-unibo.it-oai-amsdottorato.cib.unibo.it-7123 |
---|---|
record_format | oai_dc |
spelling | ndltd-unibo.it-oai-amsdottorato.cib.unibo.it-7123 2015-08-05T05:05:56Z Structural patterns for document engineering: from an empirical bottom-up analysis to an ontological theory Poggi, Francesco <1982> INF/01 Informatica Alma Mater Studiorum - Università di Bologna Ciancarini, Paolo 2015-06-04 Doctoral Thesis PeerReviewed application/pdf en http://amsdottorato.unibo.it/7123/ info:eu-repo/semantics/openAccess |
collection | NDLTD |
language | en |
format | Doctoral Thesis |
sources | NDLTD |
topic | INF/01 Informatica |
description |
This thesis aims to investigate a new approach to document analysis based on the idea of structural patterns in XML vocabularies. My work is founded on the belief that authors naturally converge on a reasonable use of markup languages and that extreme yet valid instances are rare and limited. Actual documents, therefore, may be used to derive classes of elements (patterns) that persist across documents and distill the conceptualization of the documents and their components, and may provide a basis for automatic tools and services that rely on no background information (such as schemas) at all.
The central part of my work consists of introducing, from the ground up, a formal theory of eight structural patterns (with three sub-patterns) that can express the logical organization of any XML document, and of verifying their identifiability in a number of different vocabularies. This model is characterized by, and validated against, three main dimensions: terseness (i.e. the ability to represent the structure of a document with a small number of objects and composition rules), coverage (i.e. the ability to capture any possible situation in any document) and expressiveness (i.e. the ability to make explicit the semantics of structures, relations and dependencies).
An algorithm for the automatic recognition of structural patterns is then presented, together with an evaluation of the results of a test performed on a set of more than 1100 documents from eight very different vocabularies. This language-independent analysis confirms the ability of patterns to capture and summarize the guidelines used by the authors in their everyday practice.
Finally, I present some systems that work directly on the pattern-based representation of documents. The ability of these tools to cover very different situations and contexts confirms the effectiveness of the model. |
author2 | Ciancarini, Paolo |
author | Poggi, Francesco <1982> |
title | Structural patterns for document engineering: from an empirical bottom-up analysis to an ontological theory |
publisher | Alma Mater Studiorum - Università di Bologna |
publishDate | 2015 |
url | http://amsdottorato.unibo.it/7123/ |
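The abstract's premise that classes of elements can be derived from actual documents, with no schema, can be illustrated with a small sketch. The Python below is hypothetical and is not the recognition algorithm from the thesis: for every element name it records, across all instances in a set of documents, whether the element directly contains text, whether it contains child elements, and whether it appears inside a parent that directly contains text, and it then maps those flags to a content/context label. The helper names (`has_text`, `profile_documents`, `classify`) and the labels are assumptions for illustration, not the thesis's pattern names or formal definitions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sketch: schema-less profiling of element names from instances.
# Not the pattern-recognition algorithm described in the thesis.


def has_text(s):
    """True if s holds non-whitespace character data."""
    return s is not None and s.strip() != ""


def profile_documents(xml_strings):
    """Per element name: [contains text, contains elements, in text context].

    Flags are aggregated by union over all instances in all documents
    (an assumption made for this illustration).
    """
    flags = defaultdict(lambda: [False, False, False])
    for source in xml_strings:
        root = ET.fromstring(source)
        stack = [(root, False)]  # the root has no parent, hence no text context
        while stack:
            elem, in_text_context = stack.pop()
            # An element's own character data is its .text plus the tails of its children.
            own_text = has_text(elem.text) or any(has_text(c.tail) for c in elem)
            f = flags[elem.tag]
            f[0] = f[0] or own_text
            f[1] = f[1] or len(elem) > 0
            f[2] = f[2] or in_text_context
            # Children sit in a text context iff this element directly holds text.
            stack.extend((child, own_text) for child in elem)
    return flags


def classify(flags):
    """Map the three flags to a hypothetical (content, context) label."""
    labels = {}
    for tag, (text, elements, text_ctx) in flags.items():
        if text and elements:
            content = "mixed"
        elif text:
            content = "text"
        elif elements:
            content = "elements"
        else:
            content = "empty"
        context = "text-context" if text_ctx else "no-text-context"
        labels[tag] = (content, context)
    return labels


if __name__ == "__main__":
    docs = [
        "<doc><title>Intro</title><p>Some <em>key</em> text<br/></p></doc>",
        "<doc><title>More</title><p>Plain</p></doc>",
    ]
    for tag, label in sorted(classify(profile_documents(docs)).items()):
        print(tag, label)
```

Aggregating the flags by union (an assumption) gives each element name exactly one label even when its instances differ, as `p` does in the two sample documents: it holds plain text in one and text with child elements in the other, so it is labelled as mixed content.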