A Semantic Graph Model for Text Representation and Matching in Document Mining

The explosive growth in the number of documents produced daily necessitates the development of effective alternatives to explore, analyze, and discover knowledge from documents. Document mining research work has emerged to devise automated means to discover and analyze useful information from doc...

Bibliographic Details
Main Author: Shaban, Khaled
Format: Others
Language: en
Published: University of Waterloo 2007
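Subjects: Electrical & Computer Engineering; Document mining; semantic understanding; text representation; similarity measure; document clustering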
Online Access: http://hdl.handle.net/10012/2860
id ndltd-WATERLOO-oai-uwspace.uwaterloo.ca-10012-2860
record_format oai_dc
spelling ndltd-WATERLOO-oai-uwspace.uwaterloo.ca-10012-2860
2013-01-08T18:50:04Z
Shaban, Khaled
2007-05-08T13:47:02Z
2007-05-08T13:47:02Z
2006
2006
http://hdl.handle.net/10012/2860
application/pdf
1461362 bytes
application/pdf
en
University of Waterloo
Copyright: 2006, Shaban, Khaled. All rights reserved.
Electrical & Computer Engineering
Document mining
semantic understanding
text representation
similarity measure
document clustering.
A Semantic Graph Model for Text Representation and Matching in Document Mining
Thesis or Dissertation
Electrical and Computer Engineering
Doctor of Philosophy
collection NDLTD
language en
format Others
sources NDLTD
topic Electrical & Computer Engineering
Document mining
semantic understanding
text representation
similarity measure
document clustering.
spellingShingle Electrical & Computer Engineering
Document mining
semantic understanding
text representation
similarity measure
document clustering.
Shaban, Khaled
A Semantic Graph Model for Text Representation and Matching in Document Mining
description The explosive growth in the number of documents produced daily necessitates effective ways to explore, analyze, and discover knowledge from documents. Document mining research has emerged to devise automated means of discovering and analyzing useful information in documents. This work has been mainly concerned with constructing text representation models, developing distance measures to estimate similarities between documents, and using these in mining processes such as document clustering, document classification, information retrieval, information filtering, and information extraction.
Conventional text representation methodologies treat documents as bags of words and ignore the meanings and ideas their authors want to convey. This deficiency causes similarity measures to miss the contextual similarity of text passages whose words differ, or to perceive contextually dissimilar passages as similar because their words resemble one another.
This thesis presents a new paradigm for mining documents by exploiting the semantic information of their texts. A formal semantic representation of linguistic inputs is introduced and used to build a semantic representation scheme for documents. The representation scheme is constructed by accumulating the outputs of syntactic and semantic analysis. A new distance measure is developed to determine the similarities between the contents of documents. The measure is based on inexact matching of attributed trees. It involves the computation of all distinct similarity common sub-trees, and can be computed efficiently. It is believed that the proposed representation scheme, together with the proposed similarity measure, will enable more effective document mining processes.
The proposed techniques were implemented as vital components of a mining system. A case study of semantic document clustering is presented to demonstrate the operation and efficacy of the framework. Experimental work is reported, and its results are presented and analyzed.
author Shaban, Khaled
author_facet Shaban, Khaled
author_sort Shaban, Khaled
title A Semantic Graph Model for Text Representation and Matching in Document Mining
title_short A Semantic Graph Model for Text Representation and Matching in Document Mining
title_full A Semantic Graph Model for Text Representation and Matching in Document Mining
title_fullStr A Semantic Graph Model for Text Representation and Matching in Document Mining
title_full_unstemmed A Semantic Graph Model for Text Representation and Matching in Document Mining
title_sort semantic graph model for text representation and matching in document mining
publisher University of Waterloo
publishDate 2007
url http://hdl.handle.net/10012/2860
work_keys_str_mv AT shabankhaled asemanticgraphmodelfortextrepresentationandmatchingindocumentmining
AT shabankhaled semanticgraphmodelfortextrepresentationandmatchingindocumentmining
_version_ 1716572869821988864