A Semantic Graph Model for Text Representation and Matching in Document Mining

Bibliographic Details
Main Author:	Shaban, Khaled
Format:	Others
Language:	en
Published:	University of Waterloo, 2007
Subjects:	Electrical & Computer Engineering; Document mining; semantic understanding; text representation; similarity measure; document clustering
Online Access:	http://hdl.handle.net/10012/2860

Type:	Thesis or Dissertation
Degree:	Doctor of Philosophy, Electrical and Computer Engineering
Rights:	Copyright 2006, Shaban, Khaled. All rights reserved.
File:	application/pdf, 1461362 bytes

Abstract

The explosive growth in the number of documents produced daily necessitates effective ways to explore, analyze, and discover knowledge from documents. Document mining research has emerged to devise automated means of discovering and analyzing useful information in documents. This work has been mainly concerned with constructing text representation models, developing distance measures to estimate similarities between documents, and using these in mining processes such as document clustering, document classification, information retrieval, information filtering, and information extraction.

Conventional text representation methodologies treat documents as bags of words and ignore the meanings and ideas their authors want to convey. This deficiency causes similarity measures to miss the contextual similarity of text passages that are worded differently or, conversely, to judge contextually dissimilar passages as similar because they share words.

This thesis presents a new paradigm for mining documents by exploiting the semantic information in their texts. A formal semantic representation of linguistic inputs is introduced and used to build a semantic representation scheme for documents. The representation scheme is constructed by accumulating the outputs of syntactic and semantic analysis. A new distance measure is developed to determine the similarity between document contents. The measure is based on inexact matching of attributed trees; it involves computing all distinct similarity common subtrees and can be computed efficiently. The proposed representation scheme, together with the proposed similarity measure, is expected to enable more effective document mining processes.

The proposed document mining techniques were implemented as core components of a mining system. A case study of semantic document clustering is presented to demonstrate the operation and efficacy of the framework. Experimental work is reported, and its results are presented and analyzed.
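The bag-of-words deficiency described in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and raw term-frequency cosine similarity, which is one common bag-of-words setup rather than any specific model from the thesis. The example sentences are illustrative only, and the helper names `bag_of_words` and `cosine` are hypothetical.

```python
from collections import Counter
import math


def bag_of_words(text: str) -> Counter:
    # Lowercase and split on whitespace; word order is discarded entirely.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a_sq = sum(v * v for v in a.values())
    norm_b_sq = sum(v * v for v in b.values())
    if not norm_a_sq or not norm_b_sq:
        return 0.0
    return dot / math.sqrt(norm_a_sq * norm_b_sq)


s1 = "the dog bit the man"
s2 = "the man bit the dog"
s3 = "a canine attacked a person"

print(cosine(bag_of_words(s1), bag_of_words(s2)))  # 1.0: same words, opposite meaning
print(cosine(bag_of_words(s1), bag_of_words(s3)))  # 0.0: related meaning, no shared words
```

Both failure modes show up here. Reversing who bit whom leaves the vectors identical, and a paraphrase with no overlapping vocabulary scores zero.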
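The abstract describes the distance measure only as inexact matching of attributed trees via common subtrees, so the following is not the thesis's measure. It is a minimal sketch that swaps in a simple common-subtree count in the spirit of Collins and Duffy's tree kernels. Two assumptions make the matching inexact: a node-level score in [0, 1] (labels must agree, and attributes are compared by Jaccard overlap), and children aligned by position. `Node`, `node_sim`, `common_rooted`, `tree_kernel`, and `similarity` are hypothetical names. The two example trees are an illustrative predicate-argument encoding, not the thesis's representation scheme.

```python
from dataclasses import dataclass
from functools import lru_cache
import math


@dataclass(frozen=True)
class Node:
    """Hypothetical attributed tree node: a label, attributes, ordered children."""
    label: str
    attrs: frozenset = frozenset()
    children: tuple = ()


def node_sim(a: Node, b: Node) -> float:
    # Assumed inexact node match: labels must agree; attributes scored by Jaccard overlap.
    if a.label != b.label:
        return 0.0
    union = a.attrs | b.attrs
    return 1.0 if not union else len(a.attrs & b.attrs) / len(union)


@lru_cache(maxsize=None)
def common_rooted(a: Node, b: Node) -> float:
    # Weighted count of common subtrees rooted at (a, b). Each positionally
    # aligned child pair is either left out (the "1 +") or extended recursively.
    s = node_sim(a, b)
    if s == 0.0:
        return 0.0
    for ca, cb in zip(a.children, b.children):
        s *= 1.0 + common_rooted(ca, cb)
    return s


def nodes(t: Node):
    # Pre-order traversal of all nodes in a tree.
    yield t
    for c in t.children:
        yield from nodes(c)


def tree_kernel(t1: Node, t2: Node) -> float:
    # Sum rooted counts over all node pairs; memoization evaluates each distinct pair once.
    return sum(common_rooted(a, b) for a in nodes(t1) for b in nodes(t2))


def similarity(t1: Node, t2: Node) -> float:
    # Normalize so that a tree compared with itself scores 1.0.
    return tree_kernel(t1, t2) / math.sqrt(tree_kernel(t1, t1) * tree_kernel(t2, t2))


dog_bites_man = Node("bite", children=(
    Node("dog", frozenset({"agent"})),
    Node("man", frozenset({"patient"})),
))
man_bites_dog = Node("bite", children=(
    Node("man", frozenset({"agent"})),
    Node("dog", frozenset({"patient"})),
))

print(similarity(dog_bites_man, dog_bites_man))  # 1.0
print(similarity(dog_bites_man, man_bites_dog))  # about 0.167
```

Under a bag-of-words view these two sentences contain exactly the same words. The structured comparison separates them, because the agent and patient roles no longer line up and only the shared root `bite` matches.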
