Unsupervised discovery of relations for analysis of textual data in digital forensics

This dissertation addresses the problem of analysing digital data in digital forensics. It will be shown that text mining methods can be adapted and applied to digital forensics to aid analysts to more quickly, efficiently and accurately analyse data to reveal truly useful information. Investigators...


Bibliographic Details
Main Author: Louis, Anita Lily
Other Authors: Engelbrecht, Andries P.
Published: 2013
Subjects: Text analysis; Text mining; Information extraction; Relation discovery; Digital forensics; UCTD
Online Access: http://hdl.handle.net/2263/27479
Louis, AL 2009, Unsupervised discovery of relations for analysis of textual data in digital forensics, MSc dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://hdl.handle.net/2263/27479 >
http://upetd.up.ac.za/thesis/available/etd-08232010-193559/
id ndltd-netd.ac.za-oai-union.ndltd.org-up-oai-repository.up.ac.za-2263-27479
record_format oai_dc
spelling ndltd-netd.ac.za-oai-union.ndltd.org-up-oai-repository.up.ac.za-2263-274792017-07-20T04:11:20Z Unsupervised discovery of relations for analysis of textual data in digital forensics Louis, Anita Lily Engelbrecht, Andries P. anita.louis@gmail.com Text analysis Text mining Information extraction Relation discovery Digital forensics UCTD This dissertation addresses the problem of analysing digital data in digital forensics. It will be shown that text mining methods can be adapted and applied to digital forensics to aid analysts to more quickly, efficiently and accurately analyse data to reveal truly useful information. Investigators who wish to utilise digital evidence must examine and organise the data to piece together events and facts of a crime. The difficulty with finding relevant information quickly using the current tools and methods is that these tools rely very heavily on background knowledge for query terms and do not fully utilise the content of the data. A novel framework in which to perform evidence discovery is proposed in order to reduce the quantity of data to be analysed, aid the analysts' exploration of the data and enhance the intelligibility of the presentation of the data. The framework combines information extraction techniques with visual exploration techniques to provide a novel approach to performing evidence discovery, in the form of an evidence discovery system. By utilising unrestricted, unsupervised information extraction techniques, the investigator does not require input queries or keywords for searching, thus enabling the investigator to analyse portions of the data that may not have been identified by keyword searches. The evidence discovery system produces text graphs of the most important concepts and associations extracted from the full text to establish ties between the concepts and provide an overview and general representation of the text. 
Through an interactive visual interface the investigator can explore the data to identify suspects, events and the relations between suspects. Two models are proposed for performing the relation extraction process of the evidence discovery framework. The first model takes a statistical approach to discovering relations based on co-occurrences of complex concepts. The second model utilises a linguistic approach using named entity extraction and information extraction patterns. A preliminary study was performed to assess the usefulness of a text mining approach to digital forensics as against the traditional information retrieval approach. It was concluded that the novel approach to text analysis for evidence discovery presented in this dissertation is a viable and promising approach. The preliminary experiment showed that the results obtained from the evidence discovery system, using either of the relation extraction models, are sensible and useful. The approach advocated in this dissertation can therefore be successfully applied to the analysis of textual data for digital forensics. Copyright Dissertation (MSc)--University of Pretoria, 2010. Computer Science unrestricted 2013-09-07T11:38:24Z 2010-08-23 2013-09-07T11:38:24Z 2010-04-12 2010-08-23 2010-08-23 Dissertation http://hdl.handle.net/2263/27479 Louis, AL 2009, Unsupervised discovery of relations for analysis of textual data in digital forensics, MSc dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://hdl.handle.net/2263/27479 > E10/449/gm http://upetd.up.ac.za/thesis/available/etd-08232010-193559/ © 2009, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
collection NDLTD
sources NDLTD
topic Text analysis
Text mining
Information extraction
Relation discovery
Digital forensics
UCTD
author2 Engelbrecht, Andries P.
author_facet Engelbrecht, Andries P.
Louis, Anita Lily
author Louis, Anita Lily
author_sort Louis, Anita Lily
title Unsupervised discovery of relations for analysis of textual data in digital forensics
title_sort unsupervised discovery of relations for analysis of textual data in digital forensics
publishDate 2013
url http://hdl.handle.net/2263/27479