Information Technology for Historical Document Analysis

Bibliographic Details
Main Authors: Shih-Pei Chen, 陳詩沛
Other Authors: 項潔
Format: Others
Language: zh-TW
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/37182004994984890555
id ndltd-TW-099NTU05392123
record_format oai_dc
spelling ndltd-TW-099NTU05392123 2015-10-16T04:03:27Z http://ndltd.ncl.edu.tw/handle/37182004994984890555 Information Technology for Historical Document Analysis 資訊技術與歷史文獻分析 Shih-Pei Chen 陳詩沛 博士 國立臺灣大學 資訊工程學研究所 99 項潔 2011 學位論文 ; thesis 167 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Ph.D. === National Taiwan University (國立臺灣大學) === Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) === 99 === This thesis proposes two IT methods to help historians make use of digitized historical documents. The availability of a large quantity of searchable, retrievable historical documents has become a challenge for historians, since the traditional practice of carefully reading a small number of documents is no longer sufficient. In this thesis we first give an overview of THDL, the Taiwan History Digital Library, a full-text digital library of primary historical documents about Taiwan. The documents in THDL, currently numbering 73,287 and totaling over 54,000,000 words, are the main experimental materials in this thesis. We then introduce the feature analysis method, which places a collection of historical documents in an observation environment so that they can be studied collectively rather than as individual documents. Feature analysis takes as input a sub-collection, meaning a set of documents related to the research topic the user is currently interested in, and analyzes the features shared by these documents. By calculating the support for each feature (the number of documents that serve as evidence of the feature's occurrence), the method discovers features that are highly related to a sub-collection. We have developed a mathematical model for this method and applied it to two of the corpora in THDL, yielding unexpected and interesting observations. We then present several relation discovery methods that find relationships among historical documents in a large collection. We give three examples of relation discovery carried out on the Imperial Court documents and Taiwanese land deeds: citation relations, land transaction relations, and the template relation.
Through our methods, we have discovered 6,802 citation relations among the 37,836 Imperial Court documents selected from 280 sources, 3,910 transaction relations among the 35,451 land deeds from 117 sources, and 105 templates that were created following a specific format. We argue that relation discovery can not only help historians consider more angles while reading the documents but also lead to new findings. The citation relations have been transformed into 1,101 successive citation graphs, each of which reveals how a historical event evolved through the correspondence between a Qing emperor and his officials. The transaction relations have likewise been transformed into 2,219 land transitivity graphs, some of which indicate land development activities that have never been studied before.
author2 項潔
author_facet 項潔
Shih-Pei Chen
陳詩沛
author Shih-Pei Chen
陳詩沛
spellingShingle Shih-Pei Chen
陳詩沛
Information Technology for Historical Document Analysis
author_sort Shih-Pei Chen
title Information Technology for Historical Document Analysis
publishDate 2011
url http://ndltd.ncl.edu.tw/handle/37182004994984890555
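
The abstract describes feature analysis as counting, for each feature, the number of documents in a sub-collection that exhibit it. The Python sketch below illustrates only that counting idea on made-up toy data. The thesis's own mathematical model is not given in this record, so the relatedness score used here (a feature's rate in the sub-collection divided by its rate in the whole collection) is an assumption, as are the function names `feature_support` and `rank_features` and the feature labels.

```python
from collections import Counter


def feature_support(docs, doc_ids):
    """Count, for each feature, how many of the given documents contain it."""
    support = Counter()
    for doc_id in doc_ids:
        # set() ensures each document contributes at most one count per feature.
        support.update(set(docs[doc_id]))
    return support


def rank_features(docs, sub_ids):
    """Rank features by how concentrated they are in the sub-collection.

    The score is an assumed stand-in for the thesis's model:
    (support in sub-collection / sub-collection size) divided by
    (support in whole collection / collection size).
    """
    sub_ids = list(sub_ids)
    if not sub_ids:
        raise ValueError("sub-collection must contain at least one document")
    sub_support = feature_support(docs, sub_ids)
    all_support = feature_support(docs, docs.keys())
    scores = {}
    for feature, count in sub_support.items():
        sub_rate = count / len(sub_ids)
        all_rate = all_support[feature] / len(docs)
        scores[feature] = (count, sub_rate / all_rate)
    # Highest score first, then highest support, then name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1][1], -kv[1][0], kv[0]))


# Hypothetical toy collection: document id -> set of extracted features.
docs = {
    "d1": {"year:1750", "place:A", "type:land deed"},
    "d2": {"year:1750", "place:A", "type:memorial"},
    "d3": {"year:1801", "place:B", "type:land deed"},
    "d4": {"year:1801", "place:A", "type:land deed"},
}
sub_collection = ["d1", "d2"]  # documents matching some research topic

ranking = rank_features(docs, sub_collection)
for feature, (support, score) in ranking:
    print(f"{feature}: support={support}, score={score:.2f}")
```

In this toy run, `year:1750` ranks first because both sub-collection documents carry it while only half of the whole collection does. A feature such as `place:A` has the same support but scores lower because it is also common outside the sub-collection.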