Mining Publication Records on Publication Pages based on Conditional Random Fields

Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 100 === A publication record is a list of semi-structured citation strings for publications of a research institute or an individual researcher. Publication records are integrated into a digital library which becomes an important knowledge base and thereby enables a var...

Bibliographic Details
Main Authors:	LIN, YA-HUEI, 林雅惠
Other Authors:	Hahn-Ming Lee
Format:	Others
Language:	en_US
Published:	2012
Online Access:	http://ndltd.ncl.edu.tw/handle/bh8m2e

id	ndltd-TW-100NTUS5392042
record_format	oai_dc
spelling	ndltd-TW-100NTUS5392042 2019-05-15T20:43:22Z http://ndltd.ncl.edu.tw/handle/bh8m2e Mining Publication Records on Publication Pages based on Conditional Random Fields 基於條件機率域萃取引用文獻資訊於個人著述網頁 LIN, YA-HUEI 林雅惠 Master's, National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, 100 Hahn-Ming Lee Jan-Ming Ho 李漢銘 何建明 2012 學位論文 ; thesis 47 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 100 === A publication record is a list of semi-structured citation strings for publications of a research institute or an individual researcher. Publication records are integrated into a digital library, which becomes an important knowledge base and thereby enables a variety of applications. A publication record is usually found among other information on a publication Web page (or "publication page" for short). It is thus an interesting problem to extract publication records from such Web pages. The problem is difficult for several reasons, e.g., flexibility in formatting the metadata of a publication as a semi-structured citation string and flexibility in the visual presentation of the citation string in HTML. Furthermore, two citation strings with a similar visual presentation on the same Web page may have different HTML constructs. In this paper, we present a content analysis approach, based on Conditional Random Fields and data region boundary analysis, to the problem of automatically extracting publication records from a publication page. Experimental results show that our method performs well on a benchmark containing manually crafted publication pages. The precision, recall, and F-measure are 82.5%, 87.6%, and 85.0%, respectively. This is a significant improvement over previous research.
author2	Hahn-Ming Lee
author_facet	Hahn-Ming Lee LIN, YA-HUEI 林雅惠
author	LIN, YA-HUEI 林雅惠
author_sort	LIN, YA-HUEI
title	Mining Publication Records on Publication Pages based on Conditional Random Fields
publishDate	2012
url	http://ndltd.ncl.edu.tw/handle/bh8m2e
