Information Extraction on the Web Site

Master's === National Chung Cheng University === Institute of Computer Science and Information Engineering === 89 === The World Wide Web is a rich source of large-scale information. Because this source is broad and timely, it is well suited to gathering useful information. An automatic dictionary generator is usually built on a sentence segmentation system. Our syste...

Bibliographic Details
Main Authors:	Leu Shyang-Rong, 呂祥榮
Other Authors:	Sun Wu
Format:	Others
Language:	zh-TW
Published:	2001
Online Access:	http://ndltd.ncl.edu.tw/handle/20167244811119068848

id	ndltd-TW-089CCU00392093
record_format	oai_dc
spelling	ndltd-TW-089CCU00392093 2016-07-06T04:09:53Z http://ndltd.ncl.edu.tw/handle/20167244811119068848 Information Extraction on the Web Site 網頁資源資訊擷取 Leu Shyang-Rong 呂祥榮 Master's, National Chung Cheng University, Institute of Computer Science and Information Engineering, 89. The World Wide Web is a rich source of large-scale information. Because this source is broad and timely, it is well suited to gathering useful information. An automatic dictionary generator is usually built on a sentence segmentation system. Our system uses symbols to decide how phrases are generated. We divide the symbols into four kinds according to their effect on a sentence, and in this way obtain the first-stage phrase data. Duplicate copies of documents add noise to phrase occurrence counts. Copy detection is the usual solution to this problem and is a good way to remove similar documents. We remove the noise caused by document copies by “checking the sequence in which phrases are created”. Another useful kind of data on the web is the communication (contact) data that many people provide. Collecting this data makes it convenient for users to look up information about the people they want to know. However, there is no standard format for communication data in Chinese, so it is difficult to collect automatically. We use frequent director strings to help us extract some of it. Sun Wu 吳昇 2001 thesis 30 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Chung Cheng University === Institute of Computer Science and Information Engineering === 89 === The World Wide Web is a rich source of large-scale information. Because this source is broad and timely, it is well suited to gathering useful information. An automatic dictionary generator is usually built on a sentence segmentation system. Our system uses symbols to decide how phrases are generated. We divide the symbols into four kinds according to their effect on a sentence, and in this way obtain the first-stage phrase data. Duplicate copies of documents add noise to phrase occurrence counts. Copy detection is the usual solution to this problem and is a good way to remove similar documents. We remove the noise caused by document copies by “checking the sequence in which phrases are created”. Another useful kind of data on the web is the communication (contact) data that many people provide. Collecting this data makes it convenient for users to look up information about the people they want to know. However, there is no standard format for communication data in Chinese, so it is difficult to collect automatically. We use frequent director strings to help us extract some of it.
author2	Sun Wu
author_facet	Sun Wu Leu Shyang-Rong 呂祥榮
author	Leu Shyang-Rong 呂祥榮
spellingShingle	Leu Shyang-Rong 呂祥榮 Information Extraction on the Web Site
author_sort	Leu Shyang-Rong
title	Information Extraction on the Web Site
title_short	Information Extraction on the Web Site
title_full	Information Extraction on the Web Site
title_fullStr	Information Extraction on the Web Site
title_full_unstemmed	Information Extraction on the Web Site
title_sort	information extraction on the web site
publishDate	2001
url	http://ndltd.ncl.edu.tw/handle/20167244811119068848
work_keys_str_mv	AT leushyangrong informationextractiononthewebsite AT lǚxiángróng informationextractiononthewebsite AT leushyangrong wǎngyèzīyuánzīxùnxiéqǔ AT lǚxiángróng wǎngyèzīyuánzīxùnxiéqǔ
_version_	1718336467251494912

Similar Items