Information Extraction on the Web Site

Master's thesis, National Chung Cheng University, Graduate Institute of Computer Science and Information Engineering, academic year 89 (ROC calendar)


Bibliographic Details
Main Authors: Leu Shyang-Rong, 呂祥榮
Other Authors: Sun Wu
Format: Others
Language: zh-TW
Published: 2001
Online Access: http://ndltd.ncl.edu.tw/handle/20167244811119068848
Record ID: ndltd-TW-089CCU00392093
Chinese Title: 網頁資源資訊擷取 (Information Extraction from Web Resources)
Advisor: Sun Wu, 吳昇
Description: The World Wide Web offers information on a very large scale, and its breadth and timeliness make it a valuable source of useful information. Automatic dictionary generators are usually built on a sentence segmentation system. Our system uses symbols to decide how phrases are generated: we divide the symbols into four kinds according to their effect on a sentence, and this yields the first-stage phrase data. Copies of the same document add noise to phrase frequency counts. Copy detection, which removes similar documents, is the usual remedy; we remove this noise by “checking the sequence in which phrases are created.” One useful kind of data on the Web is the contact information (communication data) that many people publish. Collecting it would make it convenient for users to look up information about the people they want to know about. However, Chinese contact information has no standard format, so collecting it automatically is difficult. We use frequent director strings to extract part of it.
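The abstract only outlines the method. It does not say which symbols belong to which of the four kinds, or exactly how the phrase-creation sequence is checked. The Python sketch below is one hypothetical reading, not the thesis's implementation. The symbol table `SYMBOL_KIND`, the helper names `generate_phrases` and `count_phrases`, and the window size of 3 are all assumptions. The sketch splits text into phrases at symbols. When counting phrase frequencies, it skips any run of consecutive phrases it has already seen, so copied pages do not inflate the counts.

```python
from collections import Counter

# Hypothetical assignment of symbols to four kinds. The thesis divides
# symbols into four kinds by their effect on a sentence, but this exact
# table is an assumption made for illustration.
SYMBOL_KIND = {}
for ch in "。！？!?":
    SYMBOL_KIND[ch] = "end"    # ends a sentence
for ch in "，、；：,;:":
    SYMBOL_KIND[ch] = "break"  # separates clauses inside a sentence
for ch in "「」『』（）()":
    SYMBOL_KIND[ch] = "wrap"   # encloses a quoted or bracketed phrase
for ch in "-·":
    SYMBOL_KIND[ch] = "join"   # belongs inside a phrase, e.g. "Wi-Fi"

# In this simplified sketch, three kinds close the current phrase and
# only "join" symbols stay inside it.
SPLITTING_KINDS = {"end", "break", "wrap"}


def generate_phrases(text):
    """Split text into candidate phrases at splitting symbols and whitespace."""
    phrases, current = [], []
    for ch in text:
        if SYMBOL_KIND.get(ch) in SPLITTING_KINDS or ch.isspace():
            # Close the phrase built so far, if any.
            if current:
                phrases.append("".join(current))
                current = []
        else:
            # Ordinary characters and "join" symbols extend the phrase.
            current.append(ch)
    if current:
        phrases.append("".join(current))
    return phrases


def count_phrases(documents, window=3):
    """Count phrase occurrences, skipping runs of phrases seen before.

    An occurrence is counted only if the run of up to `window` phrases
    starting at it has not appeared earlier (in any document, including
    the current one), so text copied between documents is counted once.
    """
    counts = Counter()
    seen_runs = set()
    for doc in documents:
        phrases = generate_phrases(doc)
        for i, phrase in enumerate(phrases):
            run = tuple(phrases[i:i + window])
            if run in seen_runs:
                continue  # this run was already counted, likely a copy
            seen_runs.add(run)
            counts[phrase] += 1
    return counts


docs = [
    "網頁資源，資訊擷取，自動字典。",
    "網頁資源，資訊擷取，自動字典。",  # an exact copy of the first page
    "資訊擷取，斷詞系統。",
]
print(count_phrases(docs))
```

With these three documents, the sketch counts 資訊擷取 twice: once from the first page and once from the third. A plain count over all phrases would give three, because the copied second page adds one. Comparing runs of phrases, rather than whole documents, also discounts partially copied pages. A larger `window` makes the check stricter about what counts as a copy.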