Information Extraction on the Web Site
Master's === National Chung Cheng University (國立中正大學) === Graduate Institute of Computer Science and Information Engineering (資訊工程研究所) === 89 === The World Wide Web is a large-scale source of information, and its breadth and timeliness make it well suited to gathering useful information. An automatic dictionary generator is usually built on a sentence segmentation system. Our syste...
Main Authors: | Leu Shyang-Rong, 呂祥榮 |
---|---|
Other Authors: | Sun Wu |
Format: | Others |
Language: | zh-TW |
Published: | 2001 |
Online Access: | http://ndltd.ncl.edu.tw/handle/20167244811119068848 |
id |
ndltd-TW-089CCU00392093 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-089CCU00392093 2016-07-06T04:09:53Z http://ndltd.ncl.edu.tw/handle/20167244811119068848 Information Extraction on the Web Site 網頁資源資訊擷取 Leu Shyang-Rong 呂祥榮 碩士 國立中正大學 資訊工程研究所 89 The World Wide Web is a large-scale source of information, and its breadth and timeliness make it well suited to gathering useful information. An automatic dictionary generator is usually built on a sentence segmentation system. Our system uses symbols to decide how phrases are generated. We divide the symbols into four kinds according to their effect on a sentence, and in this way obtain the first-stage phrase data. Duplicate copies of documents add noise to the phrase occurrence counts. Copy detection is the usual solution to this problem, and it is a good way to remove similar documents. We use a method of “checking the sequence in which phrases are created” to remove the noise caused by duplicate documents. One useful kind of data on the Web is the communication data that many people provide. Collecting this data makes it convenient for users to look up information about the people they want to know about. However, no standard format exists for communication data in Chinese, so it is difficult to collect automatically. We use the frequent director strings to help us extract some of it. Sun Wu 吳昇 2001 學位論文 ; thesis 30 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Chung Cheng University (國立中正大學) === Graduate Institute of Computer Science and Information Engineering (資訊工程研究所) === 89 === The World Wide Web is a large-scale source of information, and its breadth and timeliness make it well suited to gathering useful information.
An automatic dictionary generator is usually built on a sentence segmentation system. Our system uses symbols to decide how phrases are generated. We divide the symbols into four kinds according to their effect on a sentence, and in this way obtain the first-stage phrase data.
Duplicate copies of documents add noise to the phrase occurrence counts. Copy detection is the usual solution to this problem, and it is a good way to remove similar documents. We use a method of “checking the sequence in which phrases are created” to remove the noise caused by duplicate documents.
One useful kind of data on the Web is the communication data that many people provide. Collecting this data makes it convenient for users to look up information about the people they want to know about. However, no standard format exists for communication data in Chinese, so it is difficult to collect automatically. We use the frequent director strings to help us extract some of it.
|
author2 |
Sun Wu |
author_facet |
Sun Wu Leu Shyang-Rong 呂祥榮 |
author |
Leu Shyang-Rong 呂祥榮 |
spellingShingle |
Leu Shyang-Rong 呂祥榮 Information Extraction on the Web Site |
author_sort |
Leu Shyang-Rong |
title |
Information Extraction on the Web Site |
title_short |
Information Extraction on the Web Site |
title_full |
Information Extraction on the Web Site |
title_fullStr |
Information Extraction on the Web Site |
title_full_unstemmed |
Information Extraction on the Web Site |
title_sort |
information extraction on the web site |
publishDate |
2001 |
url |
http://ndltd.ncl.edu.tw/handle/20167244811119068848 |
work_keys_str_mv |
AT leushyangrong informationextractiononthewebsite AT lǚxiángróng informationextractiononthewebsite AT leushyangrong wǎngyèzīyuánzīxùnxiéqǔ AT lǚxiángróng wǎngyèzīyuánzīxùnxiéqǔ |
_version_ |
1718336467251494912 |