Position-Weighted Measures for the Company Name-Matching Problem

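Master's thesis === National Taiwan University === Graduate Institute of Economics === Academic year 104 === This thesis focuses on the company name-matching problem. We analyze the common errors and complications that users introduce into company names and that make company names difficult to match. Although company name matching is a type of name matching, it has special features that mean the common name-matching methods are rarely the best choice for it. Therefore, based on how company names are constructed, we propose the novel idea of a position weight to address the company name-matching problem. We then compare the proposed position-weighted measure with the Monge-Elkan measure and soft TF/IDF on the popular business data set and on two data sets from a major semiconductor manufacturer. The results indicate that the position-weighted measure performs best overall on the company name-matching problem, as judged by maximum F1 and by a rating measure we propose. Beyond company names, the position-weighted measure can also be applied to other name-matching problems whose names are constructed similarly to company names.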
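The abstract compares a position-weighted measure against the Monge-Elkan measure using maximum F1, but does not give formulas. The sketch below is only an illustration under stated assumptions: it implements the standard Monge-Elkan measure (average, over the tokens of one name, of the best inner similarity to any token of the other name), a hypothetical position-weighted variant, and max F1 over thresholds. The tokenizer, the inner similarity (Python's `difflib.SequenceMatcher.ratio`), the geometric weight `decay**i`, and all function names are assumptions made for this example; they are not the measure defined in the thesis.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def tokens(name: str) -> list[str]:
    # Assumed tokenizer: lowercase, turn punctuation into spaces, split on whitespace.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return cleaned.split()


def token_sim(a: str, b: str) -> float:
    # Inner similarity in [0, 1]; difflib's ratio() stands in for the
    # Jaro-Winkler or edit-distance measures often paired with Monge-Elkan.
    return SequenceMatcher(None, a, b).ratio()


def best_matches(a: str, b: str) -> list[float]:
    # For each token of `a`, the best inner similarity to any token of `b`.
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return []
    return [max(token_sim(x, y) for y in tb) for x in ta]


def monge_elkan(a: str, b: str) -> float:
    # Standard Monge-Elkan: unweighted mean of the best matches (asymmetric in a, b).
    scores = best_matches(a, b)
    return sum(scores) / len(scores) if scores else 0.0


def position_weighted(a: str, b: str, decay: float = 0.5) -> float:
    # HYPOTHETICAL variant: token i of `a` (0-based) gets weight decay**i, so
    # leading tokens (often the distinctive part of a company name) count more
    # than trailing descriptors or legal suffixes such as "Inc" or "Corp".
    scores = best_matches(a, b)
    if not scores:
        return 0.0
    weights = [decay ** i for i in range(len(scores))]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def max_f1(scores: list[float], labels: list[bool]) -> float:
    # Best F1 over every threshold t that classifies score >= t as a match.
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(s >= t and y for s, y in zip(scores, labels))
        fp = sum(s >= t and not y for s, y in zip(scores, labels))
        fn = sum(s < t and y for s, y in zip(scores, labels))
        if tp:
            precision, recall = tp / (tp + fp), tp / (tp + fn)
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

With these assumed settings, plain `monge_elkan` rates "Apex Semiconductor Inc" closer to "Acme Semiconductor Inc" than "Acme Semiconductor Corp" is (about 0.83 versus 0.79), because a mismatched legal suffix costs as much as a mismatched leading token. The hypothetical `position_weighted` reverses that ordering (about 0.91 versus 0.71), which is the kind of effect a position weight is meant to capture.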


Bibliographic Details
Main Author: Ching-Kuo Li (李清國)
Other Authors: Yusen Sung
Format: Others
Language: English (en_US)
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/26453571913738870132
ID: ndltd-TW-104NTU05389048
Record format: oai_dc
Chinese title: 位置權重法在公司名匹配上的應用 (Application of the Position-Weighting Method to Company Name Matching)
Advisors: Yusen Sung (宋玉生), Yuh-Dauh Lyuu (呂育道)
Document type: Thesis
Pages: 32
Collection: NDLTD