Summary: | Master's === Yuan Ze University (元智大學) === Department of Information Management === 95 === The IC equipment manufacturing industry has developed alongside Taiwan's IC manufacturing industry, progressing from low-end products to today's high-end, high-precision products. Over the past 40 years, IC equipment makers have accumulated a large number of documents that are poorly classified and therefore hard to search. Only recently has the e-business trend pushed IC equipment makers to digitize these valuable documents and classify them manually.
Manual classification is slow and tedious. This study therefore proposes a method based on the vector space model to classify enterprise documents automatically. The proposed method combines several weighting factors, including term frequency, term uniformity, and document-specific features, to improve classification performance.
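As an illustration only, the following minimal Python sketch shows how a vector space model classifier of this general kind can be put together. The abstract does not define how term uniformity is computed, so the sketch assumes a simple stand-in: the fraction of a class's training documents that contain the term. The function names (`tokenize`, `train`, `cosine`, `classify`) and the sample documents are hypothetical, and document-specific features are not modeled.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real system would need proper word segmentation.
    return text.lower().split()


def train(labeled_docs):
    """Build one weighted term vector per class from (label, text) pairs."""
    term_freq = defaultdict(Counter)  # class -> term -> total occurrences
    doc_freq = defaultdict(Counter)   # class -> term -> docs containing the term
    doc_count = Counter()             # class -> number of training docs
    for label, text in labeled_docs:
        tokens = tokenize(text)
        doc_count[label] += 1
        term_freq[label].update(tokens)
        doc_freq[label].update(set(tokens))
    vectors = {}
    for label, counts in term_freq.items():
        # ASSUMPTION: "uniformity" is approximated as the share of the class's
        # documents containing the term; the thesis may define it differently.
        vectors[label] = {
            term: count * (doc_freq[label][term] / doc_count[label])
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(vectors, text):
    """Return the class whose vector is most similar to the document."""
    query = Counter(tokenize(text))  # raw term frequencies of the new document
    return max(vectors, key=lambda label: cosine(query, vectors[label]))


# Hypothetical training data, invented for this example.
training = [
    ("maintenance", "pump maintenance schedule pump seal replacement"),
    ("maintenance", "chamber cleaning maintenance procedure"),
    ("specification", "wafer handler specification tolerance"),
    ("specification", "etch chamber specification pressure tolerance"),
]
vectors = train(training)
print(classify(vectors, "pump maintenance procedure"))
```

In this sketch a term that appears in every document of a class keeps its full frequency weight, while a term concentrated in a single document is scaled down, which is one plausible reading of adjusting a term's class weight by its uniformity.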
Experimental results show that the vector space model (VSM) alone reaches 68.93% accuracy. Using term uniformity to adjust each term's class weight raises accuracy to 76.42%. Adding document-specific features raises it further to 86.62%. These results confirm that combining several weighting factors improves classification performance.
|