Summary: | Ph.D. === National Central University === Graduate Institute of Computer Science and Information Engineering === 88 === ABSTRACT
There are two kinds of off-line Chinese character recognition systems: those based on statistical features and those based on structural features. This dissertation focuses on the problems that arise in structure-feature-based off-line Chinese character recognition.
A structure-feature-based Chinese character recognition system usually consists of four main modules: preprocessing, stroke extraction, coarse classification, and recognition. In the preprocessing module, the scanned image is denoised and skeletonized to facilitate stroke extraction. For this stage, we propose a novel run-length-based skeletonization approach that is more tolerant of noise. The generated skeleton contains no fork points. This forkless skeleton simplifies stroke extraction and makes its results more reliable.
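The run-length representation that such a skeletonization starts from can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the function names `row_runs` and `run_midpoints` are hypothetical, the image is assumed to be a list of rows of 0/1 pixels, and taking the midpoint of each horizontal run is only a simplified stand-in for the actual skeleton construction.

```python
def row_runs(row):
    """Return inclusive (start, end) index pairs of consecutive 1-pixels in a row."""
    runs = []
    start = None
    for i, px in enumerate(row):
        if px and start is None:
            start = i                      # a new run of foreground pixels begins
        elif not px and start is not None:
            runs.append((start, i - 1))    # the run ended at the previous pixel
            start = None
    if start is not None:                  # a run that reaches the end of the row
        runs.append((start, len(row) - 1))
    return runs


def run_midpoints(image):
    """Collect the midpoint (x, y) of every horizontal run (assumed simplification)."""
    points = []
    for y, row in enumerate(image):
        for start, end in row_runs(row):
            points.append(((start + end) // 2, y))
    return points
```

Working on whole runs rather than single pixels is one plausible reason such a method resists noise: an isolated noisy pixel changes a run's length slightly but barely moves its midpoint.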
After the strokes in a character have been extracted, several structural features can be computed for each stroke: its end points, center point, orientation, and length. Furthermore, relationships between two strokes can also be computed, including their fork points, the distance between them, their orientation difference, and their length ratio. These features are used in the subsequent recognition steps.
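The per-stroke and pairwise features listed above can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's definitions: each stroke is approximated by a straight segment between its two end points, the distance is measured between stroke centers, the length ratio is taken as shorter over longer, and the names `stroke_features` and `pair_features` are hypothetical. Fork points are omitted because they depend on the skeleton itself.

```python
import math


def stroke_features(p0, p1):
    """Features of one stroke, approximated as the segment from p0 to p1."""
    (x0, y0), (x1, y1) = p0, p1
    return {
        "end_points": (p0, p1),
        "center": ((x0 + x1) / 2.0, (y0 + y1) / 2.0),
        # Orientation in degrees, folded into [0, 180) because a stroke is undirected.
        "orientation": math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0,
        "length": math.hypot(x1 - x0, y1 - y0),
    }


def pair_features(a, b):
    """Relationships between two strokes (assumed definitions, see lead-in)."""
    distance = math.hypot(a["center"][0] - b["center"][0],
                          a["center"][1] - b["center"][1])
    diff = abs(a["orientation"] - b["orientation"])
    diff = min(diff, 180.0 - diff)         # smallest angle between the two lines
    longer = max(a["length"], b["length"])
    ratio = min(a["length"], b["length"]) / longer if longer > 0 else 1.0
    return {"distance": distance, "orientation_diff": diff, "length_ratio": ratio}
```

For example, a horizontal stroke from (0, 0) to (4, 0) and a vertical stroke from (2, -1) to (2, 1) share the same center, differ in orientation by 90 degrees, and have a length ratio of 0.5.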
Because the Chinese character set is very large, matching an input character against every character in the database is inefficient, so the characters must be preclassified. In this dissertation, we also propose an effective preclassification scheme that divides the whole character set into subclasses, each containing fewer characters. The classifier has two layers. The first layer classifies Chinese characters into ten subclasses according to the pattern of the character; radicals embedded in the character are also extracted in this layer. The second layer further divides the ten subclasses by analyzing four symmetry features of the extracted radical.
Finally, an off-line Chinese character recognition methodology is proposed. The extracted strokes are rearranged to form a 1-D stroke string in which strokes of the same type are grouped together. The reordered stroke string facilitates building the intra-character relationships between strokes. When an input character is matched against the characters in the database, the differences between the intra-character relationships of the two characters are assessed. The output is a list of candidate characters sorted in descending order of matching score.
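The reordering and ranking steps can be sketched as follows. Everything specific here is an assumption made for illustration: the stroke-type codes in `TYPE_ORDER`, the tie-breaking by center position, and especially `match_score`, which is a toy type-agreement measure and not the dissertation's comparison of intra-character relationships.

```python
# Hypothetical stroke-type codes: horizontal, vertical, left-falling, right-falling.
TYPE_ORDER = {"H": 0, "V": 1, "L": 2, "R": 3}


def stroke_string(strokes):
    """Sort (type, (cx, cy)) strokes so that strokes of the same type are adjacent.

    Ties within a type are broken top-to-bottom, then left-to-right (an assumption).
    """
    return sorted(strokes, key=lambda s: (TYPE_ORDER[s[0]], s[1][1], s[1][0]))


def match_score(a, b):
    """Toy score: fraction of aligned positions whose stroke types agree."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    same = sum(1 for x, y in zip(a, b) if x[0] == y[0])
    return same / longest


def rank_candidates(query, templates):
    """Return (name, score) pairs sorted by descending matching score."""
    scores = [(name, match_score(query, tpl)) for name, tpl in templates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Because the string has a fixed type order, strokes of the same type in two characters land at comparable positions, which is what makes a position-wise comparison meaningful at all.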
Experimental results show that the proposed stroke extraction method tolerates noise well and yields more reliable extraction results, and that the proposed preclassifier effectively reduces the number of characters in each subclass. The results also show that the proposed recognition scheme is feasible.
|