Summary: | Master's thesis, National Central University, Graduate Institute of Computer Science and Information Engineering, ROC academic year 85.
We have proposed a table-form document analysis system that consists of four modules, and this thesis
addresses the fourth module. Here, algorithms for character segmentation and table-form
recognition are proposed. First, we generate connected components as the
basic units for character segmentation. Because many Chinese characters consist of
more than one radical, we group isolated radicals into complete Chinese
characters using several heuristic rules. We also propose a projection-profile
method to split touching characters. In this way, connected components are
combined into complete, meaningful character components during
character segmentation. We then classify the processed components as text or
graphics and extract the field attributes. Finally, a hierarchical recognition
method is proposed to determine whether an input form document matches a
document in the database, based on the extracted structural features and field
attributes. The performance of the proposed algorithms is evaluated on a large
number of table-form images.
|
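The abstract only names the segmentation steps. The following is a minimal Python sketch, under stated assumptions, of two of them: labeling 8-connected ink regions of a binary image as the basic units, and cutting an over-wide component at the minimum of its vertical projection profile. The function names, the (top, left, bottom, right) box format, the `expected_width` parameter, and the 1.5x width threshold are hypothetical illustrations, not the thesis's actual rules; the radical-grouping heuristics and the hierarchical form matching are not modeled.

```python
from collections import deque

# Hypothetical conventions for this sketch:
#   img[y][x] == 1 marks an ink pixel, 0 marks background.
#   A box is (top, left, bottom, right), all bounds inclusive.


def connected_components(img):
    """Return the bounding box of every 8-connected ink region, in scan order."""
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from the first unvisited ink pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def vertical_projection(img, box):
    """Count the ink pixels in each column of the box (vertical projection profile)."""
    top, left, bottom, right = box
    return [sum(img[y][x] for y in range(top, bottom + 1)) for x in range(left, right + 1)]


def split_touching(img, box, expected_width):
    """Recursively cut an over-wide box at the column with the least ink.

    expected_width (an estimated character width in pixels) and the 1.5x
    threshold are illustrative assumptions, not values from the thesis.
    """
    top, left, bottom, right = box
    width = right - left + 1
    if width <= 1.5 * expected_width:
        return [box]  # narrow enough to be a single character
    profile = vertical_projection(img, box)
    # Only consider cut columns roughly one character width from the left edge.
    lo = max(1, expected_width // 2)
    hi = min(width - 1, expected_width + expected_width // 2 + 1)
    if hi <= lo:
        return [box]
    cut = min(range(lo, hi), key=lambda i: profile[i])  # first column with minimum ink
    # The cut column goes to the right-hand piece; keep splitting the remainder.
    return [(top, left, bottom, left + cut - 1)] + split_touching(
        img, (top, left + cut, bottom, right), expected_width
    )


if __name__ == "__main__":
    # Two 3-pixel-wide blobs joined by a one-pixel bridge, plus a separate stroke.
    page = [
        [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0],
        [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0],
    ]
    boxes = connected_components(page)
    print(boxes)  # → [(0, 0, 2, 6), (0, 9, 2, 9)]
    print(split_touching(page, boxes[0], 3))  # → [(0, 0, 2, 2), (0, 3, 2, 6)]
```

Cutting at the column with the least ink assumes that touching characters share only a few pixels at the join, as the one-pixel bridge in the demo does; heavily overlapping characters would need a stronger criterion than a single projection minimum.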