Extraction of Text Regions from Complex Background in Document Images by Multilevel Clustering

Textual data plays an important role in applications such as image database indexing, document understanding, and image-based web searching. The goal of automatic text extraction from real-life document images, without a character recognition module, is to identify image regions that contain...

Bibliographic Details
Main Authors:	Hoai Nam Vu, Tuan Anh Tran, Na In Seop, Soo Hyung Kim
Format:	Article
Language:	English
Published:	Atlantis Press 2016-01-01
Series:	International Journal of Networked and Distributed Computing (IJNDC)
Subjects:	Multilevel, K-means, Connected Component, Thresholding
Online Access:	https://www.atlantis-press.com/article/25846118.pdf

id	doaj-a320b4f0c8754b4fb485d2420e0a475a
record_format	Article
spelling	doaj-a320b4f0c8754b4fb485d2420e0a475a | 2020-11-25T01:58:48Z | eng | Atlantis Press | International Journal of Networked and Distributed Computing (IJNDC) | 2211-7946 | 2016-01-01 | 4 | 1 | 10.2991/ijndc.2016.4.1.2 | Hoai Nam Vu; Tuan Anh Tran; Na In Seop; Soo Hyung Kim | https://www.atlantis-press.com/article/25846118.pdf | Multilevel; K-means; Connected Component; Thresholding
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Hoai Nam Vu, Tuan Anh Tran, Na In Seop, Soo Hyung Kim
author_facet	Hoai Nam Vu, Tuan Anh Tran, Na In Seop, Soo Hyung Kim
author_sort	Hoai Nam Vu
title	Extraction of Text Regions from Complex Background in Document Images by Multilevel Clustering
publisher	Atlantis Press
series	International Journal of Networked and Distributed Computing (IJNDC)
issn	2211-7946
publishDate	2016-01-01
description	Textual data plays an important role in applications such as image database indexing, document understanding, and image-based web searching. The goal of automatic text extraction from real-life document images, without a character recognition module, is to identify image regions that contain only text. These textual regions can then either serve as input to an optical character recognition application or be highlighted to focus the user's attention. In this paper we propose a method consisting of three stages: preprocessing, which improves the contrast of the grayscale image; multi-level thresholding, which separates textual regions from non-textual objects such as graphics, pictures, and complex background; and a heuristic filter and a recursive filter, which localize text within the textual regions. In many of these applications it is not necessary to identify all the text regions, so we emphasize identifying important text regions with relatively large size and high contrast. Experimental results on a dataset of real-life images demonstrate that the proposed method is effective in identifying textual regions with various illuminations, sizes, and fonts against various types of background.
topic	Multilevel, K-means, Connected Component, Thresholding
url	https://www.atlantis-press.com/article/25846118.pdf
_version_	1724968061042163712
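
The description in this record names multi-level thresholding, K-means, and connected components, but it gives no parameters or filter rules. The following is therefore only a minimal sketch of the general idea, not the authors' method: it assumes dark text on a lighter background, replaces the multi-level thresholding stage with plain 1-D k-means on gray values, replaces the heuristic and recursive filters with a single minimum-area rule, and skips the contrast preprocessing stage. All function names, the `k` and `min_area` parameters, and the `SAMPLE` image are hypothetical.

```python
from collections import deque

def kmeans_gray_levels(pixels, k=3, iters=20):
    """Cluster gray values into k levels with 1-D k-means; return sorted centers."""
    lo, hi = min(pixels), max(pixels)
    # Spread the initial centers evenly over the observed intensity range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        sums, counts = [0.0] * k, [0] * k
        for p in pixels:
            j = min(range(k), key=lambda c: abs(p - centers[c]))
            sums[j] += p
            counts[j] += 1
        # An empty cluster keeps its previous center.
        new_centers = [sums[j] / counts[j] if counts[j] else centers[j]
                       for j in range(k)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)

def darkest_level_mask(image, centers):
    """True where a pixel's nearest center is the darkest one (assumed text)."""
    def nearest(p):
        return min(range(len(centers)), key=lambda c: abs(p - centers[c]))
    return [[nearest(p) == 0 for p in row] for row in image]

def connected_components(mask):
    """Return 4-connected components of True pixels as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, component = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components

def extract_text_boxes(image, k=3, min_area=3):
    """Return (top, left, bottom, right) boxes of large dark components."""
    pixels = [p for row in image for p in row]
    centers = kmeans_gray_levels(pixels, k)
    mask = darkest_level_mask(image, centers)
    boxes = []
    for comp in connected_components(mask):
        if len(comp) < min_area:  # stand-in for the paper's heuristic filter
            continue
        ys = [y for y, _ in comp]
        xs = [x for _, x in comp]
        boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes

# Tiny hypothetical grayscale image: 200 = background, 120 = graphic, 20 = text.
SAMPLE = [
    [200, 200, 200, 200, 200, 200],
    [200,  20,  20,  20, 200, 200],
    [200,  20, 200,  20, 200, 120],
    [200, 200, 200, 200, 200, 120],
    [ 20, 200, 200, 200, 200, 200],
]
```

Calling `extract_text_boxes(SAMPLE)` keeps the five-pixel dark component and discards the isolated dark pixel in the bottom-left corner, which mirrors the abstract's emphasis on relatively large, high-contrast text regions.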

Similar Items