DoCA: A Content-Based Automatic Classification System Over Digital Documents
Regardless of industry, the overload of information facing most organizations today is a drain on both individuals and the enterprise itself. The increasing volume of this information, which is stored in different electronic formats, requires new sophisticated systems to analyse and classify it. I...
Main Authors: | Suleyman Eken, Houssem Menhour, Kubra Koksal |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
Subjects: | Document analysis, document classification, OCR, video-audio analysis |
Online Access: | https://ieeexplore.ieee.org/document/8768370/ |
id |
doaj-a7eb84a1c3084bfc8d01094e699e617d |
---|---|
record_format |
Article |
spelling |
doaj-a7eb84a1c3084bfc8d01094e699e617d; 2021-03-29T23:39:01Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; 7; 97996; 98004; 10.1109/ACCESS.2019.2930339; 8768370; DoCA: A Content-Based Automatic Classification System Over Digital Documents; Suleyman Eken [0] https://orcid.org/0000-0001-9488-908X; Houssem Menhour [1] https://orcid.org/0000-0001-8920-7830; Kubra Koksal [2]; Department of Computer Engineering, Umuttepe Campus, Kocaeli University, Kocaeli, Turkey; Department of Computer Engineering, Umuttepe Campus, Kocaeli University, Kocaeli, Turkey; Department of Computer Engineering, Umuttepe Campus, Kocaeli University, Kocaeli, Turkey; Regardless of industry, the overload of information facing most organizations today is a drain on both individuals and the enterprise itself. The increasing volume of this information, which is stored in different electronic formats, requires new sophisticated systems to analyse and classify them. In this paper, we attempt to implement a framework Document Classification and Analysis (DoCA) that can simplify and automate such tasks for different file types, namely: office documents (text, spreadsheets, and presentations), scanned documents (images and PDFs), multimedia files (video and audio). Each file type requires different methods for pre-processing, analysis, and classification. The efficiency and feasibility of the DoCA are examined on HAVELSAN dataset and accuracy of different tasks shows that the DoCA is a promising tool for document analysis and classification.; https://ieeexplore.ieee.org/document/8768370/; Document analysis; document classification; OCR; video-audio analysis |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Suleyman Eken Houssem Menhour Kubra Koksal |
spellingShingle |
Suleyman Eken Houssem Menhour Kubra Koksal DoCA: A Content-Based Automatic Classification System Over Digital Documents IEEE Access Document analysis document classification OCR video-audio analysis |
author_facet |
Suleyman Eken Houssem Menhour Kubra Koksal |
author_sort |
Suleyman Eken |
title |
DoCA: A Content-Based Automatic Classification System Over Digital Documents |
title_sort |
doca: a content-based automatic classification system over digital documents |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2019-01-01 |
description |
Regardless of industry, the overload of information facing most organizations today is a drain on both individuals and the enterprise itself. The increasing volume of this information, which is stored in different electronic formats, requires new sophisticated systems to analyse and classify it. In this paper, we attempt to implement a framework, Document Classification and Analysis (DoCA), that can simplify and automate such tasks for different file types, namely office documents (text, spreadsheets, and presentations), scanned documents (images and PDFs), and multimedia files (video and audio). Each file type requires different methods for pre-processing, analysis, and classification. The efficiency and feasibility of DoCA are examined on the HAVELSAN dataset, and the accuracy achieved on the different tasks shows that DoCA is a promising tool for document analysis and classification. |
topic |
Document analysis document classification OCR video-audio analysis |
url |
https://ieeexplore.ieee.org/document/8768370/ |
work_keys_str_mv |
AT suleymaneken docaacontentbasedautomaticclassificationsystemoverdigitaldocuments AT houssemmenhour docaacontentbasedautomaticclassificationsystemoverdigitaldocuments AT kubrakoksal docaacontentbasedautomaticclassificationsystemoverdigitaldocuments |
_version_ |
1724189118777262080 |