Detecting Family Resemblance: Automated Genre Classification

This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The current paper compares the role of visual...

Bibliographic Details
Main Authors: Yunhyong Kim, Seamus Ross
Format: Article
Language: English
Published: Ubiquity Press 2007-03-01
Series: Data Science Journal
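Subjects: Automated genre classification; Metadata; Scientific information; Information management; Information extraction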
Online Access: http://datascience.codata.org/articles/405
id doaj-17aeace4c1b947a0b9b352604c4ce77b
record_format Article
spelling doaj-17aeace4c1b947a0b9b352604c4ce77b
2020-11-25T00:22:42Z
eng
Ubiquity Press
Data Science Journal
1683-1470
2007-03-01
6
10.2481/dsj.6.S172
407
Detecting Family Resemblance: Automated Genre Classification
Yunhyong Kim 0
Seamus Ross 1
Digital Curation Centre (DCC) & Humanities Advanced Technology Information Institute (HATII), University of Glasgow, Glasgow, UK
Digital Curation Centre (DCC) & Humanities Advanced Technology Information Institute (HATII), University of Glasgow, Glasgow, UK
This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The current paper compares the role of visual layout, stylistic features, and language model features in clustering documents and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of materials populated with documents of the nineteen most popular genres found in our experimental data set.
http://datascience.codata.org/articles/405
Automated genre classification
Metadata
Scientific information
Information management
Information extraction
collection DOAJ
language English
format Article
sources DOAJ
author Yunhyong Kim
Seamus Ross
title Detecting Family Resemblance: Automated Genre Classification
publisher Ubiquity Press
series Data Science Journal
issn 1683-1470
publishDate 2007-03-01
description This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The current paper compares the role of visual layout, stylistic features, and language model features in clustering documents and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of materials populated with documents of the nineteen most popular genres found in our experimental data set.
topic Automated genre classification
Metadata
Scientific information
Information management
Information extraction
url http://datascience.codata.org/articles/405