Automatic Classification of Web Images as UML Static Diagrams Using Machine Learning Techniques

Our purpose in this research is to develop a method to automatically and efficiently classify web images as Unified Modeling Language (UML) static diagrams, and to produce a computer tool that implements this function. The tool receives a bitmap file (in different formats) as an input and communicat...

Bibliographic Details
Main Authors: Valentín Moreno, Gonzalo Génova, Manuela Alejandres, Anabel Fraga
Format: Article
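Language: English
Published: MDPI AG 2020-04-01
Series: Applied Sciences
Subjects: UML diagram recognition; image processing; image classification; rule induction; classification tool
Online Access: https://www.mdpi.com/2076-3417/10/7/2406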
id doaj-95b3118cfc254558b626089e894be886
record_format Article
spelling doaj-95b3118cfc254558b626089e894be886
2020-11-25T02:04:11Z
eng
MDPI AG
Applied Sciences
2076-3417
2020-04-01
10
2406
2406
10.3390/app10072406
Automatic Classification of Web Images as UML Static Diagrams Using Machine Learning Techniques
Valentín Moreno [0]
Gonzalo Génova [1]
Manuela Alejandres [2]
Anabel Fraga [3]
[0] Knowledge Reuse Group, Departamento de Informática, Universidad Carlos III de Madrid. Av. Universidad 30, 28911 Leganés (Madrid), Spain
[1] Knowledge Reuse Group, Departamento de Informática, Universidad Carlos III de Madrid. Av. Universidad 30, 28911 Leganés (Madrid), Spain
[2] Knowledge Reuse Group, Departamento de Informática, Universidad Carlos III de Madrid. Av. Universidad 30, 28911 Leganés (Madrid), Spain
[3] Knowledge Reuse Group, Departamento de Informática, Universidad Carlos III de Madrid. Av. Universidad 30, 28911 Leganés (Madrid), Spain
Our purpose in this research is to develop a method to automatically and efficiently classify web images as Unified Modeling Language (UML) static diagrams, and to produce a computer tool that implements this function. The tool receives a bitmap file (in different formats) as an input and communicates whether the image corresponds to a diagram. For pragmatic reasons, we restricted ourselves to the simplest kinds of diagrams that are more useful for automated software reuse: computer-edited 2D representations of static diagrams. The tool does not require that the images are explicitly or implicitly tagged as UML diagrams. The tool extracts graphical characteristics from each image (such as grayscale histogram, color histogram and elementary geometric forms) and uses a combination of rules to classify it. The rules are obtained with machine learning techniques (rule induction) from a sample of 19,000 web images manually classified by experts. In this work, we do not consider the textual contents of the images. Our tool reaches nearly 95% of agreement with manually classified instances, improving the effectiveness of related research works. Moreover, using a training dataset 15 times bigger, the time required to process each image and extract its graphical features (0.680 s) is seven times lower.
https://www.mdpi.com/2076-3417/10/7/2406
UML diagram recognition
image processing
image classification
rule induction
classification tool
collection DOAJ
language English
format Article
sources DOAJ
author Valentín Moreno
Gonzalo Génova
Manuela Alejandres
Anabel Fraga
spellingShingle Valentín Moreno
Gonzalo Génova
Manuela Alejandres
Anabel Fraga
Automatic Classification of Web Images as UML Static Diagrams Using Machine Learning Techniques
Applied Sciences
UML diagram recognition
image processing
image classification
rule induction
classification tool
author_facet Valentín Moreno
Gonzalo Génova
Manuela Alejandres
Anabel Fraga
author_sort Valentín Moreno
title Automatic Classification of Web Images as UML Static Diagrams Using Machine Learning Techniques
title_short Automatic Classification of Web Images as UML Static Diagrams Using Machine Learning Techniques
title_full Automatic Classification of Web Images as UML Static Diagrams Using Machine Learning Techniques
title_fullStr Automatic Classification of Web Images as UML Static Diagrams Using Machine Learning Techniques
title_full_unstemmed Automatic Classification of Web Images as UML Static Diagrams Using Machine Learning Techniques
title_sort automatic classification of web images as uml static diagrams using machine learning techniques
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2020-04-01
description Our purpose in this research is to develop a method to automatically and efficiently classify web images as Unified Modeling Language (UML) static diagrams, and to produce a computer tool that implements this function. The tool receives a bitmap file (in different formats) as an input and communicates whether the image corresponds to a diagram. For pragmatic reasons, we restricted ourselves to the simplest kinds of diagrams that are more useful for automated software reuse: computer-edited 2D representations of static diagrams. The tool does not require that the images are explicitly or implicitly tagged as UML diagrams. The tool extracts graphical characteristics from each image (such as grayscale histogram, color histogram and elementary geometric forms) and uses a combination of rules to classify it. The rules are obtained with machine learning techniques (rule induction) from a sample of 19,000 web images manually classified by experts. In this work, we do not consider the textual contents of the images. Our tool reaches nearly 95% of agreement with manually classified instances, improving the effectiveness of related research works. Moreover, using a training dataset 15 times bigger, the time required to process each image and extract its graphical features (0.680 s) is seven times lower.
topic UML diagram recognition
image processing
image classification
rule induction
classification tool
url https://www.mdpi.com/2076-3417/10/7/2406
work_keys_str_mv AT valentinmoreno automaticclassificationofwebimagesasumlstaticdiagramsusingmachinelearningtechniques
AT gonzalogenova automaticclassificationofwebimagesasumlstaticdiagramsusingmachinelearningtechniques
AT manuelaalejandres automaticclassificationofwebimagesasumlstaticdiagramsusingmachinelearningtechniques
AT anabelfraga automaticclassificationofwebimagesasumlstaticdiagramsusingmachinelearningtechniques
_version_ 1724944034244329472
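
The description above outlines a two-stage pipeline: extract graphical features from a bitmap (grayscale histogram, color histogram, elementary geometric forms), then apply rules to decide whether the image is a diagram. The following is a minimal Python sketch of that idea only, not the authors' tool. All function names, features, and thresholds are hypothetical; the rule here is written by hand, whereas the article obtains its rules by rule induction from manually classified images; and geometric-form detection is left out entirely. Images are represented as nested lists of RGB tuples so the sketch needs no imaging library.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each component in 0..255
Image = List[List[Pixel]]      # rows of pixels


def to_gray(pixel: Pixel) -> int:
    """Convert an RGB pixel to an 8-bit gray level (BT.601 luma weights)."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def extract_features(image: Image, bins: int = 4) -> Dict[str, float]:
    """Compute a few histogram-based features (hypothetical feature set)."""
    pixels = [p for row in image for p in row]
    total = len(pixels)

    # Coarse grayscale histogram with `bins` equal-width buckets.
    gray_hist = [0] * bins
    for p in pixels:
        gray_hist[min(to_gray(p) * bins // 256, bins - 1)] += 1

    # Color histogram: how many pixels carry each exact RGB value.
    color_hist = Counter(pixels)

    return {
        "darkest_bin_fraction": gray_hist[0] / total,
        "lightest_bin_fraction": gray_hist[-1] / total,
        "distinct_colors": float(len(color_hist)),
        "dominant_color_fraction": color_hist.most_common(1)[0][1] / total,
    }


def looks_like_diagram(features: Dict[str, float]) -> bool:
    """Hand-written stand-in for an induced rule (thresholds are assumptions).

    Intuition: computer-edited diagrams tend to have a flat background
    and few distinct colors, unlike photographs.
    """
    return (
        features["dominant_color_fraction"] >= 0.5
        and features["distinct_colors"] <= 16
    )


def synthetic_diagram(size: int = 20) -> Image:
    """White canvas with one black rectangle outline (a toy 'class box')."""
    white, black = (255, 255, 255), (0, 0, 0)
    lo, hi = 5, size - 6
    image = [[white for _ in range(size)] for _ in range(size)]
    for i in range(lo, hi + 1):
        image[lo][i] = image[hi][i] = black   # top and bottom edges
        image[i][lo] = image[i][hi] = black   # left and right edges
    return image


def synthetic_photo(size: int = 20) -> Image:
    """Smooth color gradient: every pixel has a different color."""
    return [
        [(x * 12, y * 12, (x + y) * 6) for x in range(size)]
        for y in range(size)
    ]


if __name__ == "__main__":
    for name, img in [("diagram", synthetic_diagram()), ("photo", synthetic_photo())]:
        feats = extract_features(img)
        print(name, looks_like_diagram(feats), feats)
```

In a setup closer to the one described, the feature dictionaries of manually labeled images would be fed to a rule-induction learner, and the learned rules would replace the body of `looks_like_diagram`.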