A Framework for Enhancing Big Data Integration in Biological Domain Using Distributed Processing

Massive heterogeneous big data residing at different sites in various types and formats need to be integrated into a single unified view before data mining can begin. Furthermore, in most applications and research, a single big data source is not enough to complete the analysis and ach...

Bibliographic Details
Main Authors: Ameera Almasoud, Hend Al-Khalifa, AbdulMalik Al-salman, Miltiadis Lytras
Format: Article
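Language: English
Published: MDPI AG 2020-10-01
Series: Applied Sciences
Subjects: big data; big data integration; biological big data; ontology integration; distributed integration
Online Access: https://www.mdpi.com/2076-3417/10/20/7092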
id doaj-cb91b66ac2224c8e8e63ab335410d29c
record_format Article
spelling doaj-cb91b66ac2224c8e8e63ab335410d29c
2020-11-25T02:44:51Z
eng
MDPI AG
Applied Sciences
2076-3417
2020-10-01
10
7092
7092
10.3390/app10207092
A Framework for Enhancing Big Data Integration in Biological Domain Using Distributed Processing
Ameera Almasoud (Information Technology Department, College of Computer and Information Sciences, King Saud University, Riyadh 11362, Saudi Arabia)
Hend Al-Khalifa (Information Technology Department, College of Computer and Information Sciences, King Saud University, Riyadh 11362, Saudi Arabia)
AbdulMalik Al-salman (Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh 11362, Saudi Arabia)
Miltiadis Lytras (Computer Science Department Effat, College of Engineering, Effat University, P.O. Box 34689, Jeddah 22332, Saudi Arabia)
collection DOAJ
language English
format Article
sources DOAJ
author Ameera Almasoud
Hend Al-Khalifa
AbdulMalik Al-salman
Miltiadis Lytras
title A Framework for Enhancing Big Data Integration in Biological Domain Using Distributed Processing
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2020-10-01
description Massive heterogeneous big data residing at different sites in various types and formats need to be integrated into a single unified view before data mining can begin. Furthermore, in most applications and research, a single big data source is not enough to complete the analysis and achieve its goals. Unfortunately, there is no general or standardized integration process; the nature of an integration process depends on the data type, domain, and integration purpose. Based on these parameters, we proposed, implemented, and tested a big data integration framework that integrates big data in the biology domain, based on the domain ontology and using distributed processing. The distributed integration produced the same result as the local integration. The results are equivalent in terms of the ontology size before the integration; the number of added, skipped, and overlapped items; the ontology size after the integration; and the number of edges, vertices, and roots. The results also do not violate any logical consistency rules, passing all logical consistency tests, such as those run with the Jena Ontology API and the HermiT and Pellet reasoners. The integration result is a new big data source that combines big data from several critical sources in the biology domain and transforms it into one unified format to help researchers and specialists use it for further research and analysis.
topic big data
big data integration
biological big data
ontology integration
distributed integration
url https://www.mdpi.com/2076-3417/10/20/7092