A Framework for Enhancing Big Data Integration in Biological Domain Using Distributed Processing
Massive, heterogeneous big data that reside at different sites in various types and formats need to be integrated into a single unified view before data mining can begin. Furthermore, in most applications and research, a single big data source is not enough to complete the analysis and ach...
Main Authors: | Ameera Almasoud, Hend Al-Khalifa, AbdulMalik Al-salman, Miltiadis Lytras |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2020-10-01 |
Series: | Applied Sciences |
Subjects: | big data; big data integration; biological big data; ontology integration; distributed integration |
Online Access: | https://www.mdpi.com/2076-3417/10/20/7092 |
id |
doaj-cb91b66ac2224c8e8e63ab335410d29c |
---|---|
record_format |
Article |
spelling |
doaj-cb91b66ac2224c8e8e63ab335410d29c; 2020-11-25T02:44:51Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2020-10-01; 10; 7092; 7092; 10.3390/app10207092; A Framework for Enhancing Big Data Integration in Biological Domain Using Distributed Processing; Ameera Almasoud (0), Hend Al-Khalifa (1), AbdulMalik Al-salman (2), Miltiadis Lytras (3); (0) Information Technology Department, College of Computer and Information Sciences, King Saud University, Riyadh 11362, Saudi Arabia; (1) Information Technology Department, College of Computer and Information Sciences, King Saud University, Riyadh 11362, Saudi Arabia; (2) Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh 11362, Saudi Arabia; (3) Computer Science Department Effat, College of Engineering, Effat University, P.O. Box 34689, Jeddah 22332, Saudi Arabia; https://www.mdpi.com/2076-3417/10/20/7092; big data, big data integration, biological big data, ontology integration, distributed integration |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Ameera Almasoud; Hend Al-Khalifa; AbdulMalik Al-salman; Miltiadis Lytras |
author_sort |
Ameera Almasoud |
title |
A Framework for Enhancing Big Data Integration in Biological Domain Using Distributed Processing |
title_sort |
framework for enhancing big data integration in biological domain using distributed processing |
publisher |
MDPI AG |
series |
Applied Sciences |
issn |
2076-3417 |
publishDate |
2020-10-01 |
description |
Massive, heterogeneous big data that reside at different sites in various types and formats need to be integrated into a single unified view before data mining can begin. Furthermore, in most applications and research, a single big data source is not enough to complete the analysis and achieve its goals. Unfortunately, there is no general or standardized integration process; the nature of an integration process depends on the data type, domain, and integration purpose. Based on these parameters, we proposed, implemented, and tested a big data integration framework that integrates big data in the biology domain, based on the domain ontology and using distributed processing. The distributed integration produced the same result as the local integration. The results are equivalent in terms of the ontology size before the integration; the numbers of added, skipped, and overlapped items; the ontology size after the integration; and the numbers of edges, vertices, and roots. The results also do not violate any logical consistency rules and pass all the logical consistency tests, including those run with the Jena Ontology API and the HermiT and Pellet reasoners. The integration result is a new big data source that combines big data from several critical sources in the biology domain and transforms it into one unified format to help researchers and specialists use it for further research and analysis. |
topic |
big data; big data integration; biological big data; ontology integration; distributed integration |
url |
https://www.mdpi.com/2076-3417/10/20/7092 |
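The description above reports that distributed integration matched local integration on the counts of added, skipped, and overlapped items and on the numbers of edges, vertices, and roots. The following is a minimal sketch of such an equivalence check, not the authors' framework. Everything in it is an assumption made for illustration: the ontology is modelled as a plain mapping from a term to its parent terms, the meanings given to "added", "skipped", and "overlapped" are guesses, the sample terms are invented, and a thread pool stands in for real distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical representation: term id -> set of parent term ids.
Ontology = dict[str, set[str]]


def merge(base: Ontology, incoming: Ontology):
    """Merge `incoming` into a copy of `base` and count what happened."""
    merged = {term: set(parents) for term, parents in base.items()}
    counts = {"added": 0, "skipped": 0, "overlapped": 0}
    for term, parents in incoming.items():
        if term not in merged:            # assumed meaning: new term
            merged[term] = set(parents)
            counts["added"] += 1
        elif parents <= merged[term]:     # assumed meaning: nothing new
            counts["skipped"] += 1
        else:                             # assumed meaning: same term, new parents
            merged[term] |= parents
            counts["overlapped"] += 1
    return merged, counts


def graph_stats(onto: Ontology):
    """Count vertices, edges (term -> parent links), and parentless roots."""
    vertices = set(onto) | {p for parents in onto.values() for p in parents}
    edges = sum(len(parents) for parents in onto.values())
    roots = [v for v in vertices if not onto.get(v)]
    return {"vertices": len(vertices), "edges": edges, "roots": len(roots)}


def distributed_merge(base: Ontology, incoming: Ontology, workers: int = 2):
    """Split `incoming` into chunks, merge each chunk separately, then combine."""
    items = sorted(incoming.items())
    chunks = [dict(items[i::workers]) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda chunk: merge(base, chunk), chunks))
    merged = {term: set(parents) for term, parents in base.items()}
    counts = {"added": 0, "skipped": 0, "overlapped": 0}
    for part, part_counts in partials:
        for term, parents in part.items():
            merged.setdefault(term, set()).update(parents)
        for key in counts:
            counts[key] += part_counts[key]
    return merged, counts


# Invented sample data, for illustration only.
base = {"cell": set(), "neuron": {"cell"}, "glia": {"cell"}}
incoming = {
    "neuron": {"cell"},
    "glia": {"cell", "support_cell"},
    "astrocyte": {"glia"},
    "organelle": set(),
}

local_merged, local_counts = merge(base, incoming)
dist_merged, dist_counts = distributed_merge(base, incoming)

print(local_merged == dist_merged, local_counts == dist_counts)
print(graph_stats(local_merged), graph_stats(dist_merged))
```

Because each incoming term lands in exactly one chunk and is compared only against the shared base, the per-chunk counts add up to the local counts in this toy model. A real ontology integration would also need the reasoner-based consistency checks that the description mentions, which this sketch does not attempt.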