LCSS-Based Algorithm for Computing Multivariate Data Set Similarity: A Case Study of Real-Time WSN Data

Multivariate data sets are common in various application areas, such as wireless sensor networks (WSNs) and DNA analysis. A robust mechanism is required to compute their similarity indexes regardless of the environment and problem domain. This study describes the usefulness of a non-metric-based app...

Bibliographic Details
Main Authors: Rahim Khan, Ihsan Ali, Saleh M. Altowaijri, Muhammad Zakarya, Atiq Ur Rahman, Ismail Ahmedy, Anwar Khan, Abdullah Gani
Format: Article
Language: English
Published: MDPI AG 2019-01-01
Series: Sensors
Subjects: multivariate data set; longest common subsequence; dynamic programming; WSN data
Online Access: http://www.mdpi.com/1424-8220/19/1/166
id doaj-2c019a147d0d40f8859a237f84533ac2
record_format Article
spelling doaj-2c019a147d0d40f8859a237f84533ac2; 2020-11-25T00:33:50Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2019-01-01; volume 19, issue 1, article 166; DOI 10.3390/s19010166; article number s19010166
Rahim Khan: Department of Computer Science, Abdul Wali Khan University, Mardan 23200, Pakistan
Ihsan Ali: Department of Computer System and Technology, Faculty of Computer Science and IT, University of Malaya, Kuala Lumpur 50603, Malaysia
Saleh M. Altowaijri: Faculty of Computing and Information Technology, Northern Border University, Rafha 91911, Saudi Arabia
Muhammad Zakarya: Department of Computer Science, Abdul Wali Khan University, Mardan 23200, Pakistan
Atiq Ur Rahman: Faculty of Computing and Information Technology, Northern Border University, Rafha 91911, Saudi Arabia
Ismail Ahmedy: Department of Computer System and Technology, Faculty of Computer Science and IT, University of Malaya, Kuala Lumpur 50603, Malaysia
Anwar Khan: Department of Electronics, University of Peshawar, Peshawar 25000, Pakistan
Abdullah Gani: School of Computing and Information Technology, Taylor’s University, Subang Jaya 47500, Malaysia
collection DOAJ
language English
format Article
sources DOAJ
author Rahim Khan
Ihsan Ali
Saleh M. Altowaijri
Muhammad Zakarya
Atiq Ur Rahman
Ismail Ahmedy
Anwar Khan
Abdullah Gani
author_sort Rahim Khan
title LCSS-Based Algorithm for Computing Multivariate Data Set Similarity: A Case Study of Real-Time WSN Data
publisher MDPI AG
series Sensors
issn 1424-8220
publishDate 2019-01-01
description Multivariate data sets are common in various application areas, such as wireless sensor networks (WSNs) and DNA analysis. A robust mechanism is required to compute their similarity indexes regardless of the environment and problem domain. This study describes the usefulness of a non-metric-based approach (i.e., longest common subsequence) in computing similarity indexes. Several non-metric-based algorithms are available in the literature; the most robust and reliable one is the dynamic programming-based technique. However, dynamic programming-based techniques are considered inefficient, particularly in the context of multivariate data sets. Furthermore, the classical approaches are not powerful enough in scenarios that involve multivariate data sets or sensor data, or when the similarity indexes are extremely high or low. To address this issue, we propose an efficient algorithm to measure the similarity indexes of multivariate data sets using a non-metric-based methodology. The proposed algorithm performs exceptionally well on numerous multivariate data sets compared with the classical dynamic programming-based algorithms. The performance of the algorithms is evaluated on several benchmark data sets and a dynamic multivariate data set obtained from a WSN deployed at the Ghulam Ishaq Khan (GIK) Institute of Engineering Sciences and Technology. Our evaluation suggests that the proposed algorithm can be approximately 39.9% more efficient than its counterparts for various data sets in terms of computational time.
topic multivariate data set
longest common subsequence
dynamic programming
WSN data
url http://www.mdpi.com/1424-8220/19/1/166
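
The description above contrasts the proposed algorithm with the classical dynamic programming-based longest common subsequence (LCSS) technique. As a rough illustration of that classical baseline only (not the authors' proposed algorithm, which this record does not describe), the sketch below computes an LCSS-based similarity index for two multivariate sequences. The function names, the per-attribute tolerance `eps`, and the normalisation by the shorter sequence length are assumptions made for this example.

```python
from typing import Sequence

Reading = Sequence[float]  # one multivariate sample, e.g. (temperature, humidity)


def _match(x: Reading, y: Reading, eps: float) -> bool:
    """Two samples match when every attribute differs by at most eps (assumed rule)."""
    return len(x) == len(y) and all(abs(p - q) <= eps for p, q in zip(x, y))


def lcss_length(a: Sequence[Reading], b: Sequence[Reading], eps: float = 0.0) -> int:
    """Classical O(len(a) * len(b)) dynamic-programming LCSS length.

    Only two rows of the DP table are kept, so memory is O(len(b)).
    """
    m = len(b)
    prev = [0] * (m + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            if _match(a[i - 1], b[j - 1], eps):
                # Extend the common subsequence ending before both samples.
                curr[j] = prev[j - 1] + 1
            else:
                # Otherwise carry over the best result seen so far.
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[m]


def similarity_index(a: Sequence[Reading], b: Sequence[Reading], eps: float = 0.0) -> float:
    """LCSS length normalised by the shorter sequence (assumed normalisation)."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, eps) / min(len(a), len(b))


if __name__ == "__main__":
    # Hypothetical (temperature, humidity) readings from two sensor nodes.
    node_a = [(20.0, 55.0), (21.0, 54.0), (22.0, 53.0), (23.0, 52.0)]
    node_b = [(20.1, 55.2), (22.1, 52.9), (30.0, 40.0)]
    print(similarity_index(node_a, node_b, eps=0.5))
```

The quadratic cost of the nested loops is the inefficiency the description refers to for large multivariate data sets; the tolerance-based match is one common way to adapt LCSS to real-valued sensor readings and may differ from the matching rule used in the article.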