MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins on high-dimensional data. The efficiency and approximation rate of LSH depend on the number of false positive and false negative instances it generates. In many domains, reducing the number of f...
Main Authors: | Jingjing Wang, Chen Lin |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2015-01-01 |
Series: | Computational Intelligence and Neuroscience |
Online Access: | http://dx.doi.org/10.1155/2015/217216 |
id | doaj-46b3e438b55a41169491f168fb09d032 |
---|---|
record_format | Article |
spelling | doaj-46b3e438b55a41169491f168fb09d032; 2020-11-24T22:05:55Z; eng; Hindawi Limited; Computational Intelligence and Neuroscience; 1687-5265; 1687-5273; 2015-01-01; 2015; 10.1155/2015/217216; 217216; MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data; Jingjing Wang 0; Chen Lin 1; School of Information Science and Technology, Xiamen University, Xiamen 361005, China; School of Information Science and Technology, Xiamen University, Xiamen 361005, China |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Jingjing Wang; Chen Lin |
title | MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data |
publisher | Hindawi Limited |
series | Computational Intelligence and Neuroscience |
issn | 1687-5265; 1687-5273 |
publishDate | 2015-01-01 |
description | Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins on high-dimensional data. The efficiency and approximation rate of LSH depend on the number of false positive and false negative instances it generates. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives and false negatives is favored. To address these problems, in this paper we propose Personalized Locality Sensitive Hashing (PLSH), in which a new banding scheme is embedded to tailor the number of false positives, false negatives, and the sum of both. PLSH is implemented in parallel using the MapReduce framework to handle similarity joins on large-scale data. Experimental studies on real and simulated data verify the efficiency and effectiveness of the proposed PLSH technique compared with state-of-the-art methods. |
url | http://dx.doi.org/10.1155/2015/217216 |