MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data

Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins for high dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of f...


Bibliographic Details
Main Authors:	Jingjing Wang, Chen Lin
Format:	Article
Language:	English
Published:	Hindawi Limited 2015-01-01
Series:	Computational Intelligence and Neuroscience
Online Access:	http://dx.doi.org/10.1155/2015/217216

id	doaj-46b3e438b55a41169491f168fb09d032
record_format	Article
spelling	doaj-46b3e438b55a41169491f168fb09d032 | 2020-11-24T22:05:55Z | eng | Hindawi Limited | Computational Intelligence and Neuroscience | 1687-5265 | 1687-5273 | 2015-01-01 | 2015 | 10.1155/2015/217216 | 217216 | MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data | Jingjing Wang (0) | Chen Lin (1) | (0) School of Information Science and Technology, Xiamen University, Xiamen 361005, China | (1) School of Information Science and Technology, Xiamen University, Xiamen 361005, China | Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins on high-dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives and false negatives is favored. To address these problems, in this paper we propose Personalized Locality Sensitive Hashing (PLSH), where a new banding scheme is embedded to tailor the number of false positives, false negatives, and the sum of both. PLSH is implemented in parallel using the MapReduce framework to deal with similarity joins on large scale data. Experimental studies on real and simulated data verify the efficiency and effectiveness of our proposed PLSH technique, compared with state-of-the-art methods. | http://dx.doi.org/10.1155/2015/217216
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Jingjing Wang Chen Lin
spellingShingle	Jingjing Wang Chen Lin MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data Computational Intelligence and Neuroscience
author_facet	Jingjing Wang Chen Lin
author_sort	Jingjing Wang
title	MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
title_short	MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
title_full	MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
title_fullStr	MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
title_full_unstemmed	MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
title_sort	mapreduce based personalized locality sensitive hashing for similarity joins on large scale data
publisher	Hindawi Limited
series	Computational Intelligence and Neuroscience
issn	1687-5265 1687-5273
publishDate	2015-01-01
description	Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins on high-dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives and false negatives is favored. To address these problems, in this paper we propose Personalized Locality Sensitive Hashing (PLSH), where a new banding scheme is embedded to tailor the number of false positives, false negatives, and the sum of both. PLSH is implemented in parallel using the MapReduce framework to deal with similarity joins on large scale data. Experimental studies on real and simulated data verify the efficiency and effectiveness of our proposed PLSH technique, compared with state-of-the-art methods.
url	http://dx.doi.org/10.1155/2015/217216
work_keys_str_mv	AT jingjingwang mapreducebasedpersonalizedlocalitysensitivehashingforsimilarityjoinsonlargescaledata AT chenlin mapreducebasedpersonalizedlocalitysensitivehashingforsimilarityjoinsonlargescaledata
_version_	1725824116106199040

