Efficient near duplicate document detection for specialized corpora

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Includes bibliographical references (p. 75-77). Knowledge of near duplicate documents can be advantageous to search engines, even those that only cover a small enterprise or sp...

Bibliographic Details
Main Author: Seshasai, Shreyes
Other Authors: David Spencer and Regina Barzilay.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2010
Online Access: http://hdl.handle.net/1721.1/53116