

Objectives: To analyze various similarity join techniques in order to improve the data mining process.

Findings: A similarity join evaluates the similarity between pairs of objects. Many applications, such as data cleaning, data integration, near-duplicate detection, and data mining processes in general, can benefit extensively from similarity joins. A similarity join can be performed between objects, strings, nodes, and so on. It finds all pairs of objects whose similarity is not smaller than a given similarity threshold. Different techniques and approaches are used to compute the similarity join between objects in a homogeneous information network. This paper provides detailed information about the different similarity join techniques.
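
As an illustration of the threshold definition above, the following is a minimal sketch of a naive similarity join over strings. It assumes Jaccard similarity on whitespace-separated tokens as the similarity measure; this measure, the function names `jaccard` and `similarity_join`, and the sample records are illustrative assumptions, not the path-based technique the paper compares.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token collections (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # treat two empty sets as identical
    return len(a & b) / len(a | b)


def similarity_join(records, threshold):
    """Return (i, j, score) for every pair of records whose similarity
    is not smaller than the threshold. Naive all-pairs comparison."""
    result = []
    for (i, x), (j, y) in combinations(enumerate(records), 2):
        score = jaccard(x.split(), y.split())
        if score >= threshold:
            result.append((i, j, score))
    return result


# Hypothetical sample data for illustration only.
records = ["data cleaning tools", "data cleaning tool", "graph mining"]
pairs = similarity_join(records, threshold=0.5)
print(pairs)
```

The all-pairs loop compares every record with every other record, so it is only practical for small inputs; the techniques surveyed in the paper aim to avoid this exhaustive comparison.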

Results: In this paper, various similarity join techniques are compared across a set of parameters to show that the path-based similarity join performs better than the other techniques.

Application/Improvements: The findings of this work show that the path-based similarity join provides better results than the other approaches.


Keywords

Similarity Join, Data Cleaning, Data Integration, Near Duplicate Detection.