
Improved Top K Similarity Join with Data Dependent Hashing in Heterogeneous Information Network


Affiliations
1 Sree Narayana Guru College, Coimbatore-641105, Tamil Nadu, India
 

Objectives: To reduce the length of hash codes in Locality Sensitive Hashing (LSH).

Methods: A heterogeneous information network is a network in which computers and other devices running different operating systems are connected together. Heterogeneous information networks are receiving increasing attention, but data mining in them is more difficult. Similarity join, which measures the relationship between any two objects, strings, or nodes, is important for many applications such as online advertising and friend recommendation. In this paper, we consider the semantic meaning behind the paths to find the top k similar pairs through the Path-based Similarity join (PS-join) method. Expensive computations are then removed by using bucket-based data dependent hashing, because Locality Sensitive Hashing is more expensive, involves tedious processing such as holding lengthier hash codes, and suffers from the approximate near neighbor problem.

Findings: The proposed data dependent hashing reduces the computation, memory, and storage costs of hash codes and also overcomes the approximate near neighbor problem. The experimental results show that the proposed technique works more efficiently than the existing technique in terms of recall, running time, and error ratio.

Application/Improvements: PS-join with data dependent hashing is proposed to increase recall and to reduce computation time and error ratio.
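
To make the idea of a top k path-based similarity join concrete, the following is a minimal sketch, not the authors' PS-join. It assumes a PathSim-style score, 2 * paths(x, y) / (paths(x, x) + paths(y, y)), over a single symmetric path such as author-paper-author; the abstract does not state which measure the paper uses. The function names and the toy data are hypothetical. The sketch compares every pair exhaustively, which is the kind of cost that bucket-based hashing is meant to avoid.

```python
import heapq
from itertools import combinations


def path_count(neighbors_x, neighbors_y):
    """Count length-2 path instances x -> shared neighbor -> y."""
    return len(neighbors_x & neighbors_y)


def top_k_similar_pairs(neighbors, k):
    """Brute-force top-k similarity join over all object pairs.

    neighbors maps each object to the set of intermediate nodes it links to
    (for example author -> papers). The PathSim-style score used here is an
    assumption, not a measure taken from the abstract.
    """
    scored = []
    for x, y in combinations(sorted(neighbors), 2):
        denom = (path_count(neighbors[x], neighbors[x])
                 + path_count(neighbors[y], neighbors[y]))
        if denom == 0:
            continue  # neither object has any path instance
        score = 2.0 * path_count(neighbors[x], neighbors[y]) / denom
        scored.append((score, x, y))
    # nlargest returns the k highest-scoring tuples in descending order
    return heapq.nlargest(k, scored)


# Hypothetical toy network: authors linked to the papers they wrote.
authors = {
    "a1": {"p1", "p2"},
    "a2": {"p1", "p2"},
    "a3": {"p2", "p3"},
    "a4": {"p4"},
}
top_pairs = top_k_similar_pairs(authors, k=2)
```

In this toy network, authors a1 and a2 share both of their papers, so they form the highest-scoring pair with a score of 1.0.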


Keywords

Heterogeneous Information Network, Top K Similar Pairs, Similarity Join, Path Based Similarity Join, Locality Sensitive Hashing, Data Dependent Hashing.



Authors

C. P. Rushida
Sree Narayana Guru College, Coimbatore-641105, Tamil Nadu, India
V. R. Nagarajan
Sree Narayana Guru College, Coimbatore-641105, Tamil Nadu, India

