A Semantic Deduplication of Temporal Dynamic Records from Multiple Web Databases

R. Parimala Devi; V. Thigarasu

doi:10.17485/ijst/2015/v8i34/124256

Affiliations
1 Department of Computer Science, Karpagam University, Coimbatore - 641021, Tamil Nadu, India
2 Department of Computer Science, Gobi Arts and Science College, Gobichettipalayam - 638453, Tamil Nadu, India

Abstract

Objective: The main objective of this paper is to improve the true positive level of record deduplication using an ontology-based MHMM-Fuzzy clustering approach. Methods/Statistical Analysis: Most record deduplication systems in the literature use genetic programming, which combines different pieces of evidence extracted from the data content. However, the accuracy of such systems is low. To overcome this problem, we propose a Multiple Hidden Markov Model (MHMM), which increases accuracy and also identifies joint duplicate records. In this model, if the database has multiple columns, deduplication is performed on all of the columns, which degrades the performance of the system. To solve this problem, MHMM-Fuzzy clustering based record deduplication is introduced. In this system, fuzzy clustering is performed on multiple observations from the Hidden Markov Model. Duplicate records are then grouped into one cluster according to fuzzy logic, so that they can be eliminated easily. However, the true positive level of this system is still low. To improve it, fuzzy ontology based semantic similarity is incorporated into the MHMM-Fuzzy clustering approach, which increases the efficiency of the deduplication function that identifies replicas and duplicate records. Findings: MHMM based record deduplication, MHMM-Fuzzy clustering based record deduplication and the ontology-based MHMM-Fuzzy clustering approach are applied to the Cora bibliographic dataset and the Restaurants dataset. Performance is evaluated in terms of precision, recall, F-measure, execution time and accuracy. Applications/Improvements: The proposed approach achieves better record deduplication results than previous works in terms of precision, recall, F-measure, execution time and accuracy.
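The abstract reports precision, recall and F-measure but does not say how they are computed. One common convention for deduplication, assumed here and not confirmed by the paper, is to score pairs of records: a pair counts as a true positive when both the predicted and the reference clustering place the two records in the same cluster. The sketch below illustrates that convention in Python; the function names, record identifiers and toy clusters are hypothetical and are not taken from the Cora or Restaurants datasets.

```python
from itertools import combinations


def duplicate_pairs(clusters):
    """Return the set of unordered record-id pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        # Sorting gives each pair one canonical ordering, e.g. ("r1", "r2").
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(predicted_clusters, true_clusters):
    """Pairwise precision, recall and F-measure of a predicted clustering."""
    predicted = duplicate_pairs(predicted_clusters)
    actual = duplicate_pairs(true_clusters)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical toy data: r1, r2, r3 are true duplicates, as are r5 and r6.
true_clusters = [{"r1", "r2", "r3"}, {"r4"}, {"r5", "r6"}]
# A hypothetical deduplication output that splits r3 off and merges it with r4.
predicted_clusters = [{"r1", "r2"}, {"r3", "r4"}, {"r5", "r6"}]

precision, recall, f_measure = pairwise_scores(predicted_clusters, true_clusters)
print(f"precision={precision:.3f} recall={recall:.3f} f_measure={f_measure:.3f}")
```

In this toy case two of the three predicted pairs are correct and two of the four true pairs are recovered, so a method that raises the true positive count improves recall directly and precision whenever it does not add false pairs.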

Keywords

Hidden State Sequence, Membership Function, Observation Sequence, States, Semantic Deduplication





Indian Journal of Science and Technology
