Varghese, Cyju
- An Analysis of Various Record Matching Approaches and Similarity Computations
Authors
1 Karunya University, IN
Source
Data Mining and Knowledge Engineering, Vol 3, No 2 (2011), Pagination: 99-103
Abstract
Linking or matching databases is becoming increasingly important in many data mining projects, as linked data can contain information that is not otherwise available or that would be too expensive to collect manually. Record matching refers to the task of finding similar entities in two or more records. Performing record matching addresses the duplication detection problem, hence the need to identify a suitable record matching technique. This paper presents a survey of record matching techniques, highlighting the approaches used, the number of classifiers used, and the stages of duplication detection performed, and compares the techniques with one another. It also presents the various matching metrics available. Further, we point out potential pitfalls as well as challenging issues that a record matching technique needs to address. We then present an unsupervised method for record matching in a Web database scenario. We believe that the results of this evaluation will help analysts come up with easier and more feasible methods for record matching, which is a real challenge, particularly in the Web scenario.
Keywords
Duplication Detection, Record Matching, Similarity Calculation, Unsupervised.
- Protection of Data from a Semi-Honest Party Using Fast Association Rule Hiding
Authors
1 Karunya University, IN
Source
Data Mining and Knowledge Engineering, Vol 3, No 1 (2011), Pagination: 67-70
Abstract
Data mining is an emerging technique applied in strategic decision-making as well as in many other application areas. Despite its utility, it also has a few drawbacks. Data mining tools may bring out information that should not be disclosed to a semi-honest party. Different approaches are used to hide such sensitive information. This paper proposes a novel method to access the generating transactions from the transactional database, which helps reduce the time and space complexity of any hiding algorithm. Theoretical and empirical analysis shows that the proposed technique performs association rule hiding faster than other algorithms.