Record Matching in Web Databases Using Unsupervised Approach
Record matching arises when combining information from multiple heterogeneous databases. One step of data integration is relating the records that appear in the different databases; specifically, determining which sets of records refer to the same real-world entities. Because record matching solves the duplicate detection problem, identifying a suitable record matching technique is important. Most record matching methods are supervised and require the user to provide training data. Such methods are not applicable to the Web database scenario, where the records to match are query results generated dynamically. To overcome this problem, a new record matching method named Unsupervised Duplicate Detection (UDD) is proposed which, for a given query, can effectively identify and eliminate duplicates among the query result records of multiple Web databases. The central idea of this paper is to adjust the weights of record fields when calculating the similarity between records. Two classifiers, a weighted component similarity summing classifier and a support vector machine classifier, are employed iteratively within UDD to identify duplicates in the query results from multiple Web databases.
Keywords
Record Matching, Unsupervised, UDD, Query Results.
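The weighted field-similarity idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the fixed weights, the threshold of 0.85, and the use of `difflib.SequenceMatcher` as the per-field similarity measure are all assumptions made for illustration, and the iterative weight adjustment and the support vector machine classifier are omitted.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity of two field values in [0, 1].

    SequenceMatcher is an assumed stand-in; the paper does not
    specify this measure.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1, r2, weights) -> float:
    """Weighted sum of per-field similarities.

    Weights are assumed to be non-negative and to sum to 1.
    """
    return sum(w * field_similarity(x, y) for w, x, y in zip(weights, r1, r2))


def find_duplicates(records_a, records_b, weights, threshold=0.85):
    """Return index pairs (i, j) whose weighted similarity meets the threshold.

    The threshold value is hypothetical, chosen only for this example.
    """
    pairs = []
    for i, ra in enumerate(records_a):
        for j, rb in enumerate(records_b):
            if record_similarity(ra, rb, weights) >= threshold:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Hypothetical query results from two Web databases: (title, author, year).
    db_a = [("Data Mining Concepts", "Jiawei Han", "2006")]
    db_b = [
        ("Data Mining Concepts", "Jiawei Han", "2006"),
        ("Operating Systems", "A. Silberschatz", "2008"),
    ]
    weights = (0.5, 0.3, 0.2)  # assumed field weights, not learned
    print(find_duplicates(db_a, db_b, weights))
```

In a full UDD-style pipeline the weights would be adjusted from the data rather than fixed by hand, and the pairs found this way would feed a second classifier; the sketch only shows the similarity-summing step.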