Automation of Template and Data Extraction from Dynamic Web Documents

S. Pradeepa; K. Satheesbabu; K. Sabeetha

Affiliations
1 Computer Science and Engineering, PSNA College of Engineering and Technology, Dindigul, India
2 Department of Computer Science and Engineering, PSNA College of Engineering and Technology, Dindigul, India
3 Department of Information Technology, PSNA College of Engineering and Technology, Dindigul, India

Many websites contain large sets of pages generated from common templates populated with content. The irrelevant terms in these templates degrade the accuracy and performance of web applications. Template detection techniques have therefore received considerable attention recently as a way to improve search engines and the clustering and classification of web documents, and such techniques are now used to handle the duplicated content that templates introduce. In this paper, we present techniques for automatically producing clusters based on Minimum Description Length (MDL) cost, which can be used to extract search result records from dynamically generated web documents, and we extract the data from the clustered documents using the Template Table Text Chunk Removal (TTTCR) algorithm. Data extraction is the process of retrieving data out of data sources for further data processing. As a result, no additional template extraction process is needed after clustering. Experimental results show that the proposed approach is feasible and effective for improving template and data extraction accuracy.
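The abstract does not spell out the MDL cost or the TTTCR steps, so the following Python sketch only illustrates the general idea of MDL-driven template clustering under simplifying assumptions of our own: each page is modelled as a set of (tag path, text) tokens, a cluster's template is the set of tokens shared by all of its pages, and the description length is one unit per template token plus one unit per non-template token in each page. The names `template`, `cluster_cost`, `mdl_cluster` and `extract_data` are hypothetical, and the final step simply strips template tokens; it is not the authors' TTTCR algorithm.

```python
from itertools import combinations

# Assumption (hypothetical model): a page is a frozenset of (tag_path, text) tokens.

def template(cluster):
    """Tokens shared by every page in the cluster (the cluster's template)."""
    return frozenset.intersection(*cluster)

def cluster_cost(cluster):
    """Simplified MDL cost: one unit per template token, plus one unit
    per non-template token in each page of the cluster."""
    t = template(cluster)
    return len(t) + sum(len(page - t) for page in cluster)

def mdl_cluster(pages):
    """Greedy agglomerative clustering: start from singleton clusters and
    repeatedly merge the pair whose merge lowers the total cost the most."""
    clusters = [[p] for p in pages]
    while len(clusters) > 1:
        best = None  # (cost change, i, j)
        for i, j in combinations(range(len(clusters)), 2):
            delta = (cluster_cost(clusters[i] + clusters[j])
                     - cluster_cost(clusters[i]) - cluster_cost(clusters[j]))
            if delta < 0 and (best is None or delta < best[0]):
                best = (delta, i, j)
        if best is None:  # no merge shortens the description, so stop
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # combinations yields j > i, so index i stays valid
    return clusters

def extract_data(cluster):
    """Strip template tokens; whatever remains in each page is treated as data.
    Illustrative only; this is not the TTTCR algorithm."""
    t = template(cluster)
    return [page - t for page in cluster]

# Toy input (made up): two pages from a shop template, two from a news template.
TD = "html/body/table/tr/td"
SHOP_TEMPLATE = {("div.header", "MyShop"), ("div.nav", "Cart | Help"), ("div.footer", "(c) MyShop")}
NEWS_TEMPLATE = {("h1", "News"), ("div.nav", "Home | World"), ("div.footer", "(c) News Ltd")}
pages = [
    frozenset(SHOP_TEMPLATE | {(TD, "Laptop"), (TD, "$499")}),
    frozenset(NEWS_TEMPLATE | {("p", "Election results")}),
    frozenset(SHOP_TEMPLATE | {(TD, "Phone"), (TD, "$299")}),
    frozenset(NEWS_TEMPLATE | {("p", "Storm warning")}),
]

for cluster in mdl_cluster(pages):
    print(sorted(template(cluster)), [sorted(d) for d in extract_data(cluster)])
```

On this toy input, merging two pages that share a template saves more than it costs (the shared tokens are written once), while merging pages from different templates only adds per-page differences, so the greedy loop stops with one cluster per template. A real system would use a richer encoding cost than one unit per token.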

Keywords

Minimum Description Length (MDL), Template Extraction, Clustering, Template Table Text Chunk Removal (TTTCR).

Data Mining and Knowledge Engineering