
Optimized Web Page Generation Using Web Content Mining


Affiliations
1 Department of Computer Science and Engineering, Annamalai University, Annamalai Nagar, Chidambaram, Tamil Nadu, India
     



Abstract

In recent years, the amount of information available on the World Wide Web has grown exponentially. Web pages are a major source for information retrieval and data mining, but most HTML documents on the Internet are cluttered with less informative and typically unrelated material such as banner ads, navigation bars and copyright notices. This irrelevant information is not part of a page's main content, and it can seriously harm Web mining and search. In this paper we develop an automatic HTML generator that uses Web content mining to produce optimized web pages from existing ones. The input to the HTML generator is one or more HTML web pages, downloaded either manually by the user or with the download manager built into the generator. The downloaded pages are mined, and useful information, including keywords, is extracted and stored in a specific location. Using these keywords, the pages are clustered with the DBSCAN clustering algorithm to identify the website category. From these mined resources a new optimized web page is created. This page is user-friendly and free of noise, and it may contain text, images, audio, video, structured lists and hyperlink structures. Although only sample web pages from five different categories are considered, the proposed method can be applied to any web pages that can be mined for knowledge extraction.
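The abstract outlines a pipeline (strip noise such as ads and navigation bars, extract keywords, cluster pages with DBSCAN, emit a clean page) without giving implementation details. The following is a minimal standard-library Python sketch of that pipeline under stated assumptions, not the authors' implementation. Everything specific here is hypothetical: the tags treated as noise (`NOISE_TAGS`), frequency-based keyword scoring, Jaccard distance between keyword sets, the `eps`/`min_pts` values, the sample pages, and the helper names `main_text`, `keywords`, `dbscan` and `build_page`.

```python
import re
from collections import Counter
from html import escape
from html.parser import HTMLParser

# Assumption: text inside these tags is treated as noise (menus, ads, notices);
# the paper's actual noise-detection rules are not given in the abstract.
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
STOPWORDS = {"the", "and", "for", "with", "this", "that", "are", "from", "was"}


class MainTextExtractor(HTMLParser):
    """Collects text chunks that are not nested inside a noise tag."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # number of open noise tags around the current position
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1

    def handle_data(self, data):
        if self.noise_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def main_text(html):
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def keywords(chunks, k=5):
    """Top-k non-stopword terms by raw frequency; words of <= 2 letters are skipped."""
    words = re.findall(r"[a-z]+", " ".join(chunks).lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return {w for w, _ in counts.most_common(k)}


def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def dbscan(items, eps, min_pts, dist):
    """Plain DBSCAN; returns one cluster label per item, -1 for noise."""
    labels = [None] * len(items)

    def region(i):  # indices of all items within eps of item i, including i
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is a core point: keep growing
                queue.extend(neighbours)
    return labels


def build_page(title, chunks):
    """Emit a minimal HTML page containing only the extracted main text."""
    body = "\n".join(f"<p>{escape(c)}</p>" for c in chunks)
    return f"<!DOCTYPE html>\n<html><head><title>{escape(title)}</title></head>\n<body>\n{body}\n</body></html>"


pages = [
    "<nav>Home About</nav><p>Cricket match score: the cricket team won the match.</p><footer>Copyright</footer>",
    "<p>Cricket team announces squad for the cricket match series.</p><script>ads()</script>",
    "<p>Python tutorial: learn python programming with python examples.</p>",
    "<p>Advanced python programming tutorial and examples.</p>",
    "<p>Weather forecast: sunny skies expected tomorrow.</p>",
]
chunked = [main_text(p) for p in pages]
labels = dbscan([keywords(c) for c in chunked], eps=0.6, min_pts=2, dist=jaccard_distance)
print(labels)  # [0, 0, 1, 1, -1]
page = build_page("Cricket", chunked[0])
```

With `eps=0.6` and `min_pts=2`, the two cricket pages form one cluster, the two programming pages form another, and the weather page is labelled noise (`-1`) because its keywords share nothing with the others. Jaccard distance is used here only because it suits set-valued keyword features; DBSCAN itself accepts any distance function, and a TF-IDF or cosine-based representation would slot in the same way.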

Keywords

Web Content Mining, Text Mining, Web Structure Mining, Link Mining, HTML Generator.

Authors

M. Karthikeyan
Department of Computer Science and Engineering, Annamalai University, Annamalai Nagar, Chidambaram, Tamil Nadu, India
P. Aruna
Department of Computer Science and Engineering, Annamalai University, Annamalai Nagar, Chidambaram, Tamil Nadu, India
