
A Novel Approach for Mining Web Documents Based on Bayesian Learning Classifier Systems


Affiliations
1 Kalaignar Karunaidhi Institute of Tech, India
2 VLBJCET, India
     



Web mining is a new area of data mining. Since the web is one of the biggest repositories of data, using data mining to analyze and explore regularities in web user behavior can improve system performance and enhance the quality and delivery of Internet information services to the end user. Clustering and classification are active areas of machine learning research that promise to help us cope with the problem of information overload on the Internet. BIRCH is a clustering algorithm designed to operate under the assumption that "the amount of memory available is limited, whereas the dataset can be arbitrarily large". The algorithm generates "a compact dataset summary" while minimizing the I/O cost involved. The effects of noise and uncertainty are also major issues in web mining. Traditionally, probability is used to measure the uncertainty in a system. The Bayesian approach uses Bayes' theorem to combine existing beliefs with new evidence in order to form new beliefs, and Bayesian inference is regarded in the literature as a robust method for dealing with noise and uncertainty. Therefore, we propose a modification of UCS (a supervised learning classifier system) that uses a Bayesian update. This method achieves higher accuracy than UCS and requires only half of the learning time to converge. It thus minimizes the outliers involved, and the resulting summary contains enough information to apply the well-known SMOKA (Smoothened k-means) clustering algorithm to the set of summaries and to generate the partitions of the original dataset. We expect the proposed method to work more quickly because it reduces the time required to explore a search space and find a correct action for a condition.
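To make the idea of forming new beliefs from existing beliefs and new evidence concrete, the following is a minimal Python sketch of a Bayesian update for a single classifier rule. It is not the authors' algorithm: the class name `RuleBelief`, the uniform Beta(1, 1) prior, and the sample outcomes are assumptions chosen only for illustration. With a Beta prior and a correct/incorrect observation, Bayes' theorem reduces to incrementing one of two counts.

```python
from dataclasses import dataclass


@dataclass
class RuleBelief:
    """Beta(alpha, beta) belief about how often a rule predicts the correct class.

    Hypothetical helper for illustration; the uniform Beta(1, 1) prior is an assumption.
    """
    alpha: float = 1.0  # prior pseudo-count of correct predictions
    beta: float = 1.0   # prior pseudo-count of incorrect predictions

    def update(self, correct: bool) -> None:
        # Beta prior + Bernoulli likelihood: the posterior is again a Beta
        # distribution, obtained by adding 1 to the matching count.
        if correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def expected_accuracy(self) -> float:
        # Posterior mean of the rule's accuracy.
        return self.alpha / (self.alpha + self.beta)


# Made-up evidence: three correct predictions and one incorrect prediction.
belief = RuleBelief()
for outcome in [True, True, False, True]:
    belief.update(outcome)

print(belief.alpha, belief.beta, belief.expected_accuracy)
```

After the four observations the belief is Beta(4, 2), so the expected accuracy is 4/6, pulled toward the observed success rate while still reflecting the prior. Keeping a distribution rather than a raw ratio is one way a rule's estimate could stay stable when only a few noisy examples have been seen.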


Keywords

Algorithms: BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), Bayes' Theorem, K-Means Algorithm, BCS.





Authors

M. Deepa
Kalaignar Karunaidhi Institute of Tech, India
P. Tamijeselvy
VLBJCET, India
