
Large Document Set Clustering: An Integrated Approach


Affiliations
1 National Institute of Technology, Bhopal, India
     



Document clustering is an important data mining task that different people use for different purposes. It is generally used to find similar documents within a large collection. The document set may be a collection of blogs, website access patterns, or any transaction files. Through document clustering one can identify similar habits among different people, which can play a large role in future trend analysis and decision making. Most clustering methods use a distance calculation as the similarity measure. They scan the documents multiple times to determine the class and then prepare the clusters. When the document set is large, these methods take more time to cluster it. We propose an advanced environment for document clustering in which the documents are scanned only once and each document is immediately assigned to the appropriate cluster. Experiments are conducted on the 20 Newsgroups dataset using MATLAB. The experimental results show the effectiveness of the proposed environment for large document sets.

Keywords

Document Clustering, Similarity Measurements, Dendrogram, Term Extraction.
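
The abstract states that each document is scanned once and immediately assigned to a cluster, but this page does not describe how the assignment is made. As a rough illustration of the general single-pass idea only, the sketch below uses a generic leader-style scheme: whitespace tokens as terms, cosine similarity against a running cluster centroid, and a fixed threshold. The function names, the tokenization, and the threshold value are all assumptions made for this example, and the code is in Python rather than the MATLAB used in the experiments; it is not the authors' method.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag of lowercase whitespace tokens (assumed stand-in for term extraction)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass_cluster(documents, threshold=0.3):
    """Assign each document to a cluster in one pass over the collection.

    threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    centroids = []  # summed term counts, one Counter per cluster
    labels = []
    for text in documents:  # each document is read exactly once
        vec = term_vector(text)
        best_index, best_score = -1, 0.0
        for index, centroid in enumerate(centroids):
            score = cosine(vec, centroid)
            if score > best_score:
                best_index, best_score = index, score
        if best_index >= 0 and best_score >= threshold:
            centroids[best_index].update(vec)  # fold document into the cluster
            labels.append(best_index)
        else:
            centroids.append(Counter(vec))  # document starts a new cluster
            labels.append(len(centroids) - 1)
    return labels
```

Calling `single_pass_cluster(docs)` returns one cluster index per document, in input order. In a scheme like this, each document is compared with the existing cluster summaries rather than with every other document, so the collection itself is never rescanned; the result does depend on the order in which documents arrive.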

Authors

Krishna Kumar Mohbey
National Institute of Technology, Bhopal, India
G. S. Thakur
National Institute of Technology, Bhopal, India
