Large Document Set Clustering: An Integrated Approach

Krishna Kumar Mohbey; G. S. Thakur

Affiliations
1 National Institute of Technology, Bhopal, India

Abstract

Document clustering is an important data mining task that serves many different purposes. It is generally used to find similar documents within a large collection. The document set may be a collection of blogs, website access patterns, or transaction files. Through document clustering, one can discover similar habits among different people, which can play a large role in future trend analysis and decision making. Most clustering methods use distance calculations as the similarity measure. They scan the documents multiple times to determine the classes and then build the clusters. When the document set is large, these methods take more time to cluster. We propose an advanced environment for document clustering in which the documents are scanned only once and each document is immediately assigned to the appropriate cluster. Experiments were conducted on the 20 Newsgroups dataset using MATLAB. The experimental results show the effectiveness of the proposed environment for large document sets.
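
The single-scan idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the assignment rule, so everything below is an assumption rather than the authors' method: the sketch uses a generic leader-style scheme in which each document is compared to running cluster centroids by cosine similarity and either joins the best match or starts a new cluster. The names `vectorize`, `cosine`, `single_pass_cluster`, and the `threshold` parameter are hypothetical, and Python is used here for illustration although the reported experiments used MATLAB.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumed preprocessing, not from the paper)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass_cluster(docs, threshold=0.3):
    """Assign each document to a cluster in one scan over the input.

    The threshold is an illustrative assumption: a document joins the most
    similar existing cluster if the similarity reaches it, otherwise it
    starts a new cluster.
    """
    centroids = []  # summed term counts per cluster
    labels = []
    for text in docs:
        vec = vectorize(text)
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # start a new cluster
            labels.append(len(centroids) - 1)
    return labels


docs = [
    "space shuttle launch orbit",
    "shuttle orbit nasa launch",
    "hockey team goal season",
    "hockey goal playoff team",
]
labels = single_pass_cluster(docs)
print(labels)
```

Each document is read exactly once and labeled as soon as it is seen, which is the property the abstract emphasizes. In this generic scheme the result depends on document order and on the chosen threshold.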