
Synonym Based Document Clustering Using Thesaurus


Affiliations
1 Theivanai Ammal Women's College, Villupuram, Tamil Nadu, India
2 Department of Computer Science and Applications, SCSVMV University, Kanchipuram, Tamil Nadu, India
     



A synonym-based document clustering approach is proposed to cluster more documents related to the user's query. Synonyms of each query word are obtained from an online thesaurus. Document clustering is one of the core tasks in data mining, and many techniques are used for it. In the existing method, words and their synonyms are stored in a database by the user. The user must enter the words one by one, which is time-consuming, and sometimes not all words can be stored in the database. Handling becomes complex when a word has more than one synonym. In the proposed method, synonyms are retrieved from thesaurus.com (an online library). Both the user-entered keyword and its synonyms are used for clustering. The TF-IDF method, implemented in C#.NET, is used to rank the clustered documents, which gives more relevant and accurate results for the user's query. For the experiments we used a set of text files. The proposed method performs better than the existing method and removes the need to maintain a database.

Keywords

Document Clustering, Synonym Based Search, TF-IDF, Thesaurus.
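
The pipeline described in the abstract (expand the keyword with synonyms, collect the matching documents, rank them by TF-IDF) can be illustrated with a minimal sketch. This is not the authors' C#.NET implementation: it is written in Python, the thesaurus is a hypothetical in-memory dictionary standing in for the thesaurus.com lookup, and the function names, sample documents, and the smoothed IDF formula `log(N / df) + 1` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory thesaurus standing in for the online lookup;
# the paper retrieves synonyms from thesaurus.com instead.
SAMPLE_THESAURUS = {"car": ["automobile", "vehicle"]}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(keyword, thesaurus):
    """Return the keyword followed by its synonyms, without duplicates."""
    terms = [keyword.lower()]
    for synonym in thesaurus.get(keyword.lower(), []):
        if synonym.lower() not in terms:
            terms.append(synonym.lower())
    return terms


def rank_documents(docs, terms):
    """Score each document that contains a query term by summed TF-IDF.

    Documents containing none of the terms are left out, so the result
    is the cluster for the expanded query, ordered best match first.
    """
    tokenized = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for term in terms:
        # Document frequency: number of documents containing the term.
        df = sum(1 for counts in tokenized.values() if counts[term] > 0)
        if df == 0:
            continue
        # Assumed smoothing: +1 keeps terms found in every document non-zero.
        idf = math.log(n_docs / df) + 1.0
        for name, counts in tokenized.items():
            if counts[term] == 0:
                continue
            tf = counts[term] / sum(counts.values())
            scores[name] = scores.get(name, 0.0) + tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative documents (not the text files used in the paper).
docs = {
    "a.txt": "The car is fast",
    "b.txt": "An automobile needs fuel and an automobile needs care",
    "c.txt": "Cats sleep all day",
}
terms = expand_query("car", SAMPLE_THESAURUS)
for name, score in rank_documents(docs, terms):
    print(f"{name}: {score:.3f}")
```

Searching for "car" alone would return only `a.txt`; with the synonym expansion, `b.txt` joins the cluster because it mentions "automobile", while `c.txt` stays out. Multi-word synonyms would need phrase matching, which this sketch does not attempt.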

Authors

A. Rajeswari
Theivanai Ammal Women's College, Villupuram, Tamil Nadu, India
M. Kannan
Department of Computer Science and Applications, SCSVMV University, Kanchipuram, Tamil Nadu, India
