
A New Content Based Text Clustering Using Spherical Gaussian EM Algorithm


Affiliations
1 Department of Computer Science, P.S.G.R. Krishnammal College for Women, Coimbatore, India
2 P.S.G.R Krishnammal College for Women, Coimbatore, India
     



Extracting the relations between verbs and their arguments within a sentence makes it possible to analyze the terms of that sentence. This paper proposes a novel concept-based mining model that captures the semantic structure of each term within a sentence and a document, rather than only the frequency of the term within a document. The model has four components: sentence-based concept analysis, document-based concept analysis, corpus-based concept analysis, and a concept-based similarity measure. Clustering, performed with the spherical Gaussian EM algorithm, is used to improve the text mining results. Large sets of experiments apply the proposed concept-based mining model to text clustering on different data sets, and assess how effectively concept matching, together with the concept-based term analysis and similarity measure, yields an accurate measure of the similarity between documents. The experiments are run in MATLAB on three data sets: Reuters, TDT, and 20 Newsgroups. Clustering performance is evaluated by F-measure and execution time.
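The abstract names spherical Gaussian EM but does not spell out its steps, so the following is only a minimal sketch of the generic algorithm, not the authors' MATLAB implementation. It assumes each document has already been turned into a numeric feature vector (for example, concept-based term weights). The function name `spherical_gaussian_em`, the farthest-first initialization, and the variance floor `min_var` are illustrative assumptions that do not come from the paper.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def spherical_gaussian_em(points, k, n_iter=50, min_var=1e-6):
    """Fit a mixture of k spherical Gaussians (covariance = variance * I) by EM.

    Returns (means, variances, weights, labels). Illustrative sketch only.
    """
    n, d = len(points), len(points[0])

    # Assumed initialization: farthest-first seeding (deterministic).
    means = [list(points[0])]
    while len(means) < k:
        far = max(points, key=lambda p: min(_sq_dist(p, m) for m in means))
        means.append(list(far))
    variances = [1.0] * k
    weights = [1.0 / k] * k

    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point,
        # computed in log space for numerical stability.
        resp = []
        for x in points:
            log_p = [
                math.log(weights[j])
                - 0.5 * d * math.log(2.0 * math.pi * variances[j])
                - 0.5 * _sq_dist(x, means[j]) / variances[j]
                for j in range(k)
            ]
            top = max(log_p)
            unnorm = [math.exp(lp - top) for lp in log_p]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: re-estimate mixing weight, mean, and the single
        # per-component variance shared by all d dimensions.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an (almost) empty component unchanged
            weights[j] = nj / n
            means[j] = [
                sum(r[j] * x[t] for r, x in zip(resp, points)) / nj
                for t in range(d)
            ]
            sq_sum = sum(
                r[j] * _sq_dist(x, means[j]) for r, x in zip(resp, points)
            )
            variances[j] = max(sq_sum / (nj * d), min_var)

    # Hard cluster assignment: the most probable component for each point.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels
```

Calling `spherical_gaussian_em(vectors, k)` on a list of equal-length document vectors returns the fitted component parameters and one hard cluster label per document. The spherical constraint means each cluster keeps a single variance instead of a full covariance matrix, which keeps the number of parameters small for high-dimensional text features.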

Keywords

Data Preprocessing, Web Usage Mining, Path Completion Algorithm, Data Cleaning, User Session Identification, Modified Expectation Maximization.

Authors

S. C. Punitha
Department of Computer Science, P.S.G.R. Krishnammal College for Women, Coimbatore, India
R. Jayasree
P.S.G.R Krishnammal College for Women, Coimbatore, India
