Frame Work for Semi-Supervised Clustering based on Color Constraints to Enhance Text Mining for Efficient Information Retrieval


Affiliations
1 Department of Computer Science, Sri Meenakshi Govt. Arts College for Women (A), Madurai – 625002, Tamil Nadu, India
2 PG and Research Department of Computer Science, Thiru A. Govindasamy Govt. Arts College, Tindivanam – 604002, Tamil Nadu, India
 

Abstract

Background/Objectives: This paper analyzes various issues in clustering and text mining. The collected documents are preprocessed and grouped using a new position-based algorithm that we propose. We show that the proposed color based constraint (CBC) clustering algorithm outperforms the K-Means and SOM algorithms in terms of time and reliability.

Methods/Statistical Analysis: We identified the problem by analyzing existing work in articles from reputed journals and from national and international conferences. We propose a new methodology for the document grouping process and for the color based constraint clustering process. Clustering can be considered the most important semi-supervised learning problem; it deals with finding structure in a collection of unlabelled data. In this work the collected documents are preprocessed by stop word removal and stemming, and the words are then grouped according to their similarity using color code constraints. The performance of the SOM, K-Means, and color based constraint algorithms is analyzed on text document collections of any kind.

Findings: The performance of the proposed CBC algorithm is compared with that of the SOM and K-Means algorithms with respect to time based frequency and the reliability of the retrieved documents. The time needed to process a given number of documents is analyzed. The reliability of the retrieved documents is assessed using the number of documents and the frequency measurement. The proposed color based constraint clustering algorithm outperforms the K-Means and SOM algorithms in terms of time and reliability.

Application/Improvements: Our work is useful for efficient information retrieval. In future, this work can be extended to maximize the grouping of words with minimum latency, and to develop an algorithm that maximizes the grouping (clustering) of words in a document under color based constraints, increasing clustering performance for efficient text mining.
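
The preprocessing named in the abstract (stop word removal, stemming, then grouping of similar words) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop word list, the suffix-stripping `toy_stem` function, and the `group_by_stem` helper are all hypothetical stand-ins chosen for illustration, and the color code constraint step is not attempted because the abstract does not specify it.

```python
import re
from collections import defaultdict

# Tiny illustrative stop word list (an assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to"}

# Suffixes removed by the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip one common suffix, keeping at least three characters.

    A crude stand-in for a real stemmer, used only to keep the
    sketch self-contained.
    """
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


def group_by_stem(tokens):
    """Group token positions by stem, so identical stems land together."""
    groups = defaultdict(list)
    for position, token in enumerate(tokens):
        groups[token].append(position)
    return dict(groups)


if __name__ == "__main__":
    stems = preprocess("Clustering documents and clustered document")
    print(stems)
    print(group_by_stem(stems))
```

Grouping here uses exact stem equality as the simplest possible notion of similarity; any constraint-based grouping would replace `group_by_stem` with its own rule.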

Keywords

Color Based Constraint, Clustering, Information Retrieval, Semi-Supervised Clustering Technique, Text Mining

Authors

S. Suguna
Department of Computer Science, Sri Meenakshi Govt. Arts College for Women (A), Madurai – 625002, Tamil Nadu, India
V. Sundaravadivelu
PG and Research Department of Computer Science, Thiru A. Govindasamy Govt. Arts College, Tindivanam – 604002, Tamil Nadu, India
B. Gomathi
Department of Computer Science, Sri Meenakshi Govt. Arts College for Women (A), Madurai – 625002, Tamil Nadu, India


DOI: https://doi.org/10.17485/ijst%2F2015%2Fv8i28%2F121428