Background/Objectives: In this paper we analyze several issues in clustering and text mining. The collected documents are preprocessed and grouped using our proposed algorithm, which is based on a position method. We show that the proposed color based constraint clustering algorithm outperforms the K-Means and Self-Organizing Map (SOM) algorithms in terms of time and reliability. Methods/Statistical Analysis: We identified the problem by analyzing existing work published in reputed journals and in national and international conferences. We propose a new methodology for the document grouping process and for the color based constraint clustering process. Clustering can be considered the most important semi-supervised learning problem; it deals with finding structure in a collection of unlabelled data. In this work the collected documents are preprocessed by stop word removal and stemming, and the words are then grouped according to their similarity using color code constraints. The performance of the SOM, K-Means, and color based constraint algorithms is analyzed for any kind of text document collection. Findings: The performance of our proposed color based constraint (CBC) algorithm is compared with that of the SOM and K-Means algorithms in terms of time-based frequency and the reliability of retrieved documents. The time needed to process a given number of documents is analyzed, and the reliability of retrieved documents is assessed using the number of documents and the frequency measurement. The proposed color based constraint clustering algorithm outperforms the K-Means and SOM algorithms in terms of time and reliability. Application/Improvements: Our work is useful for efficient information retrieval. In future, this work can be extended to maximize the grouping of words with minimum latency, and to develop an algorithm that maximizes the grouping (clustering) of words in a document under color based constraints, increasing clustering performance for efficient text mining.
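
The preprocessing step named in the abstract (stop word removal followed by stemming) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list `STOP_WORDS`, the suffix rules in `SUFFIXES`, and the function names `naive_stem` and `preprocess` are all assumptions made for illustration, and a real pipeline would probably use a full stemmer such as Porter's. The color based constraint grouping itself is not specified in the abstract and is not attempted here.

```python
import re

# Hypothetical, deliberately tiny stop-word list; the abstract does not specify one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Illustrative suffix rules only, checked in this order; not a full stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The documents are grouped and clustered"))
```

The resulting stems would then be the input to whichever grouping step follows, whether K-Means, SOM, or the proposed constraint-based method.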

Keywords

Color Based Constraint, Clustering, Information Retrieval, Semi-Supervised Clustering Technique, Text Mining