Towards Semantically Sensitive Text Clustering: A Feature Space Modeling Technology Based on Dimension Extension


Affiliations
1 Department of Computer Science, GATE College, Tirupati, Andhra Pradesh, India
     


Text clustering is an important application of data mining. It is concerned with grouping similar text documents together. In this paper, several models are built to cluster capstone project documents using three clustering techniques: k-means, k-means fast, and k-medoids. Our dataset is obtained from the library of the College of Computer and Information Sciences, King Saud University, Riyadh. Three similarity measures are tested: cosine similarity, Jaccard similarity, and the correlation coefficient. The quality of the obtained models is evaluated and compared. The results indicate that the best performance is achieved using k-means and k-medoids combined with cosine similarity. We observe variation in the quality of clustering depending on the evaluation measure used. In addition, as the value of k increases, the quality of the resulting clusters improves. Finally, we identify the categories of graduation projects offered in the Information Technology department for female students.

Keywords

Clustering, Cosine Similarity, Data Mining, K-Means, K-Medoids, Text Mining.
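
The three similarity measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the term-count vectors below are hypothetical, the function names are chosen for illustration, and the Jaccard variant shown (overlap of the sets of terms that occur in each document) is an assumption, since the abstract does not say which variant was used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Set-based Jaccard: shared terms divided by all terms present in either document."""
    terms_a = {i for i, x in enumerate(a) if x != 0}
    terms_b = {i for i, y in enumerate(b) if y != 0}
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


def correlation_coefficient(a, b):
    """Pearson correlation between two term vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Hypothetical term-count vectors over a shared four-term vocabulary.
doc_a = [2, 1, 0, 0]
doc_b = [1, 1, 0, 0]
doc_c = [0, 0, 3, 1]

for name, measure in [
    ("cosine", cosine_similarity),
    ("jaccard", jaccard_similarity),
    ("correlation", correlation_coefficient),
]:
    print(name, round(measure(doc_a, doc_b), 4), round(measure(doc_a, doc_c), 4))
```

In this toy example, doc_a and doc_b share the same terms and score high on all three measures, while doc_a and doc_c share no terms, so their cosine and Jaccard similarities are zero and their correlation is negative.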


Authors

Chitti Babukalapati
Department of Computer Science, GATE College, Tirupati, Andhra Pradesh, India

