Hassan, Syed Imtiaz
- Enhancing the Classification Accuracy of Noisy Dataset by Fusing Correlation Based Feature Selection with K-Nearest Neighbour
Authors
1 Department of Computer Science & Engineering, Jamia Hamdard, New Delhi, IN
Source
Oriental Journal of Computer Science and Technology, Vol 10, No 2 (2017), Pagination: 282-290
Abstract
The performance of data mining and machine learning tasks can be significantly degraded by noisy, irrelevant and high-dimensional data containing a large number of features. A large amount of real-world data contains noise or missing values, and during collection many irrelevant features may end up in the storage repositories. These redundant and irrelevant feature values distort the classification principle, increase computational overhead and decrease the prediction ability of the classifier. The high dimensionality of such datasets poses a major bottleneck in data mining, statistics and machine learning. Among the several methods of dimensionality reduction, attribute or feature selection is the one most often used. Since the k-NN algorithm is sensitive to irrelevant attributes, its performance degrades significantly when a dataset contains missing values or noisy data. This weakness of the k-NN algorithm can, however, be mitigated by combining it with a feature selection technique. In this research we combine Correlation-based Feature Selection (CFS) with the k-Nearest Neighbour (k-NN) classification algorithm to obtain better classification results when the dataset contains missing values or noisy data. The reduced attribute set also decreases the time required for classification. The research shows that when dimensionality reduction is done using CFS and classification with k-NN, datasets with little or no noise may suffer a negative impact on classification accuracy compared with the k-NN algorithm alone. When additional noise is introduced into these datasets, the performance of k-NN degrades significantly; when these noisy datasets are classified using CFS and k-NN together, classification accuracy improves.
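The combined approach described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: full CFS also penalises redundancy between features, whereas this simplified stand-in merely ranks features by their absolute Pearson correlation with the class label before applying a plain k-NN vote. All names and the toy dataset are hypothetical.

```python
import numpy as np

def select_features(X, y, k=2):
    # Rank features by |Pearson correlation| with the class label and keep
    # the k most correlated (a simplified stand-in for CFS, which also
    # penalises feature-feature redundancy).
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

def knn_predict(X_train, y_train, x, k=3):
    # Plain k-NN: majority vote among the k nearest training points.
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(d)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]

# Tiny synthetic dataset: features 0 and 1 carry the class signal,
# feature 2 is pure noise of the kind CFS is meant to discard.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
X = np.column_stack([y + rng.normal(0, 0.3, 40),
                     y + rng.normal(0, 0.3, 40),
                     rng.normal(0, 1.0, 40)])

keep = select_features(X, y, k=2)          # drops the noise feature
pred = knn_predict(X[:, keep], y, X[0, keep], k=3)
```

Classifying on the reduced feature set is exactly the pipeline the paper evaluates: the noise column no longer contributes to the distance computation, so the vote of the k nearest neighbours is less easily distorted.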
Keywords
k-Nearest Neighbour, Correlation Based Feature Selection, Attribute Selection, Missing Values, Dimensionality Reduction.
References
- Syed Imtiyaz Hassan, 2017, “Designing a flexible system for automatic detection of categorical student sentiment polarity using machine learning”, International Journal of u- and e- Service, Science and Technology, vol. 10, issue 3, Mar 2017, ISSN: 2005-4246.
- P. Langley and S. Sage, 1994, Oblivious decision trees and abstract cases. In Working Notes of the AAAI-94 Workshop on Case-Based Reasoning, Seattle, WA, AAAI Press.
- Jiawei Han, Micheline Kamber and Jian Pei, 2012, Data Mining: Concepts and Techniques, 3rd ed. Morgan Kaufmann Publishers, pp. 99-105.
- D. L. Donoho, High-dimensional data analysis: The curses and blessings of dimensionality. Lecture delivered at the “Mathematical Challenges of the 21st Century” conference of the American Mathematical Society, Los Angeles, August 6-11. Available at http://statweb.stanford.edu/~donoho/Lectures/AMS2000/MathChallengeSlides2*2.pdf.
- L. Breiman, Random forests, 2001, Technical report, Department of Statistics, University of California.
- Batista G, Monard MC, 2003, An analysis of four missing data treatment methods for supervised learning. Appl Artif Intell 17:519–533
- Breiman L, 1996, Bagging predictors. Mach Learn 24:123–140
- Yu L, Liu H, 2004, Efficient feature selection via analysis of relevance and redundancy. JMLR 5:1205–1224.
- G. Kesavaraj and S. Sukumaran, 2013, A Study on Classification Techniques in Data Mining, Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), DOI: 10.1109/ICCCNT.2013.6726842.
- S. B. Kotsiantis, I. D. Zaharakis and P. E. Pintelas, 2006, Machine learning: a review of classification and combining techniques. DOI: 10.1007/s10462-007-9052.
- Mark A. Hall, 1999, Correlation-based Feature Selection for Machine Learning, Ph. D. Dissertation, The University of Waikato, New Zealand.
- Syed Imtiyaz Hassan, 2016, “Extracting the sentiment score of customer review from unstructured big data using Map Reduce algorithm”, International Journal of Database Theory and Application, vol. 9, issue 12, Dec 2016, pp. 289-298, DOI:10.14257/ijdta.2016.9.12.26, ISSN: 2005-4270.
- UCI Machine Learning Repository, available at http://mlr.cs.umass.edu/ml/datasets.html, accessed Sep 2016.
- Weka Documentation, available at www.cs.waikato.ac.nz, accessed Sep 2016.
- I. Guyon and A. Elisseeff, 2003, An introduction to variable and feature selection. Journal of Machine Learning Research.
- Assessment of Accuracy Enhancement of Back Propagation Algorithm by Training the Model using Deep Learning
Authors
1 Department of Computer Science & Engineering, Jamia Hamdard, New Delhi, IN
Source
Oriental Journal of Computer Science and Technology, Vol 10, No 2 (2017), Pagination: 298-304
Abstract
Deep learning is a branch of machine learning that has recently been gaining a lot of attention due to its efficiency in solving a number of AI problems. The aim of this research is to assess the accuracy enhancement obtained by using deep learning with the back propagation algorithm. For this purpose, two techniques have been used. In the first technique, the simple back propagation algorithm is used and the resulting model is tested for accuracy. In the second technique, the model is first pre-trained using deep learning via deep belief nets, allowing it to learn and improve its parameter values, and back propagation is then applied on top of it. Both methods take advantage of the softmax function. Both have been tested on images of handwritten digits and their accuracy calculated. It has been observed that there is a significant increase in the accuracy of the model when deep learning is applied for training.
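The final stage common to both techniques, a softmax output layer trained by back propagation on cross-entropy loss, can be illustrated in miniature. This is a hedged sketch on toy data, with no deep-belief-net pre-training and hypothetical names throughout; it relies on the standard identity that, for softmax outputs with cross-entropy loss, the back-propagated error at the output layer reduces to (predicted probabilities − one-hot targets).

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over each row.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_layer(X, y, classes=2, lr=0.5, epochs=200):
    # Gradient descent on cross-entropy for a single softmax layer.
    # With softmax + cross-entropy, dL/dz simplifies to (p - one_hot),
    # which is the error signal back propagation starts from.
    W = np.zeros((X.shape[1], classes))
    b = np.zeros(classes)
    onehot = np.eye(classes)[y]
    for _ in range(epochs):
        p = softmax(X @ W + b)
        err = (p - onehot) / len(X)   # output-layer error term
        W -= lr * X.T @ err
        b -= lr * err.sum(axis=0)
    return W, b

# Toy linearly separable data: the class is the first coordinate.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0, 0, 1, 1])
W, b = train_softmax_layer(X, y)
preds = softmax(X @ W + b).argmax(axis=1)
```

In the paper's second technique, the weights feeding this output layer would first be initialised by deep belief net pre-training rather than starting from scratch, with back propagation then fine-tuning the whole stack.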
Keywords
Machine Learning, Deep Learning, Deep Belief Nets, Back Propagation, Restricted Boltzmann Machines, Artificial Neural Networks, Softmax Function.
References
- Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan, “Thumbs up? sentiment classification using machine learning techniques”, in Proc. of the ACL-02 Conference on Empirical Methods in Natural Language Processing, vol. 10. ACM, Stroudsburg, PA, USA, pp. 79-86. DOI=http://dx.doi.org/10.3115/1118693.1118704.
- Hu, Minqing and Bing Liu. Mining and summarizing customer reviews. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004).
- Syed Imtiyaz Hassan, “Designing a flexible system for automatic detection of categorical student sentiment polarity using machine learning”, International Journal of u- and e- Service, Science and Technology, vol. 10, issue 3, Mar 2017, ISSN: 2005-4246.
- Turney, P. Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL- 2002).
- MNIST handwritten digit database, Yann LeCun, Corinna Cortes and Chris Burges, Yann.lecun.com, 2016. [Online]. Available: http://yann.lecun.com/exdb/mnist/. [Accessed: 01-Dec-2016].
- G. Hinton, S. Osindero and Y. Teh, A fast learning algorithm for deep belief nets, Toronto, 2006, pp. 1-5, 8-11.
- C. Nicholson, "A Beginner's Tutorial for Restricted Boltzmann Machines - Deeplearning4j: Open-source, Distributed Deep Learning for the JVM", Deeplearning4j.org, 2016. [Online]. Available: https://deeplearning4j.org/restrictedBoltzmannnmachine.html. [Accessed: 30-Nov-2016].
- Q. V. Le, A Tutorial on Deep Learning, 1st ed. Mountain View, 2015, pp. 10-15.
- O. Matan, Reading Handwritten Digits: A Zip Code Recognition System, Holmdel, pp. 5-15.
- G. Hinton, A Practical Guide to Training Restricted Boltzmann Machines, Toronto, pp. 1-7.
- J. Han and M. Kamber, Data Mining: Concepts and Techniques, 3rd ed. MA: Elsevier, 2012, pp. 327-332, 398-407.
- "Classification: predicts categorical class labels" (slide deck), Slideplayer.com, 2016. [Online]. Available: http://slideplayer.com/slide/5243492/. [Accessed: 01-Dec-2016].
- M. Mazur, "A Step by Step Backpropagation Example", Matt Mazur, 2016. [Online]. Available: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example. [Accessed: 30-Nov-2016].
- "Training an Artificial Neural Network - Intro", solver, 2016. [Online]. Available: http://www.solver.com/training-artificial-neural-network-intro. [Accessed: 30- Nov- 2016].
- D. Yuret, "Softmax Classification — Knet.jl 0.7.2 documentation", Knet.readthedocs.io, 2016. [Online]. Available: http://knet.readthedocs.io/en/latest/softmax.html. [Accessed: 01- Dec- 2016].
- S. Raschka, "What-is-the-intuition-behind-SoftMax-function", Quora, 2014. [Online]. Available: https://www.quora.com/What-is-the-intuition-behind-SoftMax-function. [Accessed: 01- Dec- 2016].
- K. Zhang and X. Chen, Large-Scale Deep Belief Nets With MapReduce, Detroit: IEEE, 2014, pp. 1-5.
- Syed Imtiyaz Hassan, “Extracting the sentiment score of customer review from unstructured big data using Map Reduce algorithm”, International Journal of Database Theory and Application, vol. 9, issue 12, Dec 2016, pp. 289-298, DOI: 10.14257/ijdta.2016.9.12.26, ISSN: 2005-4270.