
Social Media Spam Detection Using Different Text Feature Selection Technique and Machine Learning


Affiliations
1 Department of Computer Science and Engineering, Sage University, India
     



Messaging systems and social media are popular and make essential contributions to our social and professional lives. Spam is likewise a part of messaging systems and social media. In social media, spam is found in various places (i.e. in posts, comments, reviews, and chats). Social media spam aims to influence users' decisions and points of view, and to affect the credibility of a service or brand. Therefore, social spam detection is essential. Although a number of contributions in the literature use social media data, comparatively little work addresses social media spam detection. In this paper, we propose a social media spam detection technique using machine learning and text feature extraction techniques. First, we review social media spam detection techniques. From this review, we extract the machine learning techniques, text feature selection techniques, and experimental datasets used. The review shows that spam messages containing URLs are more critical and harmful. Next, we design a theoretical model for social media spam detection, which includes text feature selection techniques (i.e. TF-IDF, part-of-speech (POS), and Information Gain (IG)) and their combinations (POS+TF-IDF and POS+IG). These features are used to train Support Vector Machine (SVM), Artificial Neural Network (ANN), and Naïve Bayes classifiers. In an experimental analysis with a dataset available on Kaggle, we found that hybrid features are more effective for accurate classification than individual features. Additionally, we found that SVM and ANN classify more accurately than the Naïve Bayes classifier.
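As a rough illustration of two of the feature selection techniques named in the abstract, the sketch below computes TF-IDF weights and information gain in plain Python. It uses the textbook definitions (term frequency times log(N / document frequency), and entropy reduction for a binary "term present" feature), which may differ from the exact variants used in the paper. The toy corpus, the whitespace tokenizer, and all function names are assumptions made for this example; POS tagging and the classifiers are omitted.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not from the paper): (text, label), 1 = spam, 0 = not spam.
DOCS = [
    ("win free prize click http://example.com", 1),
    ("free prize claim now", 1),
    ("meeting moved to friday", 0),
    ("see you at the meeting", 0),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [tokenize(text) for text, _ in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, term):
    """Entropy reduction in the class label from knowing whether `term` occurs."""
    labels = [label for _, label in docs]
    present = [label for text, label in docs if term in tokenize(text)]
    absent = [label for text, label in docs if term not in tokenize(text)]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


if __name__ == "__main__":
    weights = tf_idf(DOCS)
    print(sorted(weights[0].items(), key=lambda kv: -kv[1]))
    for term in ("free", "meeting", "the"):
        print(term, round(information_gain(DOCS, term), 3))
```

On this toy data, terms that occur only in one class (such as "free") receive the maximum information gain, while a term such as "the" that appears in a single message scores much lower. A hybrid feature set in the spirit of POS+TF-IDF would append POS-derived features to each document's TF-IDF vector before training a classifier.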

Keywords

Social Media Spam, Experimental Analysis, Text Feature Selection, Classification, Social Spam Filtering



Authors

Anubha Sharma
Department of Computer Science and Engineering, Sage University, India
Manoj Ramaiya
Department of Computer Science and Engineering, Sage University, India

