
Hate Speech Detection in Social Media Using the Ensemble Learning Technique


Affiliations
Department of Computer Science, New Mexico Tech, Socorro, New Mexico, United States

Abstract

Our lives have become intertwined with social media platforms such as Twitter, Facebook, and LinkedIn. These platforms let us express our opinions and share our thoughts with the world. However, some individuals abuse this freedom of expression to disseminate derogatory content that promotes hate speech. This has become a significant problem, and detecting such content is a challenging task. In this paper, we propose an approach to hate speech detection in social media using natural language processing techniques. We use a publicly available dataset provided by CrowdFlower and apply text pre-processing to clean it. We then perform feature engineering to extract key features for use in machine learning classification algorithms. Finally, we compare the performance of several algorithms on each feature set and analyze the results in depth.

Keywords

Hate Speech Detection, Social Media, Natural Language Processing, Machine Learning, Artificial Intelligence, Sentiment Analysis.
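
The first two stages named in the abstract (text pre-processing, then feature extraction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the cleaning rules, the function names `clean_text` and `bag_of_words`, and the sample text are all assumptions made for illustration, and word n-gram counts are only one simple feature set that could be handed to a classifier.

```python
import re
from collections import Counter
from typing import List

# Hypothetical cleaning rules; the abstract does not specify the actual steps.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_text(text: str) -> List[str]:
    """Lowercase, drop URLs, @mentions, and non-letters, then split on whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return text.split()


def bag_of_words(tokens: List[str], n: int = 1) -> Counter:
    """Count word n-grams (unigrams by default) in a token list."""
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


# Made-up sample post, used only to exercise the two functions.
sample = "RT @user: Check THIS out!!! https://example.com #wow wow"
tokens = clean_text(sample)
features = bag_of_words(tokens)
```

Each distinct feature set (for example unigrams with `n=1` versus bigrams with `n=2`) would be built this way and then evaluated separately with each classifier, which is the kind of comparison the abstract describes.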
Authors

Ahshanul Haque
Department of Computer Science, New Mexico Tech, Socorro, New Mexico, United States
Md Naseef-Ur-Rahman Chowdhury
Department of Computer Science, New Mexico Tech, Socorro, New Mexico, United States
Hamdy Soliman
Department of Computer Science, New Mexico Tech, Socorro, New Mexico, United States
