Sentiment Analysis of Code Mixed Text Consisting of English-Punjabi Lexicon


Affiliations
1 Department of Computer Science, Punjabi University, Patiala, India
2 School of Management Studies, Punjabi University, Patiala, India

Abstract

Sentiment analysis is the field of study that analyzes people's emotions, such as happiness, sadness, and anger, toward entities and their attributes as expressed in written text. In this study, textual data was collected from sources such as Facebook, YouTube, Twitter, and WhatsApp and then pre-processed. Next, language identification was performed on the code-mixed text. This step involves tokenization and the handling of word-play, misspelled words, abbreviations, slang words, phonetic typing, and similar phenomena. After language identification, an English-Punjabi dictionary was created consisting of opinionated word lists of positive, negative, and neutral words. The remaining words were stored in an unsorted word list. Finally, a statistical technique was applied to determine sentence-level sentiment polarity on the English-Punjabi code-mixed dataset. The five-gram and trigram approaches were found to give similar results.
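The abstract does not specify the statistical technique. The Python sketch below therefore shows only one plausible reading of the dictionary-plus-n-gram idea, under stated assumptions. It uses a greedy longest-match lookup of phrases of up to `max_n` words in a polarity lexicon, with a score of +1 for positive, -1 for negative and 0 for neutral. Words that match no entry are collected in an unsorted list. The lexicon entries, the scoring scheme, and the names `LEXICON`, `tokenize`, and `sentence_polarity` are hypothetical illustrations, not the authors' dictionary or method.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, NOT the authors' English-Punjabi dictionary.
# Keys are word tuples so that multi-word phrases (n-grams) can carry a polarity.
LEXICON: dict[tuple[str, ...], str] = {
    ("good",): "positive",
    ("changa",): "positive",            # Romanized Punjabi for "good"
    ("bad",): "negative",
    ("maada",): "negative",             # Romanized Punjabi for "bad"
    ("movie",): "neutral",
    ("not", "good"): "negative",
    ("not", "at", "all", "good"): "negative",
}
# Assumed scoring scheme: positive +1, negative -1, neutral 0.
SCORE = {"positive": 1, "negative": -1, "neutral": 0}


def tokenize(sentence: str) -> list[str]:
    """Lower-case and keep runs of Latin letters/digits (Romanized text only)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def sentence_polarity(sentence: str, max_n: int, unsorted: Counter) -> str:
    """Label a sentence by greedy longest-match lookup of n-grams up to max_n.

    Words that match no lexicon entry are counted in `unsorted`,
    loosely mirroring the unsorted word list described in the abstract.
    """
    tokens = tokenize(sentence)
    total, i = 0, 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = tuple(tokens[i:i + n])
            if gram in LEXICON:
                total += SCORE[LEXICON[gram]]
                i += n
                break
        else:  # no n-gram starting at position i is in the lexicon
            unsorted[tokens[i]] += 1
            i += 1
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    seen = Counter()
    for text in ["Movie bahut changa si", "Song is not at all good"]:
        print(text, "->", sentence_polarity(text, 3, seen), sentence_polarity(text, 5, seen))
```

In this toy setup, `max_n=3` cannot see the four-word phrase "not at all good". It labels the second sentence positive from the single word "good". With `max_n=5`, the whole phrase matches and the sentence is labeled negative. This is how the n-gram window could change the outcome, and why windows of different sizes might be compared.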

Keywords

Code Mixed Text, Romanized Text, Natural Language Processing, Text Processing, Sentiment Analysis, Microblogging.

Authors

Mukhtiar Singh
Department of Computer Science, Punjabi University, Patiala, India
Vishal Goyal
Department of Computer Science, Punjabi University, Patiala, India
Sahil Raj
School of Management Studies, Punjabi University, Patiala, India
