Rani, Simpel
- Segmentation of Broken Characters of Handwritten Gurmukhi Script
Authors
Bharti Mehta 1, Simpel Rani 2
Affiliations
1 Department of Computer Engineering, Yadavindra College of Engineering, Talwandi Sabo (Bathinda), IN
2 Department of Computer Engineering, Yadavindra College of Engineering, Talwandi Sabo (Bathinda), IN
Source
Research Cell: An International Journal of Engineering Sciences, Vol 13 (2014), Pagination: 95-105
Abstract
Character segmentation of handwritten documents has been an active area of research and, owing to its diverse application environments, continues to be a challenging research topic. The desire to edit scanned text documents drives researchers to consider optical character recognition (OCR). OCR is the process of recognizing a segmented part of a scanned image as a character. The OCR process consists of three major sub-processes: pre-processing, segmentation, and recognition. Of these three, segmentation is the most important phase of the overall OCR process. Problems in the character segmentation of handwritten text arise from differences in writing style between people, because the size and shape of characters are not fixed in handwriting. In this work, we formulate an algorithm to segment a scanned document image into characters. The proposed algorithm segments broken characters in Gurmukhi script so that the number of characters in a word can be easily identified. To segment the characters of a word, the algorithm combines two approaches: Horizontal Profile Projection and Vertical Profile Projection. The accuracy achieved is 93%.
Keywords
Gurmukhi Script, OCR, Segmentation, Handwritten Document, Horizontal Profile Projection, Vertical Profile Projection.
- Word Level Language Identification of English-Punjabi Code-Mixed Social Media Text
Authors
Affiliations
1 Department of Computer Science, Punjabi University College of Engineering & Management, Rampura Phul, IN
2 Department of Computer Science, Punjabi University, Patiala, IN
3 Department of Computer Science and Engineering, Yadavindra College of Engineering, Talwandi Sabo, IN
Source
Research Cell: An International Journal of Engineering Sciences, Vol 33 (2020), Pagination: 24-32
Abstract
Code mixing denotes the use of multiple languages in a single utterance. Code mixing is pervasive when people communicate over social media, irrespective of the mode being used. The fusion of languages makes the task more challenging and requires consistent updates in line with recent trends. This paper addresses three approaches: CRFs (Conditional Random Fields), Bi-LSTM (Bidirectional Long Short-Term Memory), and CNNs (Convolutional Neural Networks). First, for word-level language identification of code-mixed English-Punjabi text, a CRF-based system uses lexical, contextual, character n-gram, and special-character features. Second, a recurrent neural network, namely Bi-LSTM, with GloVe embeddings is used for language identification. Third, a CNN with GloVe embeddings is used for language identification. The CRF-based system is observed to perform best, with an F1-score of 0.96.
Keywords
Code Mixing, Language Identification, Deep Learning, GloVe Embedding, Conditional Random Fields.