
Implementation of Recurrent Neural Network with Language Model for Automatic Articulation Identification System in Bangla


Affiliations
1 Department of Computer Science & Engineering, East West University, Dhaka, Bangladesh
2 Department of Computer Science & Engineering, East West University, Dhaka, Bangladesh
 

Abstract

To advance the state of the art in human-machine interaction, speech recognition research has increasingly focused on speech-to-text conversion, but working systems exist for only a few languages. Although Bengali has received little attention, we present an automatic speech recognition (ASR) system built solely for this language, since around 16% of the world’s population speak Bengali. Implementing Bengali ASR is demanding because the script includes diacritic characters. We apply a series of preprocessing and feature selection methods together with a convolutional neural network model to build an automatic speech recognition system. We then compare this approach with a recurrent neural network that is based on an LSTM network and a large dataset from Google Inc. The comparison indicates that the recurrent neural network outperforms the convolutional neural network, because the former benefits from combining connectionist temporal classification (CTC) with a language model (LM). A quantitative analysis of the output shows that the word error rate and validation loss can be affected by variation in dropout values. It also shows that both measures are affected by whether clean or augmented data are used.

Keywords

Convolutional Neural Network, CTC, Word Error Rate, Edit Distance, Augmented Data, Test Loss, Validation Loss, Clean Data, Graphical User Interface
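
The abstract reports results as word error rate, and the keywords list edit distance. As a rough illustration of how the two relate, the sketch below computes the word error rate as the word-level Levenshtein distance between a reference transcript and a recognized transcript, divided by the number of reference words. This is the common textbook definition and is assumed here, not taken from the paper; the function names edit_distance and word_error_rate are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis: i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

Because the score is normalized by the reference length only, a hypothesis with many inserted words can yield a value above 1. Splitting on whitespace is a simplification; a real Bengali pipeline would likely need script-aware text normalization first.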



Authors

Masiath Mubassira
Department of Computer Science & Engineering, East West University, Dhaka, Bangladesh
Amit Kumar Das
Department of Computer Science & Engineering, East West University, Dhaka, Bangladesh

