Deep Bidirectional RNNs Using Gated Recurrent Units & Long Short-Term Memory Units for Building Acoustic Models for Automatic Speech Recognition

Madhuri Jain; Nishita Dutta; Dnyaneshwari Bhirud; Nikahat Mulla


Affiliation
Sardar Patel Institute of Technology, Andheri, Mumbai, Maharashtra, India

Deep neural networks are increasingly popular for training models on speech datasets for speech recognition. Considerable work has been done with a range of neural network models, from conventional convolutional neural networks to deep recurrent neural networks (RNNs). Prior research has led us to conclude that bidirectional RNNs are well suited to speech recognition, as they have been observed to provide greater accuracy than deep RNNs and unidirectional RNNs. Bidirectional RNNs are usually built from Long Short-Term Memory (LSTM) units, which have their own advantages and disadvantages; Gated Recurrent Units (GRUs) can be used instead. In this paper we experiment with and compare deep bidirectional models built with GRU units and with LSTM units.
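One concrete, easily checked difference between the two unit types is model size. The sketch below counts trainable parameters in a deep bidirectional stack of LSTM or GRU layers. It assumes the standard gate formulations (four gated blocks per LSTM layer, three per GRU layer), a Keras-style bidirectional wrapper whose forward and backward outputs are concatenated, and hypothetical sizes (13 input features per frame, 200 units per direction, two layers). These sizes are illustrative assumptions, not the configuration used in the paper.

```python
from typing import Callable


def lstm_params(input_dim: int, units: int) -> int:
    # An LSTM layer has four blocks (input gate, forget gate, cell candidate,
    # output gate), each with an input kernel, a recurrent kernel and a bias.
    return 4 * (units * (input_dim + units) + units)


def gru_params(input_dim: int, units: int, reset_after: bool = False) -> int:
    # A GRU layer has three blocks (update gate, reset gate, candidate state).
    # With reset_after=True (the Keras default in TensorFlow 2), each block
    # keeps a second bias vector for its recurrent part.
    bias = 2 * units if reset_after else units
    return 3 * (units * (input_dim + units) + bias)


def bidirectional(params_one_direction: int) -> int:
    # A bidirectional layer runs an independent copy over the reversed
    # sequence, so its parameter count is doubled.
    return 2 * params_one_direction


def deep_birnn_params(input_dim: int, units: int, depth: int,
                      cell: Callable[[int, int], int]) -> int:
    total = 0
    d = input_dim
    for _ in range(depth):
        total += bidirectional(cell(d, units))
        # Forward and backward outputs are concatenated, so the next layer
        # sees 2 * units features per time step.
        d = 2 * units
    return total


# Hypothetical sizes for illustration only (not taken from the paper).
N_FEATURES, UNITS, DEPTH = 13, 200, 2

if __name__ == "__main__":
    for name, cell in (("BiLSTM", lstm_params), ("BiGRU", gru_params)):
        print(name, deep_birnn_params(N_FEATURES, UNITS, DEPTH, cell))
```

Under these assumptions the bidirectional GRU stack has exactly three quarters of the parameters of the bidirectional LSTM stack (978,000 vs 1,304,000), which is one reason GRUs are often regarded as cheaper to train. Whether the smaller model matches LSTM accuracy is an empirical question of the kind this paper investigates.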

Keywords

Acoustic Modeling, Automatic Speech Recognition, Bidirectional RNN, Convolutional Neural Networks, Deep Recurrent Neural Networks, Gated Recurrent Unit, Keras, Long Short-Term Memory (LSTM), MFCC, Recurrent Neural Networks, TimeDistributed Dense, TensorFlow, Spectrogram.






International Journal of Research in Signal Processing, Computing & Communication System Design
