Kwak, Hyok
- Neighborhood Loss for Age Estimation from Face Image Using Convolutional Neural Networks
Authors
Affiliations
1 Institute of Information Technology, High-Tech Research and Development Centre, Kim Il Sung University, KP
Source
ICTACT Journal on Image and Video Processing, Vol 13, No 1 (2022), Pagination: 2770-2774
Abstract
Convolutional Neural Networks (CNNs) are widely used to estimate age from face images. In many CNN applications, such as image classification, face recognition and other computer vision tasks, the cross-entropy loss is used as the supervision signal to train the CNN model. However, the cross-entropy loss only enhances the separability of classes and does not consider the correlation between them in the age estimation task. In this paper we propose a novel loss function, called neighborhood loss, that accounts for the correlation between classes in age estimation by modifying the standard cross-entropy loss. To evaluate the effectiveness of the proposed neighborhood loss, we present a CNN architecture based on residual units. Through experiments, we show that neighborhood loss provides superior performance compared to prior works in age estimation.
Keywords
Age Estimation, Neighborhood Loss, Convolutional Neural Network.
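The abstract of "Neighborhood Loss for Age Estimation from Face Image Using Convolutional Neural Networks" does not give the loss formula, so the following is only a minimal sketch of the general idea it describes: a cross-entropy variant in which age classes near the true age also receive target weight. The Gaussian weighting, the `sigma` parameter and all function names are assumptions made for illustration, not the authors' definition.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def neighbor_soft_targets(true_age, num_classes, sigma=2.0):
    """Hypothetical soft target: Gaussian weights centred on the true age."""
    weights = [
        math.exp(-((k - true_age) ** 2) / (2 * sigma ** 2))
        for k in range(num_classes)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def cross_entropy(probs, targets):
    """Cross entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for p, t in zip(probs, targets) if t > 0)


def neighbor_aware_loss(logits, true_age, sigma=2.0):
    """Cross entropy against the neighbor-weighted target (assumed form)."""
    targets = neighbor_soft_targets(true_age, len(logits), sigma)
    return cross_entropy(softmax(logits), targets)


# Two predictions for a true age of 30: one peaked at 31, one peaked at 60.
near = [0.0] * 101
near[31] = 5.0
far = [0.0] * 101
far[60] = 5.0

loss_near = neighbor_aware_loss(near, 30)
loss_far = neighbor_aware_loss(far, 30)
```

With a one-hot target, both predictions give the same cross-entropy because they assign the same probability to class 30. Under the neighbor-weighted target, the prediction peaked at 31 is penalized less than the one peaked at 60, which is the kind of class correlation the abstract says standard cross-entropy ignores.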
- Multichannel Speech Enhancement of Target Speaker Based on Wakeup Word Mask Estimation with Deep Neural Network
Authors
Affiliations
1 Institute of Information Technology, Hightech Research & Development Center, Kim Il Sung University, Pyongyang, KP
Source
International Journal of Advanced Networking and Applications, Vol 15, No 1 (2023), Pagination: 5754-5759
Abstract
In this paper, we present a multichannel speech enhancement method based on wakeup word mask estimation using a Deep Neural Network (DNN). The wakeup word is regarded as an important clue for identifying the target speaker. We use a DNN to estimate the wakeup word mask and the noise mask, and apply them to separate the mixed wakeup word signal into the target speaker's speech and background noise. A Convolutional Recurrent Neural Network (CRNN) is used to exploit both short- and long-term time-frequency dependencies of sequences such as speech signals. Generalized Eigenvector (GEV) beamforming uses the masks to estimate the spatial filter, which enhances the target speaker's subsequent speech command and reduces undesirable noise. Experimental results show that the proposed method is more robust to noise and improves the Signal-to-Noise Ratio (SNR) and speech recognition accuracy.
Keywords
Multichannel Speech Enhancement, Wakeup Word, Mask Estimation, Beamforming, Deep Neural Network (DNN).
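The abstract of "Multichannel Speech Enhancement of Target Speaker Based on Wakeup Word Mask Estimation with Deep Neural Network" describes using estimated masks to derive a GEV spatial filter. A minimal sketch of that step for a single frequency bin, following the common mask-based formulation, might look like this. It assumes the masks are already available (the DNN is not modeled), and every function name here is hypothetical rather than taken from the paper.

```python
def weighted_covariance(frames, mask):
    """Mask-weighted spatial covariance for one frequency bin.

    frames: per-frame channel vectors; mask: per-frame weights in [0, 1].
    """
    channels = len(frames[0])
    total = sum(mask)
    cov = [[0j] * channels for _ in range(channels)]
    for y, m in zip(frames, mask):
        for i in range(channels):
            for j in range(channels):
                cov[i][j] += m * y[i] * y[j].conjugate()
    return [[value / total for value in row] for row in cov]


def matvec(matrix, vec):
    """Matrix-vector product on nested lists."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def quad_form(vec, matrix):
    """Real part of vec^H @ matrix @ vec (power of the filtered signal)."""
    product = matvec(matrix, vec)
    return sum(v.conjugate() * p for v, p in zip(vec, product)).real


def gev_filter(phi_speech, phi_noise, iterations=100):
    """Principal generalized eigenvector of (phi_speech, phi_noise).

    Uses power iteration on phi_noise^{-1} @ phi_speech. The start vector is
    assumed not to be orthogonal to the principal eigenvector.
    """
    w = [1 + 0j] + [0j] * (len(phi_speech) - 1)
    for _ in range(iterations):
        w = solve(phi_noise, matvec(phi_speech, w))
        norm = sum(abs(v) ** 2 for v in w) ** 0.5
        w = [v / norm for v in w]
    return w


# Toy two-microphone example with hand-picked covariance matrices.
phi_x = [[2, 1], [1, 2]]   # stand-in for the speech (wakeup word) covariance
phi_n = [[1, 0], [0, 1]]   # stand-in for the noise covariance
w = gev_filter(phi_x, phi_n)
snr = quad_form(w, phi_x) / quad_form(w, phi_n)
```

The filter returned by `gev_filter` maximizes the ratio of speech power to noise power at the beamformer output. In a full system this would be repeated per frequency bin, with `weighted_covariance` applied to STFT frames using the estimated wakeup word and noise masks.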