
Text Processing for Developing Unrestricted Tamil Text to Speech Synthesis System


Affiliations
1 School of Computing Science and Engineering, VIT University, Chennai Campus, Tamil Nadu, India
 

Abstract

In the era of information and communication technology, designing interactive computer systems that are effective, efficient, easy, and enjoyable to use is becoming increasingly important. Of the many approaches researchers have explored to enhance Human-Computer Interaction, Text-to-Speech (TTS), or speech synthesis, is one promising modality for building better interfaces. This work focuses on enhancing the text processing module of a Tamil speech synthesizer with an efficient and robust text normalizer and loan word identifier. Text normalization is performed on unrestricted Tamil text to convert non-standard words into standard words, which reduces ambiguous utterances in the intermediate processing stages. Loan words in Tamil text are identified in order to improve the pronunciation model of the Tamil speech synthesizer. In this paper, we describe a ‘semiotic classifier’ based on the decision list approach, with which we are able to handle many varieties of non-standard words. We also describe a ‘loan/native word classifier’ based on multiple linear regression, which works efficiently even on shorter words of 3 syllables in length. In today’s era of digital information and communication technology and Human-Computer Interaction, such capable text processors are imperative.
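To make the decision list idea concrete, the following is a minimal sketch of how an ordered list of rules might assign a semiotic class to a token. It is an illustration only and not the classifier described in the paper: the class labels, the regular expressions, and the names `RULES` and `classify_token` are all hypothetical, and the rule order here is fixed by hand, whereas a learned decision list would typically rank its rules by a score estimated from data.

```python
import re

# Hypothetical ordered rules (label, pattern); NOT taken from the paper.
# A decision list tests rules from top to bottom and the first match wins,
# so more specific patterns are placed before more general ones.
RULES = [
    ("date", re.compile(r"^\d{1,2}[/-]\d{1,2}[/-]\d{2,4}$")),
    ("time", re.compile(r"^\d{1,2}:\d{2}$")),
    ("currency", re.compile(r"^(Rs\.?|₹)\d+(\.\d+)?$")),
    ("percentage", re.compile(r"^\d+(\.\d+)?%$")),
    ("decimal", re.compile(r"^\d+\.\d+$")),
    ("cardinal", re.compile(r"^\d+$")),
]


def classify_token(token: str) -> str:
    """Return the label of the first matching rule, or 'standard' if none match."""
    for label, pattern in RULES:
        if pattern.match(token):
            return label
    # No rule fired: treat the token as an ordinary (standard) word.
    return "standard"


if __name__ == "__main__":
    for tok in ["12/05/2015", "10:30", "Rs.500", "45%", "3.14", "2015", "தமிழ்"]:
        print(tok, classify_token(tok))
```

Once a token has a class, a normalizer could dispatch it to a class-specific expansion routine (for example, reading a date differently from a plain number); that expansion step is outside the scope of this sketch.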

Keywords

Natural Language Processing, Tamil, Text Processing, Text-to-Speech (TTS), Unrestricted Text

Authors

Vaibhavi Rajendran
School of Computing Science and Engineering, VIT University, Chennai Campus, Tamil Nadu, India
G. Bharadwaja Kumar
School of Computing Science and Engineering, VIT University, Chennai Campus, Tamil Nadu, India


DOI: https://doi.org/10.17485/ijst/2015/v8i29/122197