
Text-To-Speech Synthesis Using Phoneme Concatenation


Affiliations
1 National University of Computer and Emerging Sciences (NUCES-FAST), A.K Brohi Road H 11/4, Islamabad, Pakistan
 

We propose a Text-To-Speech (TTS) synthesis system based on phoneme concatenation for unrestricted input text. The input text is first converted into a phonetic transcription using Letter-to-Sound rules. To synthesize new speech, the TTS system selects recorded phoneme units (PUs) from a database and modifies their durations, according to spelling-based rules, using Time Domain Pitch Synchronous Overlap-Add (TD-PSOLA). The modified PUs are then concatenated by synchronizing pitch periods at each juncture, and the transitions are smoothed to remove audible discontinuities and spectral mismatches. The pitch of the PUs is left at its original, neutral-sounding value.
This paper describes a simple, flexible and efficient procedure for smoothing PU boundaries that requires considerably less memory. In a listening test on the perception of discontinuities, the proposed method performed better than a standard TD-PSOLA system and produced highly intelligible synthetic speech.
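
The two signal-processing steps named above, duration modification and smoothed concatenation, can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a constant, known pitch period in samples, uses a Hann-windowed overlap-add in the spirit of TD-PSOLA for the duration change, and substitutes a plain linear cross-fade for the paper's boundary smoothing. The function names `stretch_duration` and `concatenate` and the `overlap` parameter are hypothetical.

```python
import math


def hann(length):
    # Periodic Hann window: two copies offset by length/2 sum to 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]


def stretch_duration(samples, period, factor):
    """Change duration by `factor` while keeping pitch.

    Assumption: the pitch period is constant and known, so pitch marks
    sit at multiples of `period`. Real units need pitch-mark detection.
    """
    if len(samples) < 2 * period:
        raise ValueError("need at least two pitch periods")
    window = hann(2 * period)
    # Analysis marks: centres of two-period segments fully inside the signal.
    marks = list(range(period, len(samples) - period + 1, period))
    n_out = max(1, round(len(marks) * factor))
    out = [0.0] * ((n_out + 1) * period)
    for j in range(n_out):
        # Repeat or skip analysis segments to reach the new duration.
        src = marks[min(int(j / factor), len(marks) - 1)]
        dst = (j + 1) * period  # synthesis mark; same spacing keeps the pitch
        for k in range(2 * period):
            out[dst - period + k] += window[k] * samples[src - period + k]
    return out


def concatenate(units, overlap):
    """Join units with a linear cross-fade of `overlap` samples per juncture.

    Assumption: a cross-fade stands in for the paper's smoothing step.
    Choosing `overlap` equal to the pitch period, with units cut at pitch
    marks, keeps the two sides roughly aligned in phase.
    """
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)  # fade-in weight of the incoming unit
            out[start + k] = (1 - w) * out[start + k] + w * unit[k]
        out.extend(unit[n:])
    return out
```

In this sketch each unit would be passed through `stretch_duration` with its rule-derived factor and the results joined with `concatenate`. Because only the original units and one output buffer are held, the memory cost stays close to the size of the unit database.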

Keywords

Text-To-Speech, Phonemes, Time Domain Pitch Synchronous Overlap-Add (TD-PSOLA), Concatenative Synthesis.

Authors

Mahwash Ahmed
National University of Computer and Emerging Sciences (NUCES-FAST), A.K Brohi Road H 11/4, Islamabad, Pakistan
Shibli Nisar
National University of Computer and Emerging Sciences (NUCES-FAST), A.K Brohi Road H 11/4, Islamabad, Pakistan
