Text-To-Speech Synthesis Using Phoneme Concatenation

Mahwash Ahmed; Shibli Nisar

Affiliations
1 National University of Computer and Emerging Sciences (NUCES-FAST), A.K Brohi Road H 11/4, Islamabad, Pakistan

Abstract

We propose a Text-To-Speech (TTS) synthesis system based on phonetic concatenation for unrestricted input text. The input text is first converted into a phonetic transcription using Letter-to-Sound rules. To synthesize new speech, the TTS system selects recorded phoneme units (PUs) from a database and modifies their duration according to spelling-based rules using Time Domain Pitch Synchronous Overlap-Add (TD-PSOLA). The modified PUs are then concatenated by synchronizing pitch periods at each juncture, and the transitions are smoothed to remove audible discontinuities and spectral mismatches. The pitch of the PUs is kept at its original, neutral-sounding value.
This paper describes a simple, flexible, and efficient procedure for smoothing the boundaries of PUs that requires much less memory. A listening test on the perception of discontinuities showed that the proposed method performs better than a standard TD-PSOLA system and produces highly intelligible synthetic speech.
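
The idea of joining recorded units and smoothing each juncture can be illustrated with a minimal sketch. It is not the authors' method and it is not TD-PSOLA: it uses a plain linear crossfade with a fixed overlap rather than pitch-synchronous alignment, and it leaves unit durations unchanged. The names `make_tone` and `crossfade_concat`, the 8000 Hz sample rate, and the 80-sample overlap are all assumptions chosen for illustration, with sine tones standing in for recorded phoneme units.

```python
import math


def make_tone(freq_hz, n_samples, sample_rate=8000):
    """Hypothetical stand-in for a recorded phoneme unit: a plain sine tone."""
    return [
        math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


def crossfade_concat(units, overlap):
    """Join units, blending up to `overlap` samples at each juncture.

    Assumption: a linear ramp at a fixed overlap length. A pitch-synchronous
    method would instead align the overlap to pitch periods.
    """
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(unit))
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight of the incoming unit
            tail = len(out) - n + i
            out[tail] = (1 - w) * out[tail] + w * unit[i]
        # Append the part of the incoming unit that was not blended.
        out.extend(unit[n:])
    return out


# Three placeholder units of 400 samples each, joined with an 80-sample overlap.
units = [make_tone(200, 400), make_tone(300, 400), make_tone(250, 400)]
speech = crossfade_concat(units, overlap=80)
```

Each juncture shortens the output by the overlap length, so the three 400-sample units yield 1040 samples here. Because every blended sample is a weighted average of two input samples, the output never exceeds the amplitude range of the inputs.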