Text-To-Speech Synthesis Using Phoneme Concatenation
We propose a Text-To-Speech (TTS) synthesis system based on phoneme concatenation for unrestricted input text. The input text is first converted into a phonetic transcription using Letter-to-Sound rules. To synthesize new speech, the TTS system selects recorded phoneme units (PUs) from a database and modifies their duration according to spelling-based rules using Time Domain Pitch Synchronous Overlap-Add (TD-PSOLA). The modified PUs are then concatenated by synchronizing pitch periods at each juncture, and the transitions are smoothed to remove audible discontinuities and spectral mismatches. The pitch of the PUs is kept at its original, neutral-sounding value.
This paper describes a simple, flexible and efficient procedure for smoothing the boundaries of PUs that uses much less memory. A listening test on the perception of discontinuities showed that the proposed method performs better than a standard TD-PSOLA system and produces highly intelligible synthetic speech.
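The concatenation step can be illustrated with a minimal Python sketch. It is not the procedure proposed in the paper: it replaces pitch-synchronous alignment with a plain linear crossfade over a fixed number of samples, and the names `crossfade_concat`, `synthesize`, `database` and `overlap` are hypothetical, chosen only for this illustration.

```python
from typing import Dict, List, Sequence


def crossfade_concat(units: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Join waveform units, blending up to `overlap` samples at each boundary.

    Assumption: a simple linear crossfade stands in for the pitch-synchronous
    smoothing described in the abstract.
    """
    out: List[float] = []
    for unit in units:
        samples = list(unit)
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(samples))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight of the incoming unit
            out[start + i] = (1.0 - w) * out[start + i] + w * samples[i]
        # Append the part of the incoming unit that was not blended.
        out.extend(samples[n:])
    return out


def synthesize(phonemes: Sequence[str],
               database: Dict[str, Sequence[float]],
               overlap: int) -> List[float]:
    """Look up each phoneme's recorded unit (hypothetical database) and join them."""
    return crossfade_concat([database[p] for p in phonemes], overlap)
```

For example, `synthesize(["a", "b"], {"a": [1.0] * 4, "b": [0.0] * 4}, overlap=2)` returns six samples, with the two boundary samples ramping from the first unit toward the second. A pitch-synchronous method would instead place the overlap at pitch marks so that the periods of the two units line up.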
Keywords
Text-To-Speech, Phonemes, Time Domain Pitch Synchronous Overlap-Add (TD-PSOLA), Concatenative Synthesis.