
Prosody Generation Model for TTS Systems

AUTHOR Teixeira, Joo Paulo; Teixeira, Joao Paulo; Teixeira, Jo O. Paulo
PUBLISHER LAP Lambert Academic Publishing (08/08/2012)
PRODUCT TYPE Paperback

Description
This book presents the development of a prosody system for text-to-speech (TTS) applications. Prosody conveys the communicative intention of an utterance and ensures a degree of naturalness in synthesized speech. The prosodic features are timing, characterized by segmental durations and pauses; intonation, characterized by the fundamental frequency (F0) contour; and intensity, characterized by the intensity curve. The proposed prosody model consists of several sub-models, namely a duration model that predicts segmental durations and a model that predicts the F0 pattern. The segmental duration model is a single artificial neural network (ANN) whose architecture, type, and set of input features were carefully selected to minimize the error between predicted and measured durations. An alternative model follows the same considerations but uses a dedicated ANN for each phoneme; this alternative improved the final performance. The F0 contour model is based on the Fujisaki model and consists of two sub-models: one predicts the Phrase Command parameters and the other predicts the Accent Command parameters.
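In the Fujisaki model, log F0 is the sum of a base frequency, the responses to phrase commands, and the responses to accent commands. The sketch below shows only that standard superposition, i.e. how a set of phrase and accent command parameters is turned into an F0 contour. It is not the book's prediction model, which (per the description) predicts those parameters. The names `PhraseCommand`, `AccentCommand`, and `fujisaki_f0` are hypothetical, and the defaults α = 3 s⁻¹, β = 20 s⁻¹, γ = 0.9 are commonly cited values assumed here, not settings taken from the book.

```python
import math
from dataclasses import dataclass


@dataclass
class PhraseCommand:
    t0: float  # onset time of the phrase command (s)
    ap: float  # phrase command magnitude


@dataclass
class AccentCommand:
    t1: float  # onset time of the accent command (s)
    t2: float  # offset time of the accent command (s)
    aa: float  # accent command amplitude


def phrase_response(t: float, alpha: float) -> float:
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0 (phrase control impulse response)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t: float, beta: float, gamma: float) -> float:
    """Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0 (accent step response)."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """F0 (Hz) at time t: superpose all command responses on ln(fb) in the log domain."""
    ln_f0 = math.log(fb)
    for p in phrases:
        ln_f0 += p.ap * phrase_response(t - p.t0, alpha)
    for a in accents:
        # An accent command is a pulse: step up at t1, step down at t2.
        ln_f0 += a.aa * (accent_response(t - a.t1, beta, gamma)
                         - accent_response(t - a.t2, beta, gamma))
    return math.exp(ln_f0)


if __name__ == "__main__":
    # Hypothetical command parameters, chosen only for illustration.
    phrases = [PhraseCommand(t0=-0.2, ap=0.5)]
    accents = [AccentCommand(t1=0.3, t2=0.6, aa=0.4)]
    # Sample the contour every 10 ms over 1.5 s with a base frequency of 80 Hz.
    contour = [fujisaki_f0(0.01 * k, 80.0, phrases, accents) for k in range(150)]
    print(f"peak F0: {max(contour):.1f} Hz")
```

Because the commands add in the log-F0 domain, a predictor only has to output command timings and amplitudes rather than raw F0 values, which matches the split into a Phrase Command sub-model and an Accent Command sub-model.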
Product Details
ISBN-13: 9783659162770
ISBN-10: 3659162779
Binding: Paperback or Softback (Trade Paperback (US))
Content Language: English
Page Count: 276
Carton Quantity: 30
Product Dimensions: 6.00 x 0.62 x 9.00 inches
Weight: 0.90 lb
Country of Origin: US
Subject Information
BISAC Categories
Technology & Engineering | Electronics - General
Price: $101.32 (Paperback)