"Personalising synthetic voices for individuals with severe speech impairment"
Sarah Creer PhD thesis - sound files
Chapter 3
- Example 3.1:
Articulatory synthesis (Taken from D. Hill, L. Manzara and C. Schock
(1995) "Real-time articulatory speech-synthesis-by-rules", In
Proceedings of AVIOS 1995, San Jose, CA, USA, September 1995, pp. 27-44,
via http://pages.cpsc.ucalgary.ca/~hill/papers/avios95/index.htm)
- Example 3.2:
The JSRU/Holmes synthesiser (Taken from S. Lemmetty (1999) "Review of
speech synthesis technology", Master's thesis, Helsinki University of
Technology, via http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/appa.html)
- Example 3.3:
The Klattalk system, introduced by D. Klatt (Taken from D. Klatt
(1987), "Review of text-to-speech conversion for English", Journal of
the Acoustical Society of America 82, 737-793, via http://festvox.org/history/klatt.html)
- Example 3.4:
Several DECtalk voices, introduced by D. Klatt (Taken from D. Klatt
(1987), "Review of text-to-speech conversion for English", Journal of
the Acoustical Society of America 82, 737-793, via http://festvox.org/history/klatt.html)
- Example 3.5: Limited domain synthesis from Festival (Taken from http://festvox.org/ldom/ldom_time.html)
- Example 3.6: General Festival concatenative synthesis (Taken from http://www.cs.cmu.edu/~awb/festival_demos/general.html)
- Example 3.7: The author's original speech for comparison with examples 3.8, 3.9 and 3.13.
- Example 3.8: Festvox voice built with Arctic set A (593 sentences/80 minutes) of the author's recorded data.
- Example 3.9: ModelTalker voice built with approximately 1800 utterances (40-50 minutes) of the author's speech.
- Example 3.10: Speaker 1's original speech for comparison with examples 3.11 and 3.12.
- Example 3.11:
Festvox voice built with Arctic set A (593 sentences/80 minutes) of
speaker 1's data, showing where concatenative synthesis can break down.
Please listen to the sound file then click on the footnote to see what
the voice was saying[1]. A different example of this speaker's voice is found as example 3.12.
- Example 3.12: Festvox voice built with Arctic set A (593 sentences/80 minutes) of speaker 1's data.
- Example 3.13: HTS voice built with Arctic set A (593 sentences/80 minutes) of the author's data.
Chapter 4
- Example 4.1: Original speech taken from speaker 5, showing the difficulties in labelling dysarthric speech.
Chapter 5
- Example 5.1: Speaker 1's original speech.
- Example 5.2: Speaker 2's original speech.
- Example 5.3: Average voice from which the models were adapted towards the target speakers.
- Example 5.4: Experiment stimuli using 10 sentences of adaptation data taken from speaker 1.
- Example 5.5: Experiment stimuli using 100 sentences of adaptation data taken from speaker 1.
- Example 5.6: Experiment stimuli using 500 sentences of adaptation data taken from speaker 1.
- Example 5.7: Experiment stimuli using resynthesised original target speech from speaker 1.
- Example 5.8: Average voice from which the models were adapted towards the target speakers.
- Example 5.9: Experiment stimuli using 10 sentences of adaptation data taken from speaker 2.
- Example 5.10: Experiment stimuli using 100 sentences of adaptation data taken from speaker 2.
- Example 5.11: Experiment stimuli using 500 sentences of adaptation data taken from speaker 2.
- Example 5.12: Experiment stimuli using resynthesised original target speech from speaker 2.
Chapter 6
- Example 6.1:
Original speech taken from speaker 3. Please listen to the sound file
then click on the footnote if you are having difficulty understanding
the speech[3].
- Example 6.2:
Original speech taken from speaker 4. Please listen to the sound file
then click on the footnote if you are having difficulty understanding
the speech[4].
- Example 6.3:
Original speech taken from speaker 5. Please listen to the sound file
then click on the footnote if you are having difficulty understanding
the speech[5].
- Example 6.4:
Speaker 5's synthesised voice built with unedited data and using all
his own voice features. Please listen to this sound file and example
6.5 then click on the footnote if you are having difficulty
understanding the speech[2].
- Example 6.5:
Speaker 5's synthesised voice built with edited data and using all his
own voice features. Please listen to the sound file then click on the
footnote if you are having difficulty understanding the speech[2].
- Example 6.6: Example of the Acapela voice, Peter, with which the speaker participants compared their own synthesised voices.
- Example 6.7:
Voice built with average voice features and speaker 3's own spectral
features and log F0, using edited data. Please listen to the sound file
then click on the footnote if you are having difficulty understanding
the speech[6].
- Example 6.8:
Voice built with average voice features and speaker 4's own spectral
features and log F0, using edited data. Please listen to the sound file
then click on the footnote if you are having difficulty understanding
the speech[7].
- Example 6.9:
Voice built with average voice features and speaker 5's own spectral
features and log F0, using edited data. Please listen to the sound file
then click on the footnote if you are having difficulty understanding
the speech[8].
- Example 6.10:
Voice built with average voice features and speaker 5's own spectral
features and log F0, using unedited data (for comparison with example
6.9). Please listen to the sound file then click on the footnote if you
are having difficulty understanding the speech[8].
- Example 6.11:
Voice built with average voice features and speaker 3's own spectral
features, log F0 and energy (for comparison with example 6.7). Please
listen to the sound file then click on the footnote if you are having
difficulty understanding the speech[6].
- Example 6.12:
Voice built with average voice features and speaker 4's own spectral
features, log F0 and energy (for comparison with example 6.8). Please
listen to the sound file then click on the footnote if you are having
difficulty understanding the speech[7].
- Example 6.13:
Voice built with average voice features and speaker 3's own spectral
features, log F0 and global variance for log F0 (for comparison with
example 6.7). Please listen to the sound file then click on the
footnote if you are having difficulty understanding the speech[6].
- Example 6.14:
Voice built with average voice features and speaker 4's own spectral
features, log F0 and global variance for log F0 (for comparison with
example 6.8). Please listen to the sound file then click on the
footnote if you are having difficulty understanding the speech[7].
- Example 6.15:
Voice built with average voice features and speaker 5's own spectral
features, log F0 and global variance for log F0 (for comparison with
example 6.9). Please listen to the sound file then click on the
footnote if you are having difficulty understanding the speech[8].