Evaluating Text-to-Speech (TTS) Synthesis for use in Computer-Assisted Language Learning (CALL)

Author: Zoe Handley
Advisors: Dr M-J Hamel, Prof H Somers
URL: http://www.manchester.ac.uk/~zhandley/thesis.pdf

Completion Date: February 2006
Degree: Ph.D.
Institution: Manchester University, UK
Abstract: Although Text-to-Speech (TTS) synthesis has the potential to bring a number of new possibilities to Computer-Assisted Language Learning (CALL), it has not yet made an impact on CALL. It is believed that this is because it has not been adequately evaluated for this purpose. To validate this claim, an infrastructure for the evaluation of CALL applications integrating TTS synthesis is put forward, and the evaluations conducted to date are assessed against it. This analysis indicates that TTS synthesis has indeed not been adequately evaluated for the purposes of CALL; specifically, an important stage in the evaluation process, namely requirements analysis, has been omitted from all evaluations conducted to date. With the aim of developing a benchmark test of the adequacy of TTS synthesis systems for use in CALL applications, this thesis looks to Second Language Acquisition (SLA) research for indications of what the requirements of TTS synthesis for use in CALL might be. This literature review suggests that CALL applications place demands on the quantity, quality and flexibility of the speech generated by TTS synthesis systems. Two investigations are carried out to validate the suggested requirements on the quality of the speech generated.
The investigations also attempt to determine whether the different roles that TTS synthesis systems may assume in CALL applications impose different requirements on the quality of the speech generated. Their results suggest that, as indicated by the SLA literature, CALL applications do place demands on the comprehensibility, accuracy and naturalness of the speech generated by TTS synthesis, and that they additionally place demands on intelligibility, choice of pronunciation, naturalness of voice, expressiveness and register. The results also suggest that the different roles do indeed place different demands on the quality of the speech generated, but that teachers and CALL researchers have difficulty differentiating between the different roles and their requirements. It is believed that these results imply that evaluations of the adequacy of TTS synthesis systems for use in CALL applications ought to address all of the aspects of speech quality mentioned above. Regarding the different roles that TTS synthesis systems may assume within CALL applications, the results are taken to imply that, while the roles do impose different demands, it will not be possible to ask participants to differentiate between these roles and their requirements. Rather, participants can only be asked to rate the quality of the speech generated by TTS synthesis systems for use in CALL applications in general.