4 results for perceptual listening test
in Aston University Research Archive
Abstract:
At present there is no standard assessment method for rating and comparing the quality of synthesized speech. This study assesses the suitability of Time Frequency Warping (TFW) modulation for use as a reference device for assessing synthesized speech. Time Frequency Warping modulation introduces timing errors into natural speech that produce perceptual errors similar to those found in synthetic speech. It is proposed that TFW modulation used in conjunction with a listening effort test would provide a standard assessment method for rating the quality of synthesized speech. This study identifies the most suitable TFW modulation variable parameter to be used for assessing synthetic speech and assesses the results of several assessment tests that rate examples of synthesized speech in terms of the TFW variable parameter and listening effort. The study also attempts to identify the attributes of speech that differentiate synthetic, TFW modulated and natural speech.
Abstract:
Six experiments investigated the influence of several grouping cues within the framework of the Verbal Transformation Effect (VTE, Experiments 1 to 4) and Phonemic Transformation Effect (PTE, Experiments 5 and 6), where listening to a repeated word (VTE) or sequence of vowels (PTE) produces verbal transformations (VTs). In Experiment 1, the influence of F0 frequency and lateralization cues (ITDs) was investigated in terms of the pattern of VTs. As the lateralization difference increased between two repeating sequences, the number of forms was significantly reduced, with the fewest forms reported in the dichotic condition. Experiment 2 explored whether the propensity to report more VTs on high pitch was due to the task demands of monitoring two sequences at once. The number of VTs reported was higher when listeners were asked to attend to one sequence only, suggesting smaller attentional constraints on the task requirements. In Experiment 3, consonant-vowel transitions were edited out from two sets of six stimulus words with ‘strong’ and ‘weak’ formant transitions, respectively. Listeners reported more forms in the spliced-out than in the unedited case for the strong-transition words, but not for those with weak transitions. A similar trend was observed for the F0 contour manipulation used in Experiment 4, where listeners reported more VTs and forms for words following a discontinuous F0 contour. In Experiments 5 and 6, the role of F0 frequency and ITD cues was investigated further using a related phenomenon, the PTE. Although these manipulations had relatively little effect on the number of VTs and forms reported, they did influence the particular forms heard. In summary, the current experiments confirmed that it is possible to successfully investigate auditory grouping cues within the VTE framework and that, in agreement with recent studies, the results can be attributed to the perceptual re-grouping of speech sounds.
Abstract:
Listening is typically the first language skill to develop in first language (L1) users and has been recognized as a basic and fundamental tool for communication. Despite the importance of listening, aural abilities are often taken for granted, and many people overlook their dependency on listening and the complexities that combine to enable this multi-faceted skill. When second language (L2) students are learning their new language, listening is crucial, as it provides access to oral input and facilitates social interaction. Yet L2 students find listening challenging, and L2 teachers often lack sufficient pedagogy to help learners develop listening abilities that they can use in and beyond the classroom. In an effort to provide a pedagogic alternative to more traditional and limited L2 listening instruction, this thesis investigated the viability of listening strategy instruction (LSI) over three semesters at a private university in Japan through a qualitative action research (AR) intervention. An LSI program was planned and implemented with six classes over the course of three AR phases. Two teachers used the LSI with 121 learners throughout the project. Following each AR phase, student and teacher perceptions of the methodology were investigated via questionnaires and interviews, which were primary data collection methods. Secondary research methods (class observations, pre/post-semester test scores, and a research journal) supplemented the primary methods. Data were analyzed and triangulated for emerging themes related to participants’ perceptions of LSI and the viability thereof. These data showed consistent positive perceptions of LSI on the parts of both learners and teachers, although some aspects of LSI required additional refinement. This project provided insights on LSI specific to the university context in Japan and also produced principles for LSI program planning and implementation that can inform the broader L2 education community.
Abstract:
This study explored the role of formant transitions and F0-contour continuity in binding together speech sounds into a coherent stream. Listening to a repeating recorded word produces verbal transformations to different forms; stream segregation contributes to this effect and so it can be used to measure changes in perceptual coherence. In experiment 1, monosyllables with strong formant transitions between the initial consonant and following vowel were monotonized; each monosyllable was paired with a weak-transitions counterpart. Further stimuli were derived by replacing the consonant-vowel transitions with samples from adjacent steady portions. Each stimulus was concatenated into a 3-min-long sequence. Listeners reported more forms in the transitions-removed condition only for strong-transitions words, for which formant-frequency discontinuities were substantial. In experiment 2, the F0 contour of all-voiced monosyllables was shaped to follow a rising or falling pattern, spanning one octave. Consecutive tokens either had the same contour, giving an abrupt F0 change between each token, or alternated, giving a continuous contour. Discontinuous sequences caused more transformations and forms, and shorter times to the first transformation. Overall, these findings support the notion that continuity cues provided by formant transitions and the F0 contour play an important role in maintaining the perceptual coherence of speech.
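The contrast between the two F0-contour conditions in experiment 2 can be illustrated with a minimal sketch. It is not the authors' stimulus-generation procedure: the 100 Hz base frequency, the log-linear glide shape, the number of samples per token, and all function names are assumptions chosen for illustration. The sketch only shows why repeating the same one-octave contour gives an abrupt F0 jump at every token boundary, while alternating rising and falling contours joins up continuously.

```python
import math

def token_contour(f_low, n_points, rising):
    """F0 glide for one token, spanning one octave (f_low to 2 * f_low).

    The log-linear glide shape is an assumption, not taken from the abstract.
    """
    steps = [i / (n_points - 1) for i in range(n_points)]
    if not rising:
        steps = [1.0 - s for s in steps]
    return [f_low * 2.0 ** s for s in steps]

def sequence_contour(f_low, n_points, n_tokens, alternate):
    """Per-token F0 contours for a repeating sequence.

    alternate=False: every token rises (same contour each time).
    alternate=True:  tokens alternate rising, falling, rising, ...
    """
    contours = []
    for k in range(n_tokens):
        rising = (k % 2 == 0) if alternate else True
        contours.append(token_contour(f_low, n_points, rising))
    return contours

def boundary_jumps_octaves(contours):
    """Size of the F0 jump at each token boundary, in octaves."""
    return [abs(math.log2(b[0] / a[-1]))
            for a, b in zip(contours, contours[1:])]

# Hypothetical parameters: 100 Hz base F0, 50 samples per token, 4 tokens.
same = sequence_contour(100.0, 50, 4, alternate=False)
alternating = sequence_contour(100.0, 50, 4, alternate=True)

print("same contour, jumps (octaves):", boundary_jumps_octaves(same))
print("alternating, jumps (octaves): ", boundary_jumps_octaves(alternating))
```

Under these assumptions the same-contour sequence has a one-octave jump at every boundary, and the alternating sequence has none, which corresponds to the discontinuous and continuous conditions described in the abstract.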