21 resultados para Perceptual Speech Evaluation

em Aston University Research Archive


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Speech comprises dynamic and heterogeneous acoustic elements, yet it is heard as a single perceptual stream even when accompanied by other sounds. The relative contributions of grouping “primitives” and of speech-specific grouping factors to the perceptual coherence of speech are unclear, and the acoustical correlates of the latter remain unspecified. The parametric manipulations possible with simplified speech signals, such as sine-wave analogues, make them attractive stimuli to explore these issues. Given that the factors governing perceptual organization are generally revealed only where competition operates, the second-formant competitor (F2C) paradigm was used, in which the listener must resist competition to optimize recognition [Remez et al., Psychol. Rev. 101, 129-156 (1994)]. Three-formant (F1+F2+F3) sine-wave analogues were derived from natural sentences and presented dichotically (one ear = F1+F2C+F3; opposite ear = F2). Different versions of F2C were derived from F2 using separate manipulations of its amplitude and frequency contours. F2Cs with time-varying frequency contours were highly effective competitors, regardless of their amplitude characteristics. In contrast, F2Cs with constant frequency contours were completely ineffective. Competitor efficacy was not due to energetic masking of F3 by F2C. These findings indicate that modulation of the frequency, but not the amplitude, contour is critical for across-formant grouping.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper discusses the first of three studies which collectively represent a convergence of two ongoing research agendas: (1) the empirically-based comparison of the effects of evaluation environment on mobile usability evaluation results; and (2) the effect of environment - in this case lobster fishing boats - on achievable speech-recognition accuracy. We describe, in detail, our study and outline our results to date based on preliminary analysis. Broadly speaking, the potential for effective use of speech for data collection and vessel control looks very promising - surprisingly so! We outline our ongoing analysis and further work.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Six experiments investigated the influence of several grouping cues within the framework of the Verbal Transformation Effect (VTE, Experiments 1 to 4) and Phonemic Transformation Effect (PTE, Experiments 5 and 6), where listening to a repeated word (VTE) or sequence of vowels (PTE) produces verbal transformations (VTs). In Experiment 1, the influence of F0 frequency and lateralization cues (ITDs) was investigated in terms of the pattern of VTs. As the lateralization difference increased between two repeating sequences, the number of forms was significantly reduced with the fewest forms reported in the dichotic condition. Experiment 2 explored whether or not propensity to report more VTs on high pitch was due to the task demands of monitoring two sequences at once. The number of VTs reported was higher when listeners were asked to attend to one sequence only, suggesting smaller attentional constraints on the task requirements. In Experiment 3, consonant-vowel transitions were edited out from two sets of six stimuli words with ‘strong’ and ‘weak’ formant transitions, respectively. Listeners reported more forms in the spliced-out than in the unedited case for the strong-transition words, but not for those with weak transitions. A similar trend was observed for the F0 contour manipulation used in Experiment 4 where listeners reported more VTs and forms for words following a discontinuous F0 contour. In Experiments 5 and 6, the role of F0 frequency and ITD cues was investigated further using a related phenomenon – the PTE. Although these manipulations had relatively little effect on the number of VTs and forms reported, they did influence the particular forms heard. In summary, the current experiments confirmed that it is possible to successfully investigate auditory grouping cues within the VTE framework and that, in agreement with recent studies, the results can be attributed to the perceptual re-grouping of speech sounds.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper discusses the first of three studies which collectively represent a convergence of two ongoing research agendas: (1) the empirically-based comparison of the effects of evaluation environment on mobile usability evaluation results; and (2) the effect of environment - in this case lobster fishing boats - on achievable speech-recognition accuracy. We describe, in detail, our study and outline our results to date based on preliminary analysis. Broadly speaking, the potential for effective use of speech for data collection and vessel control looks very promising - surprisingly so! We outline our ongoing analysis and further work.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This study explored the role of formant transitions and F0-contour continuity in binding together speech sounds into a coherent stream. Listening to a repeating recorded word produces verbal transformations to different forms; stream segregation contributes to this effect and so it can be used to measure changes in perceptual coherence. In experiment 1, monosyllables with strong formant transitions between the initial consonant and following vowel were monotonized; each monosyllable was paired with a weak-transitions counterpart. Further stimuli were derived by replacing the consonant-vowel transitions with samples from adjacent steady portions. Each stimulus was concatenated into a 3-min-long sequence. Listeners only reported more forms in the transitions-removed condition for strong-transitions words, for which formant-frequency discontinuities were substantial. In experiment 2, the F0 contour of all-voiced monosyllables was shaped to follow a rising or falling pattern, spanning one octave. Consecutive tokens either had the same contour, giving an abrupt F0 change between each token, or alternated, giving a continuous contour. Discontinuous sequences caused more transformations and forms, and shorter times to the first transformation. Overall, these findings support the notion that continuity cues provided by formant transitions and the F0 contour play an important role in maintaining the perceptual coherence of speech.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes part of the corpus collection efforts underway in the EC funded Companions project. The Companions project is collecting substantial quantities of dialogue a large part of which focus on reminiscing about photographs. The texts are in English and Czech. We describe the context and objectives for which this dialogue corpus is being collected, the methodology being used and make observations on the resulting data. The corpora will be made available to the wider research community through the Companions Project web site.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

How speech is separated perceptually from other speech remains poorly understood. Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the modulation of its frequency, but not its amplitude, contour. This study further examined the effect of formant-frequency variation on intelligibility by manipulating the rate of formant-frequency change. Target sentences were synthetic three-formant (F1?+?F2?+?F3) analogues of natural utterances. Perceptual organization was probed by presenting stimuli dichotically (F1?+?F2C?+?F3C; F2?+?F3), where F2C?+?F3C constitute a competitor for F2 and F3 that listeners must reject to optimize recognition. Competitors were derived using formant-frequency contours extracted from extended passages spoken by the same talker and processed to alter the rate of formant-frequency variation, such that rate scale factors relative to the target sentences were 0, 0.25, 0.5, 1, 2, and 4 (0?=?constant frequencies). Competitor amplitude contours were either constant, or time-reversed and rate-adjusted in parallel with the frequency contour. Adding a competitor typically reduced intelligibility; this reduction increased with competitor rate until the rate was at least twice that of the target sentences. Similarity in the results for the two amplitude conditions confirmed that formant amplitude contours do not influence across-formant grouping. The findings indicate that competitor efficacy is not tuned to the rate of the target sentences; most probably, it depends primarily on the overall rate of frequency variation in the competitor formants. This suggests that, when segregating the speech of concurrent talkers, differences in speech rate may not be a significant cue for across-frequency grouping of formants.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In an isolated syllable, a formant will tend to be segregated perceptually if its fundamental frequency (F0) differs from that of the other formants. This study explored whether similar results are found for sentences, and specifically whether differences in F0 (?F0) also influence across-formant grouping in circumstances where the exclusion or inclusion of the manipulated formant critically determines speech intelligibility. Three-formant (F1 + F2 + F3) analogues of almost continuously voiced natural sentences were synthesized using a monotonous glottal source (F0 = 150 Hz). Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3; F2), where F2C is a competitor for F2 that listeners must resist to optimize recognition. Competitors were created using time-reversed frequency and amplitude contours of F2, and F0 was manipulated (?F0 = ±8, ±2, or 0 semitones relative to the other formants). Adding F2C typically reduced intelligibility, and this reduction was greatest when ?F0 = 0. There was an additional effect of absolute F0 for F2C, such that competitor efficacy was greater for higher F0s. However, competitor efficacy was not due to energetic masking of F3 by F2C. The results are consistent with the proposal that a grouping “primitive” based on common F0 influences the fusion and segregation of concurrent formants in sentence perception.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

At present there is no standard assessment method for rating and comparing the quality of synthesized speech. This study assesses the suitability of Time Frequency Warping (TFW) modulation for use as a reference device for assessing synthesized speech. Time Frequency Warping modulation introduces timing errors into natural speech that produce perceptual errors similar to those found in synthetic speech. It is proposed that TFW modulation used in conjunction with a listening effort test would provide a standard assessment method for rating the quality of synthesized speech. This study identifies the most suitable TFW modulation variable parameter to be used for assessing synthetic speech and assess the results of several assessment tests that rate examples of synthesized speech in terms of the TFW variable parameter and listening effort. The study also attempts to identify the attributes of speech that differentiate synthetic, TFW modulated and natural speech.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The research presented in this paper is part of an ongoing investigation into how best to support meaningful lab-based evaluations of mobile technologies. In our previous work, we developed a hazard avoidance system for use during lab evaluations [1]; in the work reported here, we further assess the impact of this system, specifically in terms of the effect of avoidance cue type on speech-based text entry tasks.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Speech recognition technology is regarded as a key enabler for increasing the usability of applications deployed on mobile devices -- devices which are becoming increasingly prevalent in modern hospital-based healthcare. Although the use of speech recognition is not new to the hospital-based healthcare domain, its use with mobile devices has thus far been limited. This paper presents the results of a literature review we conducted in order to observe the manner in which speech recognition technology has been used in hospital-based healthcare and to gain an understanding of how this technology is being evaluated, in terms of its dependability and reliability, in healthcare settings. Our intent is that this review will help identify scope for future uses of speech recognition technologies in the healthcare domain, as well as to identify implications for the meaningful evaluation of such technologies given the specific context of use.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The standard reference clinical score quantifying average Parkinson's disease (PD) symptom severity is the Unified Parkinson's Disease Rating Scale (UPDRS). At present, UPDRS is determined by the subjective clinical evaluation of the patient's ability to adequately cope with a range of tasks. In this study, we extend recent findings that UPDRS can be objectively assessed to clinically useful accuracy using simple, self-administered speech tests, without requiring the patient's physical presence in the clinic. We apply a wide range of known speech signal processing algorithms to a large database (approx. 6000 recordings from 42 PD patients, recruited to a six-month, multi-centre trial) and propose a number of novel, nonlinear signal processing algorithms which reveal pathological characteristics in PD more accurately than existing approaches. Robust feature selection algorithms select the optimal subset of these algorithms, which is fed into non-parametric regression and classification algorithms, mapping the signal processing algorithm outputs to UPDRS. We demonstrate rapid, accurate replication of the UPDRS assessment with clinically useful accuracy (about 2 UPDRS points difference from the clinicians' estimates, p < 0.001). This study supports the viability of frequent, remote, cost-effective, objective, accurate UPDRS telemonitoring based on self-administered speech tests. This technology could facilitate large-scale clinical trials into novel PD treatments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

How speech is separated perceptually from other speech remains poorly understood. In a series of experiments, perceptual organisation was probed by presenting three-formant (F1+F2+F3) analogues of target sentences dichotically, together with a competitor for F2 (F2C), or for F2+F3, which listeners must reject to optimise recognition. To control for energetic masking, the competitor was always presented in the opposite ear to the corresponding target formant(s). Sine-wave speech was used initially, and different versions of F2C were derived from F2 using separate manipulations of its amplitude and frequency contours. F2Cs with time-varying frequency contours were highly effective competitors, whatever their amplitude characteristics, whereas constant-frequency F2Cs were ineffective. Subsequent studies used synthetic-formant speech to explore the effects of manipulating the rate and depth of formant-frequency change in the competitor. Competitor efficacy was not tuned to the rate of formant-frequency variation in the target sentences; rather, the reduction in intelligibility increased with competitor rate relative to the rate for the target sentences. Therefore, differences in speech rate may not be a useful cue for separating the speech of concurrent talkers. Effects of competitors whose depth of formant-frequency variation was scaled by a range of factors were explored using competitors derived either by inverting the frequency contour of F2 about its geometric mean (plausibly speech-like pattern) or by using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Competitor efficacy depended on the overall depth of frequency variation, not depth relative to that for the other formants. Furthermore, the triangle-wave competitors were as effective as their more speech-like counterparts. Overall, the results suggest that formant-frequency variation is critical for the across-frequency grouping of formants but that this grouping does not depend on speech-specific constraints.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The research presented in this paper is part of an ongoing investigation into how best to support meaningful lab-based evaluations of mobile technologies. In our previous work, we developed a hazard avoidance system for use during lab evaluations [1]; in the work reported here, we further assess the impact of this system, specifically in terms of the effect of avoidance cue type on speech-based text entry tasks.