103 results for Robust speech recognition


Relevance:

100.00%

Publisher:

Abstract:

Spectral peak resolution was investigated in normal hearing (NH), hearing impaired (HI), and cochlear implant (CI) listeners. The task involved discriminating between two rippled noise stimuli in which the frequency positions of the log-spaced peaks and valleys were interchanged. The ripple spacing was varied adaptively from 0.13 to 11.31 ripples/octave, and the minimum ripple spacing at which a reversal in peak and trough positions could be detected was determined as the spectral peak resolution threshold for each listener. Spectral peak resolution was best, on average, in NH listeners, poorest in CI listeners, and intermediate for HI listeners. There was a significant relationship between spectral peak resolution and both vowel and consonant recognition in quiet across the three listener groups. The results indicate that the degree of spectral peak resolution required for accurate vowel and consonant recognition in quiet backgrounds is around 4 ripples/octave, and that spectral peak resolution poorer than around 1–2 ripples/octave may result in highly degraded speech recognition. These results suggest that efforts to improve spectral peak resolution for HI and CI users may lead to improved speech recognition.
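The stimulus pair described in this abstract can be illustrated with a minimal sketch, assuming a sinusoidal spectral envelope on a log-frequency axis. The function names, the 30 dB ripple depth and the 100 Hz lower edge are hypothetical choices made for illustration; the abstract does not state them. The sketch shows only how a half-cycle phase shift interchanges peak and valley positions at a given ripple density.

```python
import math


def ripple_gain_db(freq_hz, ripples_per_octave, inverted=False,
                   depth_db=30.0, f_low_hz=100.0):
    """Gain in dB of a log-spaced sinusoidal spectral ripple at freq_hz.

    depth_db and f_low_hz are illustrative assumptions, not values
    taken from the abstract.
    """
    octaves_above_low = math.log2(freq_hz / f_low_hz)
    # A phase shift of pi swaps the positions of peaks and valleys.
    phase = math.pi if inverted else 0.0
    return (depth_db / 2.0) * math.sin(
        2.0 * math.pi * ripples_per_octave * octaves_above_low + phase)


def peak_frequencies(ripples_per_octave, n_peaks, inverted=False,
                     f_low_hz=100.0):
    """Frequencies in Hz of the first n_peaks spectral peaks above f_low_hz."""
    # sin(x + phase) peaks where x + phase = pi/2 + 2*pi*k, i.e. a quarter
    # cycle into each ripple (three quarters for the inverted stimulus).
    offset_cycles = 0.75 if inverted else 0.25
    return [f_low_hz * 2.0 ** ((k + offset_cycles) / ripples_per_octave)
            for k in range(n_peaks)]
```

At 2 ripples/octave, consecutive peaks returned by `peak_frequencies` are half an octave apart, and every peak of the standard stimulus coincides with a valley of the inverted one. Higher ripple densities pack the peaks closer together, which is why the finest density a listener can still discriminate serves as a resolution threshold.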

Relevance:

100.00%

Publisher:

Abstract:

The purpose of this study was to explore the potential advantages, both theoretical and applied, of preserving low-frequency acoustic hearing in cochlear implant patients. Several hypotheses are presented that predict that residual low-frequency acoustic hearing along with electric stimulation for high frequencies will provide an advantage over traditional long-electrode cochlear implants for the recognition of speech in competing backgrounds. A simulation experiment in normal-hearing subjects demonstrated a clear advantage for preserving low-frequency residual acoustic hearing for speech recognition in a background of other talkers, but not in steady noise. Three subjects with an implanted "short-electrode" cochlear implant and preserved low-frequency acoustic hearing were also tested on speech recognition in the same competing backgrounds and compared to a larger group of traditional cochlear implant users. Each of the three short-electrode subjects performed better than any of the traditional long-electrode implant subjects for speech recognition in a background of other talkers, but not in steady noise, in general agreement with the simulation studies. When compared to a subgroup of traditional implant users matched according to speech recognition ability in quiet, the short-electrode patients showed a 9-dB advantage in the multitalker background. These experiments provide strong preliminary support for retaining residual low-frequency acoustic hearing in cochlear implant patients. The results are consistent with the idea that better perception of voice pitch, which can aid in separating voices in a background of other talkers, was responsible for this advantage.

Relevance:

100.00%

Publisher:

Abstract:

The purpose of the present study was to examine the benefits of providing audible speech to listeners with sensorineural hearing loss when the speech is presented in background noise. Previous studies have shown that when listeners have a severe hearing loss in the higher frequencies, providing audible speech (in a quiet background) to these higher frequencies usually results in no improvement in speech recognition. In the present experiments, speech was presented in a background of multitalker babble to listeners with various severities of hearing loss. The signal was low-pass filtered at numerous cutoff frequencies and speech recognition was measured as additional high-frequency speech information was provided to the hearing-impaired listeners. It was found in all cases, regardless of hearing loss or frequency range, that providing audible speech resulted in an increase in recognition score. The change in recognition as the cutoff frequency was increased, along with the amount of audible speech information in each condition (articulation index), was used to calculate the "efficiency" of providing audible speech. Efficiencies were positive for all degrees of hearing loss. However, the gains in recognition were small, and the maximum score obtained by any listener was low, due to the noise background. An analysis of error patterns showed that due to the limited speech audibility in a noise background, even severely impaired listeners used additional speech audibility in the high frequencies to improve their perception of the "easier" features of speech including voicing.

Relevance:

90.00%

Publisher:

Abstract:

This paper presents a corpus-based descriptive analysis of the most prevalent transfer effects and connected speech processes observed in a comparison of 11 Vietnamese English speakers (6 females, 5 males) and 12 Australian English speakers (6 males, 6 females) over 24 grammatical paraphrase items. The phonetic processes are segmentally labelled in terms of IPA diacritic features using the EMU speech database system with the aim of labelling departures from native-speaker pronunciation. An analysis of prosodic features was made using the ToBI framework. The results show many phonetic and prosodic processes which make non-native speakers' speech distinct from that of native speakers. The corpus-based methodology of analysing foreign accent may have implications for the evaluation of non-native accent, accented speech recognition and computer-assisted pronunciation learning.

Relevance:

80.00%

Publisher:

Abstract:

The differences in spectral shape resolution abilities among cochlear implant (CI) listeners, and between CI and normal-hearing (NH) listeners, when listening with the same number of channels (12), were investigated. In addition, the effect of the number of channels on spectral shape resolution was examined. The stimuli were rippled noise signals with various ripple frequency-spacings. An adaptive 4IFC procedure was used to determine the threshold for resolvable ripple spacing, which was the spacing at which an interchange in peak and valley positions could be discriminated. The results showed poorer spectral shape resolution in CI compared to NH listeners (average thresholds of approximately 3000 and 400 Hz, respectively), and wide variability among CI listeners (range of approximately 800 to 8000 Hz). There was a significant relationship between spectral shape resolution and vowel recognition. The spectral shape resolution thresholds of NH listeners increased as the number of channels increased from 1 to 16, while the CI listeners showed a performance plateau at 4–6 channels, which is consistent with previous results using speech recognition measures. These results indicate that this test may provide a measure of CI performance which is time efficient and non-linguistic, and therefore, if verified, may provide a useful contribution to the prediction of speech perception in adults and children who use CIs.

Relevance:

80.00%

Publisher:

Abstract:

These are the full proceedings of the conference.

Relevance:

40.00%

Publisher:

Abstract:

Most face recognition systems only work well under quite constrained environments. In particular, the illumination conditions, facial expressions and head pose must be tightly controlled for good recognition performance. In 2004, we proposed a new face recognition algorithm, Adaptive Principal Component Analysis (APCA) [4], which performs well against both lighting variation and expression change. But like other eigenface-derived face recognition algorithms, APCA only performs well with frontal face images. The work presented in this paper is an extension of our previous work to also accommodate variations in head pose. Following the approach of Cootes et al., we develop a face model and a rotation model which can be used to interpret facial features and synthesize realistic frontal face images when given a single novel face image. We use a Viola-Jones based face detector to detect the face in real time and thus solve the initialization problem for our Active Appearance Model search. Experiments show that our approach can achieve good recognition rates on face images across a wide range of head poses. Indeed, recognition rates are improved by up to a factor of 5 compared to standard PCA.

Relevance:

30.00%

Publisher:

Abstract:

There is now considerable evidence to suggest that non-demented people with Parkinson's disease (PD) experience difficulties using the morphosyntactic aspects of language. It remains unclear, however, at precisely which point in the processing of morphosyntax these difficulties emerge. The major objective of the present study was to examine the impact of PD on the processes involved in accessing morphosyntactic information in the lexicon. Nineteen people with PD and 19 matched control subjects participated in the study, which employed on-line word recognition tasks to examine morphosyntactic priming for local grammatical dependencies that occur both within (e.g. is going) and across (e.g. she gives) phrasal boundaries (Experiments 1 and 2, respectively). The control group evidenced robust morphosyntactic priming effects that were consistent with the involvement of both pre- (Experiment 1) and post-lexical (Experiment 2) processing routines. Whilst the participants with PD also recorded priming for dependencies within phrasal boundaries (Experiment 1), priming effects were observed over an abnormally brief time course. Further, in contrast to the controls, the PD group failed to record morphosyntactic priming for constructions that crossed phrasal boundaries (Experiment 2). The results demonstrate that attentionally mediated mechanisms operating at both the pre- and post-lexical stages of processing are able to contribute to morphosyntactic priming effects. In addition, the findings support the notion that, whilst people with PD are able to access morphosyntactic information in a normal manner, the time frame in which this information remains available for processing is altered. Deficits may also be experienced at the post-lexical integrational stage of processing.

Relevance:

30.00%

Publisher:

Abstract:

The McGurk effect, in which auditory [ba] dubbed onto [ga] lip movements is perceived as da or tha, was employed in a real-time task to investigate auditory-visual speech perception in prelingual infants. Experiments 1A and 1B established the validity of real-time dubbing for producing the effect. In Experiment 2, 4½-month-olds were tested in a habituation-test paradigm, in which an auditory-visual stimulus was presented contingent upon visual fixation of a live face. The experimental group was habituated to a McGurk stimulus (auditory [ba] visual [ga]), and the control group to matching auditory-visual [ba]. Each group was then presented with three auditory-only test trials, [ba], [da], and [ða] (as in then). Visual-fixation durations in test trials showed that the experimental group treated the emergent percept in the McGurk effect, [da] or [ða], as familiar (even though they had not heard these sounds previously) and [ba] as novel. For control group infants [da] and [ða] were no more familiar than [ba]. These results are consistent with infants' perception of the McGurk effect, and support the conclusion that prelinguistic infants integrate auditory and visual speech information. (C) 2004 Wiley Periodicals, Inc.

Relevance:

30.00%

Publisher:

Abstract:

Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional 'mirror neuron' systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a 'virtual' head (Baldi), delivered speech stimuli for identification in auditory, visual or bimodal conditions. Children with ASD were poorer than controls at recognizing stimuli in the unimodal conditions, but once performance on this measure was controlled for, no group difference was found in the bimodal condition. A group of participants with ASD were also trained to develop their speech-reading ability. Training improved visual accuracy and this also improved the children's ability to utilize visual information in their processing of speech. Overall results were compared to predictions from mathematical models based on integration and non-integration, and were most consistent with the integration model. We conclude that, whilst they are less accurate in recognizing stimuli in the unimodal condition, children with ASD show normal integration of visual and auditory speech stimuli. Given that training in recognition of visual speech was effective, children with ASD may benefit from multi-modal approaches in imitative therapy and language training. (C) 2004 Elsevier Ltd. All rights reserved.

Relevance:

30.00%

Publisher:

Abstract:

Recognising the laterality of a pictured hand involves making an initial decision and confirming that choice by mentally moving one's own hand to match the picture. This depends on an intact body schema. Because patients with complex regional pain syndrome type 1 (CRPS1) take longer to recognise a hand's laterality when it corresponds to their affected hand, it has been proposed that nociceptive input disrupts the body schema. However, chronic pain is associated with physiological and psychosocial complexities that may also explain the results. In three studies, we investigated whether the effect is simply due to nociceptive input. Study one evaluated the temporal and perceptual characteristics of acute hand pain elicited by intramuscular injection of hypertonic saline into the thenar eminence. In studies two and three, subjects performed a hand laterality recognition task before, during, and after acute experimental hand pain, and experimental elbow pain, respectively. During hand pain and during elbow pain, when the laterality of the pictured hand corresponded to the painful side, there was no effect on response time (RT). That suggests that nociceptive input alone is not sufficient to disrupt the working body schema. In contrast to patients with CRPS1, when the laterality of the pictured hand corresponded to the non-painful hand, RT increased by approximately 380 ms (95% confidence interval 190 ms to 590 ms). The results highlight the differences between acute and chronic pain and may reflect a bias in information processing in acute pain toward the affected part.

Relevance:

30.00%

Publisher:

Abstract:

One of the critical challenges in automatic recognition of TV commercials is to generate a unique, robust and compact signature. Uniqueness indicates the ability to identify the similarity among commercial video clips which may have slight content variation. Robustness means the ability to match commercial video clips containing the same content but possibly with different digitization/encoding, some noise data, and/or transmission and recording distortion. Efficiency is the capability of effectively matching commercial video sequences with a low computation cost and storage overhead. In this paper, we present a binary signature based method, which meets all three criteria above, by combining the techniques of ordinal and color measurements. Experimental results on a large real commercial video database show that our novel approach delivers significantly better performance compared to existing methods.
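As a rough illustration of how an ordinal measurement can yield a compact binary signature, the sketch below splits a grayscale frame into a grid of blocks, averages each block, and emits one bit per pair of blocks according to which mean is larger. This is a generic ordinal-measure sketch under assumed details and is not the authors' method: the 2x2 grid, the pairwise-comparison encoding, the function names and the omission of the color component are all assumptions made for illustration.

```python
from itertools import combinations


def block_means(frame, rows=2, cols=2):
    """Mean intensity of each block in a rows x cols grid over the frame.

    frame is a list of equal-length rows of gray-level values.
    """
    height, width = len(frame), len(frame[0])
    means = []
    for r in range(rows):
        for c in range(cols):
            ys = range(r * height // rows, (r + 1) * height // rows)
            xs = range(c * width // cols, (c + 1) * width // cols)
            values = [frame[y][x] for y in ys for x in xs]
            means.append(sum(values) / len(values))
    return means


def ordinal_signature(frame, rows=2, cols=2):
    """Binary signature: one bit per block pair, set if the first is brighter."""
    means = block_means(frame, rows, cols)
    return [1 if means[i] > means[j] else 0
            for i, j in combinations(range(len(means)), 2)]


def hamming(sig_a, sig_b):
    """Number of differing bits between two equal-length signatures."""
    return sum(a != b for a, b in zip(sig_a, sig_b))
```

Because the bits depend only on the rank order of block means, a uniform brightness or contrast change that preserves that order leaves the signature unchanged, which is one way to read the robustness criterion. Signatures of two frames can then be compared cheaply with `hamming`.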

Relevance:

20.00%

Publisher:

Abstract:

The primary objective of this study was to assess the lingual kinematic strategies used by younger and older adults to increase rate of speech. It was hypothesised that the strategies used by the older adults would differ from the young adults either as a direct result of, or in response to a need to compensate for, age-related changes in the tongue. Electromagnetic articulography was used to examine the tongue movements of eight young (M = 26.7 years) and eight older (M = 67.1 years) females during repetitions of /ta/ and /ka/ at a controlled moderate rate and then as fast as possible. The younger and older adults were found to significantly reduce consonant durations and increase syllable repetition rate by similar proportions. To achieve these reduced durations both groups appeared to use the same strategy, that of reducing the distances travelled by the tongue. Further comparisons at each rate, however, suggested a speed-accuracy trade-off and increased speech monitoring in the older adults. The results may assist in differentiating articulatory changes associated with normal aging from pathological changes found in disorders that affect the older population.