29 results for Audio-visual Speech Recognition, Visual Feature Extraction, Free-parts, Monolithic, ROI
in University of Queensland eSpace - Australia
Abstract:
Spectral peak resolution was investigated in normal hearing (NH), hearing impaired (HI), and cochlear implant (CI) listeners. The task involved discriminating between two rippled noise stimuli in which the frequency positions of the log-spaced peaks and valleys were interchanged. The ripple spacing was varied adaptively from 0.13 to 11.31 ripples/octave, and the minimum ripple spacing at which a reversal in peak and trough positions could be detected was determined as the spectral peak resolution threshold for each listener. Spectral peak resolution was best, on average, in NH listeners, poorest in CI listeners, and intermediate for HI listeners. There was a significant relationship between spectral peak resolution and both vowel and consonant recognition in quiet across the three listener groups. The results indicate that the degree of spectral peak resolution required for accurate vowel and consonant recognition in quiet backgrounds is around 4 ripples/octave, and that spectral peak resolution poorer than around 1–2 ripples/octave may result in highly degraded speech recognition. These results suggest that efforts to improve spectral peak resolution for HI and CI users may lead to improved speech recognition.
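The peak/valley interchange described in this abstract can be illustrated with a minimal sketch. It assumes a sinusoidal spectral envelope on a log2-frequency axis; the envelope shape, `ref_hz`, and `depth_db` are illustrative assumptions and are not taken from the study.

```python
import math

def ripple_envelope_db(freq_hz, ripples_per_octave, inverted=False,
                       ref_hz=100.0, depth_db=30.0):
    """Spectral envelope (dB relative to the mean level) of a log-spaced ripple.

    Assumption: a sinusoidal envelope on a log2-frequency axis. ref_hz and
    depth_db are made-up illustrative values.
    """
    octaves = math.log2(freq_hz / ref_hz)
    # A half-cycle phase shift swaps the positions of peaks and valleys.
    phase = math.pi if inverted else 0.0
    return 0.5 * depth_db * math.sin(
        2 * math.pi * ripples_per_octave * octaves + phase)

# At 1 ripple/octave, a peak of the standard stimulus sits a quarter-octave
# above ref_hz; the inverted stimulus has a valley at the same frequency.
peak_hz = 100.0 * 2 ** 0.25
standard = ripple_envelope_db(peak_hz, 1.0)
reversed_ = ripple_envelope_db(peak_hz, 1.0, inverted=True)
print(round(standard, 3), round(reversed_, 3))
```

As the ripple density grows, peaks and valleys move closer together in frequency, which is why the discrimination becomes harder at higher ripples/octave.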
Abstract:
The purpose of this study was to explore the potential advantages, both theoretical and applied, of preserving low-frequency acoustic hearing in cochlear implant patients. Several hypotheses are presented that predict that residual low-frequency acoustic hearing along with electric stimulation for high frequencies will provide an advantage over traditional long-electrode cochlear implants for the recognition of speech in competing backgrounds. A simulation experiment in normal-hearing subjects demonstrated a clear advantage for preserving low-frequency residual acoustic hearing for speech recognition in a background of other talkers, but not in steady noise. Three subjects with an implanted "short-electrode" cochlear implant and preserved low-frequency acoustic hearing were also tested on speech recognition in the same competing backgrounds and compared to a larger group of traditional cochlear implant users. Each of the three short-electrode subjects performed better than any of the traditional long-electrode implant subjects for speech recognition in a background of other talkers, but not in steady noise, in general agreement with the simulation studies. When compared to a subgroup of traditional implant users matched according to speech recognition ability in quiet, the short-electrode patients showed a 9-dB advantage in the multitalker background. These experiments provide strong preliminary support for retaining residual low-frequency acoustic hearing in cochlear implant patients. The results are consistent with the idea that better perception of voice pitch, which can aid in separating voices in a background of other talkers, was responsible for this advantage.
Abstract:
The purpose of the present study was to examine the benefits of providing audible speech to listeners with sensorineural hearing loss when the speech is presented in a background noise. Previous studies have shown that when listeners have a severe hearing loss in the higher frequencies, providing audible speech (in a quiet background) to these higher frequencies usually results in no improvement in speech recognition. In the present experiments, speech was presented in a background of multitalker babble to listeners with various severities of hearing loss. The signal was low-pass filtered at numerous cutoff frequencies and speech recognition was measured as additional high-frequency speech information was provided to the hearing-impaired listeners. It was found in all cases, regardless of hearing loss or frequency range, that providing audible speech resulted in an increase in recognition score. The change in recognition as the cutoff frequency was increased, along with the amount of audible speech information in each condition (articulation index), was used to calculate the "efficiency" of providing audible speech. Efficiencies were positive for all degrees of hearing loss. However, the gains in recognition were small, and the maximum score obtained by any listener was low, due to the noise background. An analysis of error patterns showed that due to the limited speech audibility in a noise background, even severely impaired listeners used additional speech audibility in the high frequencies to improve their perception of the "easier" features of speech including voicing.
Abstract:
Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional 'mirror neuron' systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a 'virtual' head (Baldi), delivered speech stimuli for identification in auditory, visual or bimodal conditions. Children with ASD were poorer than controls at recognizing stimuli in the unimodal conditions, but once performance on this measure was controlled for, no group difference was found in the bimodal condition. A group of participants with ASD were also trained to develop their speech-reading ability. Training improved visual accuracy and this also improved the children's ability to utilize visual information in their processing of speech. Overall results were compared to predictions from mathematical models based on integration and non-integration, and were most consistent with the integration model. We conclude that, whilst they are less accurate in recognizing stimuli in the unimodal condition, children with ASD show normal integration of visual and auditory speech stimuli. Given that training in recognition of visual speech was effective, children with ASD may benefit from multi-modal approaches in imitative therapy and language training. (C) 2004 Elsevier Ltd. All rights reserved.
Abstract:
Capacity limits in visual attention have traditionally been studied using static arrays of elements from which an observer must detect a target defined by a certain visual feature or combination of features. In the current study we use this visual search paradigm, with accuracy as the dependent variable, to examine attentional capacity limits for different visual features undergoing change over time. In Experiment 1, detectability of a single changing target was measured under conditions where the type of change (size, speed, colour), the magnitude of change, the set size and homogeneity of the unchanging distractors were all systematically varied. Psychometric function slopes were calculated for different experimental conditions and ‘change thresholds’ extracted from these slopes were used in Experiment 2, in which multiple supra-threshold changes were made, simultaneously, either to a single or to two or three different stimulus elements. These experiments give an objective psychometric paradigm for measuring changes in visual features over time. Results favour object-based accounts of visual attention, and show consistent differences in the allocation of attentional capacity to different perceptual dimensions.
Abstract:
In this paper, we present a novel indexing technique called Multi-scale Similarity Indexing (MSI) to index an image's multi-features into a single one-dimensional structure. For both text and visual feature spaces, the similarity between a point and a local partition's center in the individual space is used as the indexing key, where similarity values in different features are distinguished by different scales. A single indexing tree can then be built on these keys. Based on the property that relevant images have similar similarity values from the center of the same local partition in any feature space, a certain number of irrelevant images can be quickly pruned based on the triangle inequality on the indexing keys. To remove the “dimensionality curse” existing in high-dimensional structures, we propose a new technique called Local Bit Stream (LBS). LBS transforms an image's text and visual feature representations into simple, uniform and effective bit stream (BS) representations based on the local partition's center. Such BS representations are small in size and fast to compare since only bit operations are involved. By comparing the common bits in two BSs, most irrelevant images can be filtered immediately. Our extensive experiments showed that a single one-dimensional index on multi-features greatly improves on multi-indices on multi-features. Our LBS method outperforms sequential scan on high-dimensional space by an order of magnitude.
Abstract:
Single shortest path extraction algorithms have been used in a number of areas such as network flow and image analysis. In image analysis, shortest path techniques can be used for object boundary detection, crack detection, or stereo disparity estimation. Sometimes one needs to find multiple paths as opposed to a single path in a network or an image where the paths must satisfy certain constraints. In this paper, we propose a new algorithm to extract multiple paths simultaneously within an image using a constrained expanded trellis (CET) for feature extraction and object segmentation. We also give a number of application examples for our multiple paths extraction algorithm.
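For context, the single-path case that this abstract builds on can be sketched as a column-by-column dynamic program over a cost image. This is not the constrained expanded trellis (CET) algorithm itself; the function name and the rule that a path moves at most one row between adjacent columns are assumptions made for illustration.

```python
def min_cost_path(cost):
    """Return (total_cost, rows) for the cheapest left-to-right path.

    cost[r][c] is the cost of visiting pixel (r, c). Assumed constraint:
    between adjacent columns the path stays in the same row or moves to a
    neighbouring row.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    best = [cost[r][0] for r in range(n_rows)]
    back = []  # back[c - 1][r]: row used in column c - 1 to reach (r, c)
    for c in range(1, n_cols):
        new_best, pointers = [], []
        for r in range(n_rows):
            neighbours = range(max(0, r - 1), min(n_rows, r + 2))
            prev = min(neighbours, key=lambda p: best[p])
            new_best.append(best[prev] + cost[r][c])
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path back from the last column.
    end = min(range(n_rows), key=lambda r: best[r])
    rows = [end]
    for pointers in reversed(back):
        rows.append(pointers[rows[-1]])
    rows.reverse()
    return best[end], rows

# Toy cost image: low values mark a likely boundary.
cost = [
    [5, 1, 5, 5],
    [1, 5, 1, 5],
    [5, 5, 5, 1],
]
total, rows = min_cost_path(cost)
print(total, rows)
```

Extracting several paths at once, as the abstract proposes, additionally requires constraints between the paths, which this single-path sketch does not model.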
Abstract:
In this paper, we present a novel indexing technique called Multi-scale Similarity Indexing (MSI) to index an image's multi-features into a single one-dimensional structure. For both text and visual feature spaces, the similarity between a point and a local partition's center in the individual space is used as the indexing key, where similarity values in different features are distinguished by different scales. A single indexing tree can then be built on these keys. Based on the property that relevant images have similar similarity values from the center of the same local partition in any feature space, a certain number of irrelevant images can be quickly pruned based on the triangle inequality on the indexing keys. To remove the dimensionality curse existing in high-dimensional structures, we propose a new technique called Local Bit Stream (LBS). LBS transforms an image's text and visual feature representations into simple, uniform and effective bit stream (BS) representations based on the local partition's center. Such BS representations are small in size and fast to compare since only bit operations are involved. By comparing the common bits in two BSs, most irrelevant images can be filtered immediately. To effectively integrate multi-features, we also investigated the following evidence combination techniques: Certainty Factor, Dempster-Shafer Theory, Compound Probability, and Linear Combination. Our extensive experiments showed that a single one-dimensional index on multi-features greatly improves on multi-indices on multi-features. Our LBS method outperforms sequential scan on high-dimensional space by an order of magnitude. Certainty Factor and Dempster-Shafer Theory perform best in combining multiple similarities from corresponding multiple features.
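The pruning idea in this abstract can be sketched in a simplified form. The sketch assumes Euclidean distance to a single partition center as the one-dimensional key; the paper's similarity measure, its scaling across feature spaces, and its tree structure are not reproduced, and all function names are hypothetical.

```python
import bisect
import math

def build_index(points, center):
    """Sort points by their distance to the partition center (the 1-D key)."""
    keyed = sorted((math.dist(p, center), p) for p in points)
    return [k for k, _ in keyed], [p for _, p in keyed]

def range_query(keys, points, center, query, radius):
    """Return points within `radius` of `query` and the number checked."""
    dq = math.dist(query, center)
    # Triangle inequality: |d(p, c) - d(q, c)| <= d(p, q), so a point whose
    # key lies outside [dq - radius, dq + radius] cannot be a match.
    lo = bisect.bisect_left(keys, dq - radius)
    hi = bisect.bisect_right(keys, dq + radius)
    candidates = points[lo:hi]
    hits = [p for p in candidates if math.dist(p, query) <= radius]
    return hits, len(candidates)

# Made-up 2-D feature vectors for illustration.
data = [(0, 0), (1, 0), (0, 2), (5, 5), (6, 5), (9, 9)]
center = (0, 0)
keys, ordered = build_index(data, center)
hits, checked = range_query(keys, ordered, center, (5.5, 5), 1.0)
print(hits, checked)
```

Only the candidates whose keys fall inside the window are compared against the query in full, so the remaining points are pruned using the one-dimensional key alone.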
Abstract:
Objective: The description and evaluation of the performance of a new real-time seizure detection algorithm in the newborn infant. Methods: The algorithm includes parallel fragmentation of EEG signal into waves; wave-feature extraction and averaging; elementary, preliminary and final detection. The algorithm detects EEG waves with heightened regularity, using wave intervals, amplitudes and shapes. The performance of the algorithm was assessed with the use of event-based and liberal and conservative time-based approaches and compared with the performance of Gotman's and Liu's algorithms. Results: The algorithm was assessed on multi-channel EEG records of 55 neonates including 17 with seizures. The algorithm showed sensitivities ranging 83-95% with positive predictive values (PPV) 48-77%. There were 2.0 false positive detections per hour. In comparison, Gotman's algorithm (with 30 s gap-closing procedure) displayed sensitivities of 45-88% and PPV 29-56%; with 7.4 false positives per hour and Liu's algorithm displayed sensitivities of 96-99%, and PPV 10-25%; with 15.7 false positives per hour. Conclusions: The wave-sequence analysis based algorithm displayed higher sensitivity, higher PPV and a substantially lower level of false positives than two previously published algorithms. Significance: The proposed algorithm provides a basis for major improvements in neonatal seizure detection and monitoring. Published by Elsevier Ireland Ltd. on behalf of International Federation of Clinical Neurophysiology.
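The figures of merit quoted in this abstract follow the standard definitions, sketched below for event-based counts. The counts in the usage example are invented for illustration and are not data from the study.

```python
def detection_metrics(true_pos, false_pos, false_neg, hours):
    """Event-based sensitivity, positive predictive value and false-alarm rate."""
    sensitivity = true_pos / (true_pos + false_neg)  # detected / all seizures
    ppv = true_pos / (true_pos + false_pos)          # correct / all detections
    fp_per_hour = false_pos / hours
    return sensitivity, ppv, fp_per_hour

# Hypothetical counts: 90 detected seizures, 10 missed, 30 false alarms
# over 15 hours of recording.
sens, ppv, fp_rate = detection_metrics(90, 30, 10, 15.0)
print(f"sensitivity={sens:.0%} PPV={ppv:.0%} FP/h={fp_rate:.1f}")
```

Sensitivity and PPV pull in opposite directions as a detector is made more liberal, which is why the abstract reports both alongside the false-positive rate.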
Abstract:
Much work has been done on texture feature extraction for rectangular images, but less attention has been paid to the arbitrary-shaped regions available in region-based image retrieval (RBIR) systems. In this work, we present a texture feature extraction algorithm based on projection onto convex sets (POCS) theory. POCS iteratively concentrates more and more energy into the selected coefficients, from which texture features of an arbitrary-shaped region can be extracted. Experimental results demonstrate the effectiveness of the proposed algorithm for image retrieval purposes.
Abstract:
This paper presents a corpus-based descriptive analysis of the most prevalent transfer effects and connected speech processes observed in a comparison of 11 Vietnamese English speakers (6 females, 5 males) and 12 Australian English speakers (6 males, 6 females) over 24 grammatical paraphrase items. The phonetic processes are segmentally labelled in terms of IPA diacritic features using the EMU speech database system with the aim of labelling departures from native-speaker pronunciation. An analysis of prosodic features was made using the ToBI framework. The results show many phonetic and prosodic processes which make non-native speakers’ speech distinct from that of native speakers. The corpus-based methodology of analysing foreign accent may have implications for the evaluation of non-native accent, accented speech recognition and computer-assisted pronunciation learning.
Abstract:
The differences in spectral shape resolution abilities among cochlear implant (CI) listeners, and between CI and normal-hearing (NH) listeners, when listening with the same number of channels (12), were investigated. In addition, the effect of the number of channels on spectral shape resolution was examined. The stimuli were rippled noise signals with various ripple frequency-spacings. An adaptive 4IFC procedure was used to determine the threshold for resolvable ripple spacing, which was the spacing at which an interchange in peak and valley positions could be discriminated. The results showed poorer spectral shape resolution in CI compared to NH listeners (average thresholds of approximately 3000 and 400 Hz, respectively), and wide variability among CI listeners (range of approximately 800 to 8000 Hz). There was a significant relationship between spectral shape resolution and vowel recognition. The spectral shape resolution thresholds of NH listeners increased as the number of channels increased from 1 to 16, while the CI listeners showed a performance plateau at 4–6 channels, which is consistent with previous results using speech recognition measures. These results indicate that this test may provide a measure of CI performance which is time efficient and non-linguistic, and therefore, if verified, may provide a useful contribution to the prediction of speech perception in adults and children who use CIs.
Abstract:
In this letter, we propose a class of self-stabilizing learning algorithms for minor component analysis (MCA), which includes a few well-known MCA learning algorithms. Self-stabilizing means that the sign of the weight vector length change is independent of the presented input vector. For these algorithms, a rigorous global convergence proof is given and the convergence rate is also discussed. By combining the positive properties of these algorithms, a new learning algorithm is proposed which can improve the performance. Simulations are employed to confirm our theoretical results.
Abstract:
These are the full proceedings of the conference.