30 resultados para Audio-Visual Automatic Speech Recognition
Resumo:
INTRODUCTION The Rondo is a single-unit cochlear implant (CI) audio processor comprising the identical components as its behind-the-ear predecessor, the Opus 2. An interchange of the Opus 2 with the Rondo leads to a shift of the microphone position toward the back of the head. This study aimed to investigate the influence of the Rondo wearing position on speech intelligibility in noise. METHODS Speech intelligibility in noise was measured in 4 spatial configurations with 12 experienced CI users using the German adaptive Oldenburg sentence test. A physical model and a numerical model were used to enable a comparison of the observations. RESULTS No statistically significant differences of the speech intelligibility were found in the situations in which the signal came from the front and the noise came from the frontal, ipsilateral, or contralateral side. The signal-to-noise ratio (SNR) was significantly better with the Opus 2 in the case with the noise presented from the back (4.4 dB, p < 0.001). The differences in the SNR were significantly worse with the Rondo processors placed further behind the ear than closer to the ear. CONCLUSION The study indicates that CI users with the receiver/stimulator implanted in positions further behind the ear are expected to have higher difficulties in noisy situations when wearing the single-unit audio processor.
Resumo:
Research into visual hallucinations has accelerated over the last decade from around 350 publications per year in 2000 to over 500 in 2010. Increased recognition of the frequent occurrence of visual hallucinations in a number of common disorders, coupled with improvements in the measurement of phenomenology, and more sophisticated imaging techniques have allowed the development and initial testing of sophisticated models. However, key questions remain unanswered. Amongst these are: whether there is a satisfactory definition of hallucinations in a constructive visual system; whether there are one, two or several core varieties of hallucinations; what are the underlying brain mechanisms for hallucinations; and what, if anything, can be done to treat them when they lead to distress? Looking across research in several clinical areas suggests a tentative integrative model that allows the possibility of answering these questions, but much work remains to be done.
Resumo:
A new implantable hearing system, the direct acoustic cochlear stimulator (DACS) is presented. This system is based on the principle of a power-driven stapes prosthesis and intended for the treatment of severe mixed hearing loss due to advanced otosclerosis. It consists of an implantable electromagnetic transducer, which transfers acoustic energy directly to the inner ear, and an audio processor worn externally behind the implanted ear. The device is implanted using a specially developed retromeatal microsurgical approach. After removal of the stapes, a conventional stapes prosthesis is attached to the transducer and placed in the oval window to allow direct acoustical coupling to the perilymph of the inner ear. In order to restore the natural sound transmission of the ossicular chain, a second stapes prosthesis is placed in parallel to the first one into the oval window and attached to the patient's own incus, as in a conventional stapedectomy. Four patients were implanted with an investigational DACS device. The hearing threshold of the implanted ears before implantation ranged from 78 to 101 dB (air conduction, pure tone average, 0.5-4 kHz) with air-bone gaps of 33-44 dB in the same frequency range. Postoperatively, substantial improvements in sound field thresholds, speech intelligibility as well as in the subjective assessment of everyday situations were found in all patients. Two years after the implantations, monosyllabic word recognition scores in quiet at 75 dB improved by 45-100 percent points when using the DACS. Furthermore, hearing thresholds were already improved by the second stapes prosthesis alone by 14-28 dB (pure tone average 0.5-4 kHz, DACS switched off). No device-related serious medical complications occurred and all patients have continued to use their device on a daily basis for over 2 years. Copyright (c) 2008 S. Karger AG, Basel.
Resumo:
OBJECTIVE: To develop a novel application of a tool for semi-automatic volume segmentation and adapt it for analysis of fetal cardiac cavities and vessels from heart volume datasets. METHODS: We studied retrospectively virtual cardiac volume cycles obtained with spatiotemporal image correlation (STIC) from six fetuses with postnatally confirmed diagnoses: four with normal hearts between 19 and 29 completed gestational weeks, one with d-transposition of the great arteries and one with hypoplastic left heart syndrome. The volumes were analyzed offline using a commercially available segmentation algorithm designed for ovarian folliculometry. Using this software, individual 'cavities' in a static volume are selected and assigned individual colors in cross-sections and in 3D-rendered views, and their dimensions (diameters and volumes) can be calculated. RESULTS: Individual segments of fetal cardiac cavities could be separated, adjacent segments merged and the resulting electronic casts studied in their spatial context. Volume measurements could also be performed. Exemplary images and interactive videoclips showing the segmented digital casts were generated. CONCLUSION: The approach presented here is an important step towards an automated fetal volume echocardiogram. It has the potential both to help in obtaining a correct structural diagnosis, and to generate exemplary visual displays of cardiac anatomy in normal and structurally abnormal cases for consultation and teaching.
Resumo:
1 Natural soil profiles may be interpreted as an arrangement of parts which are characterized by properties like hydraulic conductivity and water retention function. These parts form a complicated structure. Characterizing the soil structure is fundamental in subsurface hydrology because it has a crucial influence on flow and transport and defines the patterns of many ecological processes. We applied an image analysis method for recognition and classification of visual soil attributes in order to model flow and transport through a man-made soil profile. Modeled and measured saturation-dependent effective parameters were compared. We found that characterizing and describing conductivity patterns in soils with sharp conductivity contrasts is feasible. Differently, solving flow and transport on the basis of these conductivity maps is difficult and, in general, requires special care for representation of small-scale processes.
Resumo:
Effective visual exploration is required for many activities of daily living and instruments to assess visual exploration are important for the evaluation of the visual and the oculomotor system. In this article, the development of a new instrument to measure central and peripheral target recognition is described. The measurement setup consists of a hemispherical projection which allows presenting images over a large area of ±90° horizontal and vertical angle. In a feasibility study with 14 younger (21–49 years) and 12 older (50–78 years) test persons, 132 targets and 24 distractors were presented within naturalistic color photographs of everyday scenes at 10°, 30°, and 50° eccentricity. After the experiment, both younger and older participants reported in a questionnaire that the task is easy to understand, fun and that it measures a competence that is relevant for activities of daily living. A main result of the pilot study was that younger participants recognized more targets with smaller reaction times than older participants. The group differences were most pronounced for peripheral target detection. This test is feasible and appropriate to assess the functional field of view in younger and older adults.
Resumo:
In this paper, we propose novel methodologies for the automatic segmentation and recognition of multi-food images. The proposed methods implement the first modules of a carbohydrate counting and insulin advisory system for type 1 diabetic patients. Initially the plate is segmented using pyramidal mean-shift filtering and a region growing algorithm. Then each of the resulted segments is described by both color and texture features and classified by a support vector machine into one of six different major food classes. Finally, a modified version of the Huang and Dom evaluation index was proposed, addressing the particular needs of the food segmentation problem. The experimental results prove the effectiveness of the proposed method achieving a segmentation accuracy of 88.5% and recognition rate equal to 87%
Resumo:
BACKGROUND: Higher visual functions can be defined as cognitive processes responsible for object recognition, color and shape perception, and motion detection. People with impaired higher visual functions after unilateral brain lesion are often tested with paper pencil tests, but such tests do not assess the degree of interaction between the healthy brain hemisphere and the impaired one. Hence, visual functions are not tested separately in the contralesional and ipsilesional visual hemifields. METHODS: A new measurement setup, that involves real-time comparisons of shape and size of objects, orientation of lines, speed and direction of moving patterns, in the right or left visual hemifield, has been developed. The setup was implemented in an immersive environment like a hemisphere to take into account the effects of peripheral and central vision, and eventual visual field losses. Due to the non-flat screen of the hemisphere, a distortion algorithm was needed to adapt the projected images to the surface. Several approaches were studied and, based on a comparison between projected images and original ones, the best one was used for the implementation of the test. Fifty-seven healthy volunteers were then tested in a pilot study. A Satisfaction Questionnaire was used to assess the usability of the new measurement setup. RESULTS: The results of the distortion algorithm showed a structural similarity between the warped images and the original ones higher than 97%. The results of the pilot study showed an accuracy in comparing images in the two visual hemifields of 0.18 visual degrees and 0.19 visual degrees for size and shape discrimination, respectively, 2.56° for line orientation, 0.33 visual degrees/s for speed perception and 7.41° for recognition of motion direction. The outcome of the Satisfaction Questionnaire showed a high acceptance of the battery by the participants. CONCLUSIONS: A new method to measure higher visual functions in an immersive environment was presented. The study focused on the usability of the developed battery rather than the performance at the visual tasks. A battery of five subtasks to study the perception of size, shape, orientation, speed and motion direction was developed. The test setup is now ready to be tested in neurological patients.
Resumo:
In clinical practice, traditional X-ray radiography is widely used, and knowledge of landmarks and contours in anteroposterior (AP) pelvis X-rays is invaluable for computer aided diagnosis, hip surgery planning and image-guided interventions. This paper presents a fully automatic approach for landmark detection and shape segmentation of both pelvis and femur in conventional AP X-ray images. Our approach is based on the framework of landmark detection via Random Forest (RF) regression and shape regularization via hierarchical sparse shape composition. We propose a visual feature FL-HoG (Flexible- Level Histogram of Oriented Gradients) and a feature selection algorithm based on trace radio optimization to improve the robustness and the efficacy of RF-based landmark detection. The landmark detection result is then used in a hierarchical sparse shape composition framework for shape regularization. Finally, the extracted shape contour is fine-tuned by a post-processing step based on low level image features. The experimental results demonstrate that our feature selection algorithm reduces the feature dimension in a factor of 40 and improves both training and test efficiency. Further experiments conducted on 436 clinical AP pelvis X-rays show that our approach achieves an average point-to-curve error around 1.2 mm for femur and 1.9 mm for pelvis.
Resumo:
BACKGROUND: Co-speech gestures are omnipresent and a crucial element of human interaction by facilitating language comprehension. However, it is unclear whether gestures also support language comprehension in aphasic patients. Using visual exploration behavior analysis, the present study aimed to investigate the influence of congruence between speech and co-speech gestures on comprehension in terms of accuracy in a decision task. METHOD: Twenty aphasic patients and 30 healthy controls watched videos in which speech was either combined with meaningless (baseline condition), congruent, or incongruent gestures. Comprehension was assessed with a decision task, while remote eye-tracking allowed analysis of visual exploration. RESULTS: In aphasic patients, the incongruent condition resulted in a significant decrease of accuracy, while the congruent condition led to a significant increase in accuracy compared to baseline accuracy. In the control group, the incongruent condition resulted in a decrease in accuracy, while the congruent condition did not significantly increase the accuracy. Visual exploration analysis showed that patients fixated significantly less on the face and tended to fixate more on the gesturing hands compared to controls. CONCLUSION: Co-speech gestures play an important role for aphasic patients as they modulate comprehension. Incongruent gestures evoke significant interference and deteriorate patients' comprehension. In contrast, congruent gestures enhance comprehension in aphasic patients, which might be valuable for clinical and therapeutic purposes.
Resumo:
The lexical items like and well can serve as discourse markers (DMs), but can also play numerous other roles, such as verb or adverb. Identifying the occurrences that function as DMs is an important step for language understanding by computers. In this study, automatic classifiers using lexical, prosodic/positional and sociolinguistic features are trained over transcribed dialogues, manually annotated with DM information. The resulting classifiers improve state-of-the-art performance of DM identification, at about 90% recall and 79% precision for like (84.5% accuracy, κ = 0.69), and 99% recall and 98% precision for well (97.5% accuracy, κ = 0.88). Automatic feature analysis shows that lexical collocations are the most reliable indicators, followed by prosodic/positional features, while sociolinguistic features are marginally useful for the identification of DM like and not useful for well. The differentiated processing of each type of DM improves classification accuracy, suggesting that these types should be treated individually.
Resumo:
This article discusses the detection of discourse markers (DM) in dialog transcriptions, by human annotators and by automated means. After a theoretical discussion of the definition of DMs and their relevance to natural language processing, we focus on the role of like as a DM. Results from experiments with human annotators show that detection of DMs is a difficult but reliable task, which requires prosodic information from soundtracks. Then, several types of features are defined for automatic disambiguation of like: collocations, part-of-speech tags and duration-based features. Decision-tree learning shows that for like, nearly 70% precision can be reached, with near 100% recall, mainly using collocation filters. Similar results hold for well, with about 91% precision at 100% recall.
Resumo:
A common finding in time psychophysics is that temporal acuity is much better for auditory than for visual stimuli. The present study aimed to examine modality-specific differences in duration discrimination within the conceptual framework of the Distinct Timing Hypothesis. This theoretical account proposes that durations in the lower milliseconds range are processed automatically while longer durations are processed by a cognitive mechanism. A sample of 46 participants performed two auditory and visual duration discrimination tasks with extremely brief (50-ms standard duration) and longer (1000-ms standard duration) intervals. Better discrimination performance for auditory compared to visual intervals could be established for extremely brief and longer intervals. However, when performance on duration discrimination of longer intervals in the 1-s range was controlled for modality-specific input from the sensory-automatic timing mechanism, the visual-auditory difference disappeared completely as indicated by virtually identical Weber fractions for both sensory modalities. These findings support the idea of a sensory-automatic mechanism underlying the observed visual-auditory differences in duration discrimination of extremely brief intervals in the millisecond range and longer intervals in the 1-s range. Our data are consistent with the notion of a gradual transition from a purely modality-specific, sensory-automatic to a more cognitive, amodal timing mechanism. Within this transition zone, both mechanisms appear to operate simultaneously but the influence of the sensory-automatic timing mechanism is expected to continuously decrease with increasing interval duration.
Resumo:
Purpose In recent years, selective retina laser treatment (SRT), a sub-threshold therapy method, avoids widespread damage to all retinal layers by targeting only a few. While these methods facilitate faster healing, their lack of visual feedback during treatment represents a considerable shortcoming as induced lesions remain invisible with conventional imaging and make clinical use challenging. To overcome this, we present a new strategy to provide location-specific and contact-free automatic feedback of SRT laser applications. Methods We leverage time-resolved optical coherence tomography (OCT) to provide informative feedback to clinicians on outcomes of location-specific treatment. By coupling an OCT system to SRT treatment laser, we visualize structural changes in the retinal layers as they occur via time-resolved depth images. We then propose a novel strategy for automatic assessment of such time-resolved OCT images. To achieve this, we introduce novel image features for this task that when combined with standard machine learning classifiers yield excellent treatment outcome classification capabilities. Results Our approach was evaluated on both ex vivo porcine eyes and human patients in a clinical setting, yielding performances above 95 % accuracy for predicting patient treatment outcomes. In addition, we show that accurate outcomes for human patients can be estimated even when our method is trained using only ex vivo porcine data. Conclusion The proposed technique presents a much needed strategy toward noninvasive, safe, reliable, and repeatable SRT applications. These results are encouraging for the broader use of new treatment options for neovascularization-based retinal pathologies.