891 resultados para Speech segmentation
Resumo:
Background: Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, they prompt as nonverbal cues the cooperative process of turn taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards speaker or listener) and fixation of body parts. We hypothesized that aphasic patients, who are restricted in verbal comprehension, adapt their visual exploration strategies. Methods: Sixteen aphasic patients and 23 healthy control subjects participated in the study. Visual exploration behaviour was measured by means of a contact-free infrared eye-tracker while subjects were watching videos depicting spontaneous dialogues between two individuals. Cumulative fixation duration and mean fixation duration were calculated for the factors co-speech gesture (present and absent), gaze direction (to the speaker or to the listener), and region of interest (ROI), including hands, face, and body. Results: Both aphasic patients and healthy controls mainly fixated the speaker’s face. We found a significant co-speech gesture x ROI interaction, indicating that the presence of a co-speech gesture encouraged subjects to look at the speaker. Further, there was a significant gaze direction x ROI x group interaction revealing that aphasic patients showed reduced cumulative fixation duration on the speaker’s face compared to healthy controls. Conclusion: Co-speech gestures guide the observer’s attention towards the speaker, the source of semantic input. It is discussed whether an underlying semantic processing deficit or a deficit to integrate audio-visual information may cause aphasic patients to explore less the speaker’s face. Keywords: Gestures, visual exploration, dialogue, aphasia, apraxia, eye movements
Clinical Evaluation of a Fully-automatic Segmentation Method for Longitudinal Brain Tumor Volumetry.
Resumo:
Information about the size of a tumor and its temporal evolution is needed for diagnosis as well as treatment of brain tumor patients. The aim of the study was to investigate the potential of a fully-automatic segmentation method, called BraTumIA, for longitudinal brain tumor volumetry by comparing the automatically estimated volumes with ground truth data acquired via manual segmentation. Longitudinal Magnetic Resonance (MR) Imaging data of 14 patients with newly diagnosed glioblastoma encompassing 64 MR acquisitions, ranging from preoperative up to 12 month follow-up images, was analysed. Manual segmentation was performed by two human raters. Strong correlations (R = 0.83-0.96, p < 0.001) were observed between volumetric estimates of BraTumIA and of each of the human raters for the contrast-enhancing (CET) and non-enhancing T2-hyperintense tumor compartments (NCE-T2). A quantitative analysis of the inter-rater disagreement showed that the disagreement between BraTumIA and each of the human raters was comparable to the disagreement between the human raters. In summary, BraTumIA generated volumetric trend curves of contrast-enhancing and non-enhancing T2-hyperintense tumor compartments comparable to estimates of human raters. These findings suggest the potential of automated longitudinal tumor segmentation to substitute manual volumetric follow-up of contrast-enhancing and non-enhancing T2-hyperintense tumor compartments.
Resumo:
The objective of this retrospective study is to follow up on a previous Dynamic Smile Analysis and videographically analyze and develop averages for soft tissue norms with respect to the display of dentition during speech. These values would then be compared cross-sectionally across different age groups to see whether changes attributable to the aging process could be seen. A secondary objective was to compare averages for soft tissue norms in the display of dentition during speech to averages for soft tissue norms in the display of dentition during the smile. Materials and Method: Records from a previous study in which video equipment was used to capture video for 26 1 subjects were re-evaluated to find appropriate frames to analyze for speech. Two frames for each subject were selected; one frame representing the maximal display of maxillary incisors during speech and the second representing the widest transverse display of dentition during speech. After excluding 40 subjects the data for the remaining 221 subjects was analyzed. These averages were then compared to averages attained in the previous study to compare the display of the dentition during speech to the display of the dentition during smile. Results: On average, a difference in 1.29 mm was seen in the display of the maxillary incisors during speech at maximal display and during the smile. An average of 7.23 mm of maxillary incisors is readily visible during maximum display of maxillary incisors during speech, as compared to 8.52 mm during the smile. The constructed smile index was also smaller when measured during the speech when compared to the smile index by an average of 2.58 units. Conclusion: This study helps to establish age-related dynamic norms for the display of dentition during speech. The dynamic measures indicate that the display of dectition is greater, on average, during the smile than at speech.
Resumo:
A shortage of bilingual/bicultural speech language pathologists may reflect a problem with recruitment and retention of bilingual/bicultural students. The purpose of the present study was to survey graduate training programs in speech language pathology to determine typical policies and practices concerning students who apply and are admitted as ELLs. With a growing number of ELL children needing services from a bilingual SLP, it seems that little is being done to address the issue. The problem may be with the reluctance of programs to not only accept ELL students, but there also seems to be a disinclination for any sort of training program to be established for these ELL students. Clinic directors were asked to complete a survey about ELLs seeking clinical training in speech language pathology. In particular, we were interested in obtaining information about whether clinical training programs a) provided opportunities for ELL to participate in clinic, b) assessed the English skills of these students, and c) provided remediation if these students English skills were judged to be less than proficient.
Resumo:
by Adolf Hitler
Resumo:
Recently, vision-based advanced driver-assistance systems (ADAS) have received a new increased interest to enhance driving safety. In particular, due to its high performance–cost ratio, mono-camera systems are arising as the main focus of this field of work. In this paper we present a novel on-board road modeling and vehicle detection system, which is a part of the result of the European I-WAY project. The system relies on a robust estimation of the perspective of the scene, which adapts to the dynamics of the vehicle and generates a stabilized rectified image of the road plane. This rectified plane is used by a recursive Bayesian classi- fier, which classifies pixels as belonging to different classes corresponding to the elements of interest of the scenario. This stage works as an intermediate layer that isolates subsequent modules since it absorbs the inherent variability of the scene. The system has been tested on-road, in different scenarios, including varied illumination and adverse weather conditions, and the results have been proved to be remarkable even for such complex scenarios.
Resumo:
This paper describes the development of an Advanced Speech Communication System for Deaf People and its field evaluation in a real application domain: the renewal of Driver’s License. The system is composed of two modules. The first one is a Spanish into Spanish Sign Language (LSE: Lengua de Signos Española) translation module made up of a speech recognizer, a natural language translator (for converting a word sequence into a sequence of signs), and a 3D avatar animation module (for playing back the signs). The second module is a Spoken Spanish generator from sign-writing composed of a visual interface (for specifying a sequence of signs), a language translator (for generating the sequence of words in Spanish), and finally, a text to speech converter. For language translation, the system integrates three technologies: an example-based strategy, a rule-based translation method and a statistical translator. This paper also includes a detailed description of the evaluation carried out in the Local Traffic Office in the city of Toledo (Spain) involving real government employees and deaf people. This evaluation includes objective measurements from the system and subjective information from questionnaires. Finally, the paper reports an analysis of the main problems and a discussion about possible solutions.
Resumo:
In the last two decades, there has been an important increase in research on speech technology in Spain, mainly due to a higher level of funding from European, Spanish and local institutions and also due to a growing interest in these technologies for developing new services and applications. This paper provides a review of the main areas of speech technology addressed by research groups in Spain, their main contributions in the recent years and the main focus of interest these days. This description is classified in five main areas: audio processing including speech, speaker characterization, speech and language processing, text to speech conversion and spoken language applications. This paper also introduces the Spanish Network of Speech Technologies (RTTH. Red Temática en Tecnologías del Habla) as the research network that includes almost all the researchers working in this area, presenting some figures, its objectives and its main activities developed in the last years.
Resumo:
Several issues concerning the current use of speech interfaces are discussed and the design and development of a speech interface that enables air traffic controllers to command and control their terminals by voice is presented. A special emphasis is made in the comparison between laboratory experiments and field experiments in which a set of ergonomics-related effects are detected that cannot be observed in the controlled laboratory experiments. The paper presents both objective and subjective performance obtained in field evaluation of the system with student controllers at an air traffic control (ATC) training facility. The system exhibits high word recognition test rates (0.4% error in Spanish and 1.5% in English) and low command error (6% error in Spanish and 10.6% error in English in the field tests). Subjective impression has also been positive, encouraging future development and integration phases in the Spanish ATC terminals designed by Aeropuertos Españoles y Navegación Aérea (AENA).
Resumo:
Although there has been a lot of interest in recognizing and understanding air traffic control (ATC) speech, none of the published works have obtained detailed field data results. We have developed a system able to identify the language spoken and recognize and understand sentences in both Spanish and English. We also present field results for several in-tower controller positions. To the best of our knowledge, this is the first time that field ATC speech (not simulated) is captured, processed, and analyzed. The use of stochastic grammars allows variations in the standard phraseology that appear in field data. The robust understanding algorithm developed has 95% concept accuracy from ATC text input. It also allows changes in the presentation order of the concepts and the correction of errors created by the speech recognition engine improving it by 17% and 25%, respectively, absolute in the percentage of fully correctly understood sentences for English and Spanish in relation to the percentages of fully correctly recognized sentences. The analysis of errors due to the spontaneity of the speech and its comparison to read speech is also carried out. A 96% word accuracy for read speech is reduced to 86% word accuracy for field ATC data for Spanish for the "clearances" task confirming that field data is needed to estimate the performance of a system. A literature review and a critical discussion on the possibilities of speech recognition and understanding technology applied to ATC speech are also given.
Resumo:
This work is part of an on-going collaborative project between the medical and signal processing communities to promote new research efforts on automatic OSA (Obstructive Apnea Syndrome) diagnosis. In this paper, we explore the differences noted in phonetic classes (interphoneme) across groups (control/apnoea) and analyze their utility for OSA detection
Resumo:
Speech Technologies can provide important benefits for the development of more usable and safe in-vehicle human-machine interactive systems (HMIs). However mainly due robustness issues, the use of spoken interaction can entail important distractions to the driver. In this challenging scenario, while speech technologies are evolving, further research is necessary to explore how they can be complemented with both other modalities (multimodality) and information from the increasing number of available sensors (context-awareness). The perceived quality of speech technologies can significantly be increased by implementing such policies, which simply try to make the best use of all the available resources; and the in vehicle scenario is an excellent test-bed for this kind of initiatives. In this contribution we propose an event-based HMI design framework which combines context modelling and multimodal interaction using a W3C XML language known as SCXML. SCXML provides a general process control mechanism that is being considered by W3C to improve both voice interaction (VoiceXML) and multimodal interaction (MMI). In our approach we try to anticipate and extend these initiatives presenting a flexible SCXML-based approach for the design of a wide range of multimodal context-aware HMI in-vehicle interfaces. The proposed framework for HMI design and specification has been implemented in an automotive OSGi service platform, and it is being used and tested in the Spanish research project MARTA for the development of several in-vehicle interactive applications.
Resumo:
Advanced liver surgery requires a precise pre-operative planning, where liver segmentation and remnant liver volume are key elements to avoid post-operative liver failure. In that context, level-set algorithms have achieved better results than others, especially with altered liver parenchyma or in cases with previous surgery. In order to improve functional liver parenchyma volume measurements, in this work we propose two strategies to enhance previous level-set algorithms: an optimal multi-resolution strategy with fine details correction and adaptive curvature, as well as an additional semiautomatic step imposing local curvature constraints. Results show more accurate segmentations, especially in elongated structures, detecting internal lesions and avoiding leakages to close structures
Resumo:
This paper proposes the use of Factored Translation Models (FTMs) for improving a Speech into Sign Language Translation System. These FTMs allow incorporating syntactic-semantic information during the translation process. This new information permits to reduce significantly the translation error rate. This paper also analyses different alternatives for dealing with the non-relevant words. The speech into sign language translation system has been developed and evaluated in a specific application domain: the renewal of Identity Documents and Driver’s License. The translation system uses a phrase-based translation system (Moses). The evaluation results reveal that the BLEU (BiLingual Evaluation Understudy) has improved from 69.1% to 73.9% and the mSER (multiple references Sign Error Rate) has been reduced from 30.6% to 24.8%.