961 results for Perceptual Speech Evaluation
Abstract:
Objectives. To evaluate whether the overall dysphonia grade, roughness, breathiness, asthenia, and strain (GRBAS) scale and the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) scale show the same reliability and consensus when applied to the same vocal sample at different times. Study Design. Observational cross-sectional study. Methods. Sixty subjects had their voices recorded according to the tasks proposed in the CAPE-V protocol. The vowels /a/ and /i/ were sustained for 3 to 5 seconds. Reproduction of six sentences and spontaneous speech elicited by the request "Tell me about your voice" were analyzed. For the GRBAS analysis, the sustained vowel and sentence-reading tasks were used. Auditory-perceptual voice analyses were conducted by three expert speech therapists, each with more than 5 years of experience and familiar with both scales. Results. A strong correlation was observed in the intrajudge consensus analysis for both the GRBAS and CAPE-V scales, with intraclass correlation coefficients ranging from 0.923 to 0.985. A high correlation between the overall GRBAS and CAPE-V grades (coefficient = 0.842) was observed, with similar distributions of dysphonia grades on both scales. The evaluators reported mild difficulty in applying the GRBAS scale and low to mild difficulty in applying the CAPE-V scale. The three evaluators agreed in rating the GRBAS scale as the fastest and the CAPE-V scale as the most sensitive, especially for detecting small changes in voice. Conclusions. Both scales are reliable and are indicated for analyzing voice quality.
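The reliability figures above are intraclass correlation coefficients. As a minimal illustration of how such a coefficient can be computed, here is a sketch of the Shrout and Fleiss ICC(2,1) formulation (two-way random effects, absolute agreement, single rater) in Python; the ratings matrix is a hypothetical toy example, not data from the study.

```python
import numpy as np

def icc_2_1(ratings):
    """Shrout & Fleiss ICC(2,1): two-way random effects,
    absolute agreement, single rater.
    ratings: (n_subjects, k_raters) array."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-subject means
    col_means = ratings.mean(axis=0)   # per-rater (or per-session) means

    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((col_means - grand) ** 2).sum()   # between raters
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical example: 5 voices rated twice by the same judge (0-3 scale)
ratings = np.array([[2, 2], [1, 1], [3, 3], [0, 1], [2, 2]])
print(f"ICC(2,1) = {icc_2_1(ratings):.3f}")
```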
Abstract:
Barely 15 years have passed since the 1996 theme issue of Schizophrenia Bulletin (Vol 22, No. 2), "Early Detection and Intervention in Schizophrenia," marked the beginning of this field of research. Since then, early detection research has developed rapidly, and it may soon be translated into clinical practice through the introduction of an Attenuated Psychosis Syndrome in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) (www.dsm5.org/ProposedRevisions/Pages/proposedrevision.aspx?rid=412#). Attenuated psychotic symptoms (APS) were first suggested as a clinical predictor of first-episode psychosis by the Personal Assessment and Crisis Evaluation (PACE) Clinic group as part of the ultrahigh risk (UHR) criteria.1 The term ultrahigh risk became broadly accepted for this set of criteria for imminent risk of developing psychosis in the late 1990s. The use of the term "prodrome" for a state characterized by at-risk (AR) criteria was criticized as a retrospective concept, inevitably followed by the full-blown disorder.1 Although alternative terms have been suggested, prodrome is still used in prospective studies (eg, prodromally symptomatic, potentially or putatively prodromal, prodrome-like state/symptoms). Some alternative suggestions, such as prepsychotic state/symptoms, subthreshold psychotic symptoms, early psychosis, subsyndromal psychosis, hypopsychosis, or subpsychosis, were short-lived. Other terms still in use include UHR, at-risk mental state (ARMS), AR, high risk, clinical high risk (CHR), and early and late AR state. Further, the term psychotic-like experiences (PLEs) has recently (re-)entered early detection research. …
Abstract:
Speech melody, or prosody, subserves linguistic, emotional, and pragmatic functions in speech communication. Prosodic perception is based on the decoding of acoustic cues, with a predominant role of frequency-related information perceived as the speaker's pitch. Evaluation of prosodic meaning is a cognitive function implemented in cortical and subcortical networks that generate continuously updated affective or linguistic speaker impressions. Various brain-imaging methods allow delineation of the neural structures involved in prosody processing. In contrast to functional magnetic resonance imaging techniques, DC (direct current, slow) components of the EEG directly measure cortical activation without temporal delay. Activation patterns obtained with this method are highly task specific and intraindividually reproducible. The studies presented here investigated the topography of prosodic stimulus processing as a function of acoustic stimulus structure and of linguistic or affective task demands, respectively. Data obtained from measuring DC potentials demonstrated that the right hemisphere has a predominant role in processing emotions from the tone of voice, irrespective of emotional valence. However, right hemisphere involvement is modulated by diverse speech- and language-related conditions that are associated with left hemisphere participation in prosody processing. The degree of left hemisphere involvement depends on several factors, such as (i) articulatory demands on the perceiver of prosody (possibly also the poser), (ii) a relative left hemisphere specialization in processing temporal cues mediating prosodic meaning, and (iii) the propensity of prosody to act on the segment level in order to modulate word or sentence meaning. The specific role of top-down effects, in terms of either linguistically or affectively oriented attention, on the lateralization of stimulus processing is not clear and requires further investigation.
Abstract:
Users of cochlear implants (auditory prostheses that electrically stimulate the auditory nerve in the inner ear) often suffer from poor speech understanding in noise. We evaluate a small (intermicrophone distance 7 mm) and computationally inexpensive adaptive noise reduction system suitable for behind-the-ear cochlear implant speech processors. The system is evaluated in simulated and real, anechoic and reverberant environments. Simulations show signal-to-noise ratio improvements of 3.4 to 9.3 dB for rooms with realistic reverberation and more than 18 dB under anechoic conditions. Speech understanding in noise was measured in 6 adult cochlear implant users in a reverberant room, showing average improvements of 7.9–9.6 dB compared with a single omnidirectional microphone and 1.3–5.6 dB compared with a simple directional two-microphone device. Subjective evaluation in a cafeteria at lunchtime showed that the cochlear implant users preferred the evaluated device in terms of speech understanding and sound quality.
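The abstract does not specify the algorithm, but a common building block of small two-microphone adaptive noise reduction systems is an adaptive noise canceller. The sketch below shows a generic normalized-LMS (NLMS) canceller in Python, purely as an illustration of this class of technique; the signals, filter length, and step size are assumptions, not the authors' design.

```python
import numpy as np

def nlms_noise_canceller(primary, reference, n_taps=32, mu=0.1, eps=1e-8):
    """Generic NLMS adaptive noise canceller (illustrative only).
    primary:   front-microphone signal (speech + noise)
    reference: noise-dominated reference signal
    Returns the error signal, i.e. the noise-reduced output."""
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(n_taps)
    out = np.zeros_like(primary)
    for i in range(n_taps, len(primary)):
        x = reference[i - n_taps:i][::-1]      # reference tap vector
        y = w @ x                              # noise estimate
        e = primary[i] - y                     # speech estimate
        w += mu * e * x / (x @ x + eps)        # normalized LMS update
        out[i] = e
    return out

# Hypothetical signals: a tone in noise, with a correlated noise reference
rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
primary = speech + 0.8 * np.convolve(noise, [0.5, 0.3], mode="same")
cleaned = nlms_noise_canceller(primary, noise)
```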
Abstract:
The aim of this study was to compare speech in subjects with cleft lip and palate treated with three different methods of hard palate closure. One hundred and thirty-seven children (96 boys, 41 girls; mean age = 12 years, SD = 1.2) with complete unilateral cleft lip and palate (CUCLP), operated on by a single surgeon using a one-stage method, were evaluated. Management of the cleft lip and soft palate was comparable in all subjects; for hard palate repair, three different methods were used: bilateral von Langenbeck closure (b-vL group, n = 39), unilateral von Langenbeck closure (u-vL group, n = 56) and vomerplasty (v-p group, n = 42). Speech was assessed: (i) perceptually for the presence of (a) hypernasality, (b) compensatory articulations (CAs), (c) audible nasal air emissions (ANE) and (d) speech intelligibility; (ii) for the presence of compensatory facial grimacing; (iii) with clinical intra-oral evaluation; and (iv) with videonasendoscopy. The total rate of hypernasality requiring pharyngoplasty was 5.1%; the total incidence of post-oral CAs was 2.2%. Overall speech intelligibility was good in 84.7% of cases. Oronasal fistulas (ONFs) occurred in 15.7% of b-vL subjects, 7.1% of u-vL subjects and 50% of v-p subjects (P < 0.001). No statistically significant intergroup differences in hypernasality, CAs or intelligibility were found (P > 0.1). In conclusion, speech after early one-stage repair of CUCLP was satisfactory. The method of hard palate repair affected the incidence of ONFs, which, however, caused relatively mild and inconsistent speech errors.
Abstract:
OBJECTIVES The objectives of the present study were to investigate temporal/spectral sound-feature processing in preschool children (4 to 7 years old) with peripheral hearing loss compared with age-matched controls. The results verified the presence of statistical learning, which was diminished in children with hearing impairments (HIs), and elucidated possible perceptual mediators of speech production. DESIGN Perception and production of the syllables /ba/, /da/, /ta/, and /na/ were recorded in 13 children with normal hearing and 13 children with HI. Perception was assessed physiologically through event-related potentials (ERPs) recorded by EEG in a multifeature mismatch negativity paradigm and behaviorally through a discrimination task. Temporal and spectral features of the ERPs during speech perception were analyzed, and speech production was quantitatively evaluated using speech motor maximum performance tasks. RESULTS Proximal to stimulus onset, children with HI displayed a difference in map topography, indicating diminished statistical learning. In later ERP components, children with HI exhibited reduced amplitudes specifically in the N2 and the early parts of the late discriminative negativity components, which are associated with temporal and spectral control mechanisms. Abnormalities of speech perception were only subtly reflected in speech production: the lone difference found was a mild delay in regulating speech intensity. CONCLUSIONS In addition to previously reported deficits in sound-feature discrimination, the present results reflect diminished statistical learning in children with HI, which plays an early and important, but so far neglected, role in phonological processing. Furthermore, the lack of corresponding behavioral abnormalities in speech production implies that impaired perceptual capacities do not necessarily translate into productive deficits.
Abstract:
Introduction. Language is the most important means of communication and plays a central role in our everyday life. Brain damage (e.g. stroke) can lead to acquired disorders of language affecting the four linguistic modalities (i.e. reading, writing, speech production and comprehension) in different combinations and levels of severity. Every year, more than 5000 people are affected by aphasia in Switzerland alone (Aphasie Suisse). Since aphasia is highly individual, the level of difficulty and the content of tasks have to be adapted continuously by the speech therapists. Computer-based assignments allow patients to train independently at home and thus increase the frequency of therapy. Recent developments in tablet computers have opened new opportunities to use these devices for rehabilitation purposes. Especially older people, who have no prior experience with computers, can benefit from these new technologies. Methods. The aim of this project was to develop an application that enables patients to train language-related tasks autonomously and, on the other hand, allows speech therapists to assign exercises to the patients and to track their results online. Seven categories with various types of assignments were implemented. The application has two parts, separated by a user management system into a patient interface and a therapist interface. Both interfaces were evaluated using the SUS (System Usability Scale). The patient interface was tested by 15 healthy controls and 5 patients. For the patients, we also collected tracking data for further analysis. The therapist interface was evaluated by 5 speech therapists. Results. The mean SUS scores were 98 for the patients and 92.7 for the healthy controls (median = 95, SD = 7, 95% CI [88.8, 96.6]) for the patient interface, and 68 for the therapists for the therapist interface. Conclusion. Both the patients and the healthy controls gave the patient interface high SUS scores, which are considered "best imaginable". The therapist interface received a lower SUS score, but it is still considered "good" and "usable". The user tracking system and the interviews revealed that there is room for improvement and inspired new ideas for future versions.
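For reference, SUS values like those reported above are computed from ten 1-5 Likert items with the standard scoring rule: odd (positively worded) items contribute (score - 1), even (negatively worded) items contribute (5 - score), and the raw sum is scaled by 2.5 to a 0-100 range. A minimal sketch in Python, with a hypothetical respondent:

```python
def sus_score(responses):
    """System Usability Scale score from one respondent's answers.
    responses: list of 10 Likert ratings (1-5), items in standard SUS order.
    Odd-numbered items are positively worded, even-numbered negatively."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5  # scale 0-40 raw sum to 0-100

# Hypothetical respondent, not data from the study
print(sus_score([5, 1, 5, 2, 4, 1, 5, 1, 5, 2]))  # -> 92.5
```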
Abstract:
The relationship was explored between a subjective measure of hearing status, derived from a functional self-assessment expressed in terms of ability to hear and understand spoken words, and a comparable objective measure of hearing status, obtained from a speech reception test. The Augmentation Survey of the Health and Nutrition Examination Survey of the National Center for Health Statistics provided the necessary data for a sample of 3059 adults. Using chi-square tests for the subsample with the highest level of objectively assessed hearing status, favorable subjective assessments were found to be significantly associated with higher income, lower age group, higher level of educational attainment, greater psychological adjustment, fewer symptoms of depression, and higher self-ratings of overall health. In a linear regression with self-assessment of hearing status as the dependent variable, less than one-quarter of the variation could be explained by objective status and the six explanatory variables.
Abstract:
This paper describes the development of an Advanced Speech Communication System for Deaf People and its field evaluation in a real application domain: the renewal of the Driver's License. The system is composed of two modules. The first is a Spanish into Spanish Sign Language (LSE: Lengua de Signos Española) translation module made up of a speech recognizer, a natural language translator (for converting a word sequence into a sequence of signs), and a 3D avatar animation module (for playing back the signs). The second is a spoken Spanish generator from sign-writing, composed of a visual interface (for specifying a sequence of signs), a language translator (for generating the sequence of words in Spanish) and, finally, a text-to-speech converter. For language translation, the system integrates three technologies: an example-based strategy, a rule-based translation method and a statistical translator. This paper also includes a detailed description of the evaluation carried out at the Local Traffic Office in the city of Toledo (Spain), involving real government employees and Deaf people. This evaluation includes objective measurements from the system and subjective information from questionnaires. Finally, the paper reports an analysis of the main problems and a discussion of possible solutions.
Abstract:
Two new features, which outperform the baseline system, have been proposed and used by the Universidad Politécnica de Madrid in the Rich Transcription Evaluation 2009. The first feature is the intensity channel contribution, a feature related to the location of the speaker. The second is the logarithm of the interpolated fundamental frequency. This is the first time that both features have been applied to the clustering stage of diarization of multiple-distant-microphone meetings. The inclusion of both features improves the baseline results by 15.36% and 16.71% relative on the development set and the RT09 set, respectively. Considering speaker errors only, the relative improvement is 23% and 32.83% on the development set and the RT09 set, respectively.
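The second feature suggests a simple construction: interpolate the fundamental frequency track through unvoiced frames and take its logarithm. The sketch below shows one plausible way to compute such a feature in Python; the interpolation scheme and the F0 track are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def log_interpolated_f0(f0):
    """Linearly interpolate F0 through unvoiced frames (F0 == 0)
    and return its logarithm, one value per frame.
    Illustrative sketch, not the evaluated system's code."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    frames = np.arange(len(f0))
    # np.interp holds the boundary values constant outside the voiced span
    interp = np.interp(frames, frames[voiced], f0[voiced])
    return np.log(interp)

# Hypothetical F0 track (Hz); zeros mark unvoiced frames
f0_track = [0, 0, 120, 125, 0, 0, 130, 128, 0]
print(np.round(log_interpolated_f0(f0_track), 3))
```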
Abstract:
Several issues concerning the current use of speech interfaces are discussed, and the design and development of a speech interface that enables air traffic controllers to command and control their terminals by voice is presented. Special emphasis is placed on the comparison between laboratory experiments and field experiments, in which a set of ergonomics-related effects that cannot be observed in controlled laboratory experiments is detected. The paper presents both objective and subjective performance obtained in the field evaluation of the system with student controllers at an air traffic control (ATC) training facility. The system exhibits high word recognition rates (0.4% word error in Spanish and 1.5% in English) and low command error rates (6% in Spanish and 10.6% in English in the field tests). Subjective impressions have also been positive, encouraging future development and integration phases in the Spanish ATC terminals designed by Aeropuertos Españoles y Navegación Aérea (AENA).
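Word error rates like those quoted above are conventionally computed as the word-level Levenshtein distance (substitutions + insertions + deletions) between a reference transcript and the recognizer output, divided by the reference length. A minimal sketch with a hypothetical ATC-style utterance:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / len(ref)

# Hypothetical utterance, not from the evaluation
print(word_error_rate("clear runway two eight", "clear runway to eight"))  # 0.25
```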
Abstract:
In this paper, we describe a complete development platform that features different innovative acceleration strategies, not included in any other current platform, which simplify and speed up the definition of the different elements required to design a spoken dialog service. The proposed accelerations are mainly based on using the information from the backend database schema and contents, as well as cumulative information produced throughout the different steps of the design. Thanks to these accelerations, the interaction between the designer and the platform is improved, and in most cases the design is reduced to simple confirmations of the "proposals" that the platform dynamically provides at each step. In addition, the platform provides several other accelerations, such as configurable templates that can be used to define the different tasks in the service or the dialogs to obtain or show information to the user, automatic proposals for the best way to request slot contents from the user (i.e. using mixed-initiative forms or directed forms), an assistant that offers the set of most probable actions required to complete the definition of the different tasks in the application, and another assistant for solving specific modality details such as confirmations of user answers or how to present the lists of retrieved results to the user after querying the backend database. Additionally, the platform allows the creation of speech grammars and prompts, database access functions, and the use of mixed-initiative and over-answering dialogs. In the paper, we also describe each assistant in detail, emphasizing the different kinds of methodologies followed to facilitate the design process in each one. Finally, we describe the results obtained in both a subjective and an objective evaluation with different designers, which confirm the viability, usefulness, and functionality of the proposed accelerations. Thanks to the accelerations, the design time is reduced by more than 56% and the number of keystrokes by 84%.
Abstract:
This paper proposes the use of Factored Translation Models (FTMs) for improving a speech into sign language translation system. These FTMs allow syntactic-semantic information to be incorporated during the translation process, which significantly reduces the translation error rate. This paper also analyses different alternatives for dealing with non-relevant words. The speech into sign language translation system has been developed and evaluated in a specific application domain: the renewal of Identity Documents and Driver's Licenses. The translation system is phrase-based (Moses). The evaluation results reveal that the BLEU (BiLingual Evaluation Understudy) score has improved from 69.1% to 73.9% and the mSER (multiple references Sign Error Rate) has been reduced from 30.6% to 24.8%.
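BLEU scores like those reported here are computed from n-gram overlap between system output and reference translations. A minimal sketch using NLTK's corpus_bleu on hypothetical sign-gloss sequences (bigram weights are used because the toy sentences are short; the glosses are invented, not the paper's data):

```python
from nltk.translate.bleu_score import corpus_bleu

# Hypothetical gloss sequences; each hypothesis may have several references
references = [
    [["YOU", "LICENSE", "RENEW", "WANT"]],   # one reference for sentence 1
    [["FORM", "THIS", "SIGN", "PLEASE"]],    # one reference for sentence 2
]
hypotheses = [
    ["YOU", "LICENSE", "RENEW", "WANT"],
    ["FORM", "SIGN", "PLEASE"],
]
# Bigram BLEU (weights over 1-grams and 2-grams only)
print(f"corpus BLEU = {corpus_bleu(references, hypotheses, weights=(0.5, 0.5)):.3f}")
```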
Abstract:
This paper describes a categorization module for improving the performance of a Spanish into Spanish Sign Language (LSE) translation system. This categorization module replaces Spanish words with associated tags. When implementing this module, several alternatives for dealing with non-relevant words (Spanish words that are not relevant to the translation process) were studied. The categorization module has been incorporated into a phrase-based system and a Statistical Finite State Transducer (SFST). The evaluation results reveal that the BLEU score has increased from 69.11% to 78.79% for the phrase-based system and from 69.84% to 75.59% for the SFST.
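As an illustration of the kind of word-to-tag categorization described, the sketch below replaces Spanish words with tags from a small lexicon and drops non-relevant words. The lexicon, the non-relevant word list, and the pass-through behavior for unknown words are assumptions for illustration only, not the paper's actual categories.

```python
# Hypothetical tag lexicon; the paper's actual categories are not reproduced here
TAG_LEXICON = {
    "carnet": "DOCUMENT", "licencia": "DOCUMENT",
    "renovar": "RENEW", "quiero": "WANT",
}
NON_RELEVANT = {"el", "la", "de", "por", "favor"}  # e.g. function words

def categorize(sentence):
    """Replace Spanish words with tags, drop non-relevant words,
    and pass unknown words through unchanged."""
    out = []
    for word in sentence.lower().split():
        if word in NON_RELEVANT:
            continue
        out.append(TAG_LEXICON.get(word, word))
    return out

print(categorize("Quiero renovar el carnet"))  # ['WANT', 'RENEW', 'DOCUMENT']
```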
Abstract:
This paper describes the design, development and field evaluation of a machine translation system from Spanish to Spanish Sign Language (LSE: Lengua de Signos Española). The system focuses on helping Deaf people renew their Driver's License. It is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs). For the natural language translator, three technological approaches have been implemented and evaluated: an example-based strategy, a rule-based translation method and a statistical translator. In the final version, the language translator combines all the alternatives in a hierarchical structure. This paper includes a detailed description of the field evaluation, which was carried out at the Local Traffic Office in Toledo and involved real government employees and Deaf people. The evaluation includes objective measurements from the system and subjective information from questionnaires. The paper details the main problems found and discusses how to solve them (some of which are specific to LSE).
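One plausible reading of the hierarchical combination is a priority cascade: try the example-based translator first, fall back to the rule-based one, and finally to the statistical one, accepting the first output whose confidence is high enough. The sketch below is an assumption about how such a cascade might look, not the paper's actual architecture; the translator stubs and the threshold are invented.

```python
def hierarchical_translate(sentence, translators, threshold=0.8):
    """Try each translator in priority order; return the first sign
    sequence whose confidence clears the threshold, else the
    best-scoring one seen. Purely illustrative, threshold assumed."""
    best_signs, best_conf = [], -1.0
    for translate in translators:
        signs, conf = translate(sentence)
        if conf >= threshold:
            return signs
        if conf > best_conf:
            best_signs, best_conf = signs, conf
    return best_signs

# Hypothetical translator stubs returning (sign_sequence, confidence)
example_based = lambda s: ((["YOU", "LICENSE", "RENEW"], 0.95)
                           if "renovar" in s else ([], 0.0))
rule_based = lambda s: (s.upper().split(), 0.6)
statistical = lambda s: (s.upper().split(), 0.5)

print(hierarchical_translate("quiero renovar",
                             [example_based, rule_based, statistical]))
```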