993 results for speech quality


Relevance: 30.00%

Abstract:

Objectives. Adductor spasmodic dysphonia (ADSD) is a focal laryngeal dystonia that greatly compromises the quality of life of affected patients. It is a severe vocal disorder characterized by spasms of the laryngeal muscles during speech, producing phonatory breaks and a forced, strained, strangled voice. Its symptoms result from involuntary and intermittent contractions of the thyroarytenoid muscle during speech, which cause the vocal folds to strain, pressing against each other and increasing glottic resistance. Botulinum toxin injection remains the gold-standard treatment. However, because injections must be repeated periodically, leading to voice quality instability, a more definitive procedure would be desirable. In this pilot study we report the long-term vocal quality results of endoscopic laser thyroarytenoid myoneurectomy. Study Design. Prospective study. Methods. Surgery was performed in 15 patients (11 females and 4 males), aged between 29 and 73 years, diagnosed with ADSD. The Voice Handicap Index (VHI) was obtained before and after surgery (median 31 months postoperatively). Results. A significant improvement in VHI was observed after surgery compared with baseline values (P = 0.001). The median and interquartile range were 99 and 13 preoperatively, versus 24 and 42 postoperatively. Subjective improvement of voice as assessed by the patients showed a median improvement of 80%. Conclusions. Because long-term follow-up showed significant improvement of voice quality, this innovative surgical technique appears to be a satisfactory alternative treatment for ADSD patients who seek a definitive improvement of their condition.
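
The paired pre/post VHI comparison reported above is the kind of data a Wilcoxon signed-rank test handles (small sample, ordinal scores, no normality assumption). A minimal sketch with invented VHI values, constructed only to match the reported medians (99 pre, 24 post); these are not the study's data:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical pre/post Voice Handicap Index scores for 15 patients
# (illustrative values only, chosen to match the reported medians).
pre  = np.array([92, 94, 95, 96, 97, 98, 99, 99, 100, 101, 102, 103, 104, 105, 106])
post = np.array([ 8, 12, 16, 18, 20, 22, 23, 24,  30,  34,  40,  46,  52,  56,  60])

stat, p = wilcoxon(pre, post)  # paired, non-parametric test
print(np.median(pre), np.median(post), p)
```

With every patient improving, the test yields a very small two-sided p-value, illustrating how a result like P = 0.001 can arise from such data.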

Relevance: 30.00%

Abstract:

Introduction: In recent years, the benefits associated with the use of cochlear implants (CIs), especially with regard to speech perception, have proven to surpass those produced by hearing aids, making CIs a highly efficient resource for patients with severe-to-profound hearing loss. However, few studies so far have assessed the satisfaction of adult CI users. Objective: To analyze the relationship between the level of speech perception and the degree of satisfaction of adult CI users. Method: This was a prospective cross-sectional study conducted at the Audiological Research Center (CPA) of the Hospital of Craniofacial Anomalies, University of São Paulo (HRAC/USP), in Bauru, São Paulo, Brazil. A total of 12 CI users with pre-lingual or post-lingual hearing loss participated in this study. The following tools were used in the assessment: the "Satisfaction with Amplification in Daily Life" (SADL) questionnaire, culturally adapted to Brazilian Portuguese and analyzed in relation to the speech perception results; a speech perception test under quiet conditions; and the Hearing in Noise Test (HINT) Brazil under free-field conditions. Results: The participants were on the whole satisfied with their devices, and the degree of satisfaction correlated positively with the ability to perceive monosyllabic words under quiet conditions. Satisfaction did not correlate with the level of speech perception in noisy environments. Conclusion: Assessments of satisfaction may help professionals to determine what other factors, in addition to speech perception, may contribute to the satisfaction of CI users, in order to reorganize the intervention process and improve users' quality of life.
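
The positive association between monosyllabic-word scores in quiet and satisfaction described above is typically quantified with Spearman's rank correlation. A sketch with invented scores for 12 users (not the study's data):

```python
import numpy as np
from scipy.stats import spearmanr

# Invented monosyllabic word-recognition scores in quiet (%) and
# SADL-style global satisfaction scores (1-7); illustrative only.
word_scores  = np.array([40, 48, 52, 60, 64, 70, 72, 76, 80, 84, 88, 92])
satisfaction = np.array([3.1, 3.8, 3.5, 4.2, 4.0, 4.6, 4.9, 4.7, 5.3, 5.1, 5.8, 6.2])

rho, p = spearmanr(word_scores, satisfaction)  # rank correlation
print(rho, p)
```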

Relevance: 30.00%

Abstract:

We present a new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal, but resynthesize its speech content via an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed new method modifies Xiao and Nickel's method in four significant ways. Firstly, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Secondly, the state decoding of the recognition stage is supported with an uncertainty modeling technique. With the GMM and the uncertainty modeling it is possible to eliminate the need for noise-dependent system training. Thirdly, the post-processing of the original method via sinusoidal modeling is replaced with a powerful cepstral smoothing operation. Lastly, owing to these modifications, it is possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and different signal-to-noise ratios. The new method was able to significantly outperform traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests over a smaller set of test samples support our claims.
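
The abstract does not specify the cepstral smoothing operation in detail; the generic form of such an operation is low-quefrency liftering of the real cepstrum, sketched below as an illustration (an assumption, not necessarily the authors' exact post-processor):

```python
import numpy as np

def cepstral_smooth(frame, n_keep=30):
    """Smooth a frame's log-magnitude spectrum by keeping only the
    lowest n_keep real-cepstrum coefficients (low-quefrency liftering)."""
    log_spec = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)
    ceps = np.fft.irfft(log_spec)           # real cepstrum of the frame
    lifter = np.zeros_like(ceps)
    lifter[:n_keep] = 1.0                   # low quefrencies...
    lifter[-(n_keep - 1):] = 1.0            # ...and their symmetric mirror
    return np.fft.rfft(ceps * lifter).real  # smoothed log-magnitude spectrum

rng = np.random.default_rng(0)
frame = rng.normal(size=512)                # stand-in for a noisy speech frame
log_spec = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)
smoothed = cepstral_smooth(frame)
```

The liftered spectrum keeps the spectral envelope while discarding fine, noise-like detail, which is what makes it useful as a post-processing step.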

Relevance: 30.00%

Abstract:

Telephone communication is a challenge for many hearing-impaired individuals. One important technical reason for this difficulty is the restricted frequency range (0.3-3.4 kHz) of conventional landline telephones. Internet telephony (voice over Internet protocol [VoIP]) is transmitted with a larger frequency range (0.1-8 kHz) and therefore includes more frequencies relevant to speech perception. According to a recently published, laboratory-based study, the theoretical advantage of ideal VoIP conditions over conventional telephone quality has translated into improved speech perception by hearing-impaired individuals. However, the speech perception benefits of nonideal VoIP network conditions, which may occur in daily life, have not been explored. VoIP use cannot be recommended to hearing-impaired individuals before its potential under more realistic conditions has been examined.
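
The bandwidth gap described above is easy to demonstrate: a band-pass filter approximating the 0.3-3.4 kHz landline channel removes speech energy that a 0.1-8 kHz VoIP channel would preserve. A sketch using a Butterworth filter (a real landline channel also involves codecs and packet loss, so this shows the bandwidth aspect only):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 16000  # wideband rate: Nyquist of 8 kHz covers the VoIP band

# 4th-order Butterworth band-pass approximating the landline band.
sos_landline = butter(4, [300, 3400], btype="bandpass", fs=fs, output="sos")

t = np.arange(fs) / fs
in_band  = np.sin(2 * np.pi * 1000 * t)  # 1 kHz: passes both channels
out_band = np.sin(2 * np.pi * 5000 * t)  # 5 kHz: lost on a landline

rms = lambda x: np.sqrt(np.mean(x ** 2))
in_ratio  = rms(sosfiltfilt(sos_landline, in_band)) / rms(in_band)
out_ratio = rms(sosfiltfilt(sos_landline, out_band)) / rms(out_band)
print(in_ratio, out_ratio)  # 1 kHz survives; 5 kHz is strongly attenuated
```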

Relevance: 30.00%

Abstract:

Nadeina set out to develop methods of speech development in Russian as a mother tongue, focusing on improving diction, training in voice quality control, intonation control, the removal of dialect, and speech etiquette. She began with training in the receptive skills of language, i.e. reading and listening, since the interpretation of someone else's language plays an important role in language production. Her studies of students' reading speed showed that it varies between 40 and 120 words per minute, which is normally considered very slow. She discovered a strong correlation between reading speed and speaking skills: the slower a person reads, the worse their ability to speak. She has designed exercises to improve reading skills. Nadeina also believes that listening to other people's speech is very important, both to analyse its content and, in some cases, as an example, so listening skills need to be developed. Many people have poor pronunciation habits acquired as children. On the basis of speech samples from young Russians (male and female, aged 17-22), Nadeina analysed the commonest speech faults: nasalisation, hesitation and hemming at the end of sense-groups, etc. Using a group of twenty listeners, she looked for a correlation between how voice quality is perceived and certain voice quality parameters, e.g. pitch range, tremulousness, fluency, whispering, harshness, sonority, tension and audible breath. She found that the fewer non-linguistic segment variations appeared in speech, the more attractive the speech was rated. The results are included in a textbook aimed at helping people to improve their oral skills and to communicate ideas to an audience. She believes this will assist Russian officials in their attempts to communicate their ideas to different social spheres, as well as foreigners learning Russian.

Relevance: 30.00%

Abstract:

PURPOSE: The surgical treatment of oral cancer results in functional and aesthetic impairments. Patients' quality of life is considerably impaired by oral symptoms resulting from therapy of oral cancer. In many cases the inevitable resection of the tumor, as well as adjuvant radiochemotherapy, causes the destruction of physiologically and anatomically important structures. One focus of the research was the specific rehabilitation of dental loss with functional dentures. Another was the course of 19 impairments (comprehension of speech by unfamiliar listeners, comprehension of speech by familiar listeners, eating/swallowing, mobility of the tongue, opening range of the mouth, mobility of the lower jaw, mobility of the neck, mobility of the arms and shoulders, sense of taste, sense of smell, appearance, strength, appetite, respiration, pain, swelling, xerostomia, halitosis). METHODS: Commissioned by the German, Austrian and Swiss cooperative group on tumors of the maxillofacial region (DOSAK), data were collected via 3,894 questionnaires at 43 hospitals in Germany, Austria and Switzerland. The catalogue comprised 147 items in 9 chapters. At the end of the enquiry, 1,761 anonymous questionnaires had been returned by 38 hospitals; 1,652 of these could be evaluated with regard to the research question. RESULTS: The sum score of the 19 impairments was highly increased immediately after the operation and recovered over the next 6 months, without, however, reaching the pre-surgery level. Of the 1,652 patients, only 35% did not lose any teeth during therapy; 23% lost up to 5 teeth and 17% up to 10. A quarter of the patients lost more than 10 teeth. The more teeth were lost, the greater the decline in quality of life (p ≤ 0.001), although this could be allayed by the functionality of the dentures (p ≤ 0.001). There is a reciprocal dependence between the functionality of dental prosthetics and impairment of eating/swallowing (p ≤ 0.001).
CONCLUSIONS: Patients' quality of life after radical surgery for a carcinoma of the oral cavity depends not only on the functionality of dentures and the specificity of rehabilitation, but also on the initial findings, the extent and location of the resection, the chosen therapy, the general circumstances of the patient's life, and their coping strategies. These latter factors, however, unlike the functionality of dental prostheses and rehabilitation, are not modifiable.

Relevance: 30.00%

Abstract:

CONCLUSIONS: Speech understanding is better with the Baha Divino than with the Baha Compact in competing noise from the rear. No difference was found for speech understanding in quiet. Subjectively, overall sound quality and speech understanding were rated better for the Baha Divino. OBJECTIVES: To compare speech understanding in quiet and in noise, as well as subjective ratings, for two different bone-anchored hearing aids: the recently developed Baha Divino and the Baha Compact. PATIENTS AND METHODS: Seven adults with bilateral conductive or mixed hearing losses who were users of a bone-anchored hearing aid were tested with the Baha Compact in quiet and in noise. Tests were repeated after 3 months of use with the Baha Divino. RESULTS: There was no significant difference between the two types of Baha for speech understanding in quiet when tested with German numbers and monosyllabic words at presentation levels between 50 and 80 dB. For speech understanding in noise, an advantage of 2.3 dB for the Baha Divino versus the Baha Compact was found when noise was emitted from a loudspeaker to the rear of the listener and the directional microphone noise reduction system was activated. Subjectively, the Baha Divino was rated statistically significantly better in terms of overall sound quality.

Relevance: 30.00%

Abstract:

Speech technologies can provide important benefits for the development of more usable and safe in-vehicle human-machine interactive systems (HMIs). However, mainly due to robustness issues, the use of spoken interaction can entail important distractions for the driver. In this challenging scenario, while speech technologies are evolving, further research is necessary to explore how they can be complemented both with other modalities (multimodality) and with information from the increasing number of available sensors (context-awareness). The perceived quality of speech technologies can be significantly increased by implementing such policies, which simply try to make the best use of all the available resources; and the in-vehicle scenario is an excellent test-bed for this kind of initiative. In this contribution we propose an event-based HMI design framework which combines context modelling and multimodal interaction using a W3C XML language known as SCXML. SCXML provides a general process control mechanism that is being considered by W3C to improve both voice interaction (VoiceXML) and multimodal interaction (MMI). In our approach we try to anticipate and extend these initiatives, presenting a flexible SCXML-based approach for the design of a wide range of multimodal, context-aware in-vehicle HMI interfaces. The proposed framework for HMI design and specification has been implemented in an automotive OSGi service platform, and it is being used and tested in the Spanish research project MARTA for the development of several in-vehicle interactive applications.
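
SCXML documents are XML state charts executed by an SCXML engine; the process-control idea behind them is states plus event-triggered transitions. A toy Python rendering of that idea (the states and events below are invented for illustration and are not part of the MARTA project or the SCXML standard):

```python
# Toy event-driven state chart in the spirit of SCXML's control model.
class StateChart:
    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions  # {(state, event): next_state}

    def send(self, event):
        # Unknown events leave the current state unchanged.
        self.state = self.transitions.get((self.state, event), self.state)
        return self.state

# Hypothetical in-vehicle HMI: a noise-context event triggers a
# modality switch from speech to touch (context-awareness).
hmi = StateChart("idle", {
    ("idle", "speech_start"):    "listening",
    ("listening", "recognized"): "confirming",
    ("listening", "high_noise"): "fallback_touch",  # context-aware fallback
    ("confirming", "confirmed"): "executing",
})
hmi.send("speech_start")  # -> "listening"
hmi.send("high_noise")    # -> "fallback_touch"
```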

Relevance: 30.00%

Abstract:

One of the biggest challenges in speech synthesis is the production of natural-sounding synthetic voices. This means that the resulting voice must not only be of high enough quality but must also be able to capture the natural expressiveness imbued in human speech. This paper focuses on the expressiveness problem by proposing a set of techniques for extrapolating the expressiveness of proven high-quality speaking-style models onto neutral speakers in HMM-based synthesis. As an additional advantage, the proposed techniques are based on adaptation approaches, which means that they can be used with little training data (around 15 minutes per style in this paper). For the final implementation, a set of 4 speaking styles was considered: news broadcasts, live sports commentary, interviews and parliamentary speech. Finally, the five techniques were tested through a perceptual evaluation, which shows that the deviations between neutral and speaking-style average models can be learned and used to imbue expressiveness into target neutral speakers as intended.
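
The core idea, learning the deviation between neutral and speaking-style average models and applying it to a new neutral speaker, can be caricatured at the level of a single mean vector. Real HMM adaptation operates on full model sets; this is a deliberately minimal sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 40  # e.g. one HMM state's mel-cepstral mean vector

# Stand-ins for average-voice model means (invented values).
neutral_avg = rng.normal(size=dim)
style_avg   = neutral_avg + rng.normal(scale=0.3, size=dim)  # e.g. sports commentary
target_neutral = rng.normal(size=dim)  # a new neutral speaker

# Learn the neutral->style deviation and transplant it onto the target.
delta = style_avg - neutral_avg
target_style = target_neutral + delta
```

Because only the offset is transplanted, the target keeps its own speaker identity while inheriting the style's systematic shift.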

Relevance: 30.00%

Abstract:

BioMet®Phon is a software application developed for the characterization of voice in voice quality evaluation. Initially it was conceived as plain research code to estimate the glottal source from voice recordings and obtain the biomechanical parameters of the vocal folds from the spectral density of the estimate. This code grew into what is now the Glottex®Engine package (G®E). Further demands from users in the laryngology and speech therapy fields motivated the development of a specific graphical user interface (GUI) to encapsulate user interaction with the G®E. This gave rise to BioMet®Phon, an application which extracts the glottal source from voice and offers a complete parameterization of this signal, including distortion, cepstral, spectral, biomechanical, time-domain, contact and tremor parameters. The semantic capabilities of the biomechanical parameters are discussed. Case studies from its application to laryngology and speech therapy are given and discussed. Validation results in voice pathology detection are also presented. Applications to laryngology, speech therapy, and the monitoring of neurological deterioration in the elderly are proposed.

Relevance: 30.00%

Abstract:

Traditional text-to-speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, to process this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization with music and noise detection, which improve the precision and overall quality of the segmentation.
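
One classic frame-level cue for separating noise-like from tonal (music-like) audio in such pipelines is spectral flatness. The abstract does not state which features the compared architectures use, so the sketch below only illustrates the kind of measurement involved:

```python
import numpy as np

def spectral_flatness(frame):
    """Geometric-to-arithmetic mean ratio of the power spectrum:
    near 1 for noise-like frames, near 0 for tonal frames."""
    p = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12
    return np.exp(np.mean(np.log(p))) / np.mean(p)

fs = 16000
t = np.arange(1024) / fs
tone  = np.sin(2 * np.pi * 440 * t)                 # sustained tone (music-like)
noise = np.random.default_rng(1).normal(size=1024)  # white noise

print(spectral_flatness(tone), spectral_flatness(noise))
```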

Relevance: 30.00%

Abstract:

Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming around these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue: how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing, along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. A successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker.

Relevance: 30.00%

Abstract:

The aim of the study was, firstly, to document the acoustic parameters of voice using the Multidimensional Voice Program (MDVP, Kay Elemetrics) in a group of children with dysarthria subsequent to treatment for cerebellar tumour (CT), and, secondly, to compare the acoustic findings with perceptual voice characteristics as described by the GIRBAS scale (grade, instability, roughness, breathiness, asthenicity, strain). The assessments were performed on 29 voice samples: 9 from cerebellar tumour participants with dysarthria and 20 from control participants. None of the control voices was rated as exhibiting any of the six parameters described by the GIRBAS, while 7 of the CT participants were noted to have at least a mild voice disorder. Roughness, instability, breathiness and asthenicity were all identified as voice characteristics in the CT voice samples. Acoustically, the CT voice samples differed significantly from the controls' voices on frequency and amplitude perturbation measures. Our findings confirm voice dysfunction as a component of dysarthria in children treated for cerebellar tumour; links between the acoustic and perceptual descriptions are discussed. Copyright (C) 2004 S. Karger AG, Basel.
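
The frequency perturbation measures mentioned above (MDVP's jitter family) quantify cycle-to-cycle instability of phonation. MDVP computes them from pitch periods; the sketch below uses per-cycle F0 values instead, a simplifying assumption for illustration:

```python
import numpy as np

def local_jitter(f0):
    """Mean absolute cycle-to-cycle F0 difference relative to mean F0 (%),
    in the spirit of MDVP's frequency perturbation measures."""
    f0 = np.asarray(f0, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(f0))) / np.mean(f0)

steady    = [220.0, 220.4, 219.8, 220.2, 220.0, 219.9]  # near-steady phonation
perturbed = [220.0, 231.0, 214.0, 228.0, 211.0, 225.0]  # rough/unstable voice

print(local_jitter(steady), local_jitter(perturbed))
```

A healthy steady voice yields well under 1%, while the perturbed series yields several percent, which is the direction of difference the study reports between CT and control samples.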