911 results for speech databases
Abstract:
This paper discusses the implementation details of a child-friendly, good-quality, English text-to-speech (TTS) system that is phoneme-based, concatenative, easy to set up, and usable with little memory. Direct waveform concatenation and linear predictive coding (LPC) are used. Most existing TTS systems are unit-selection based and use standard speech databases available in neutral adult voices. Here, reduced memory is achieved by concatenating phonemes and by replacing phonetic wave files with their LPC coefficients. Linguistic analysis was used to reduce the algorithmic complexity instead of signal processing techniques. A sufficient degree of customization and generalization catering to the needs of the child user was included through provision for vocabulary and voice selection to suit the child's requirements. Prosody was also incorporated. This inexpensive TTS system was implemented in MATLAB, with the synthesis presented by means of a graphical user interface (GUI), making it child-friendly. It can be used not only as an interesting language-learning aid for the typically developing child but also as a speech aid for the vocally disabled child. The quality of the synthesized speech was evaluated using the mean opinion score (MOS).
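A minimal sketch, not the authors' MATLAB implementation, of the storage idea described above: each phoneme waveform is replaced by its LPC coefficients plus a gain term, and speech is rebuilt by driving the stored all-pole filter with a simple excitation and concatenating the results. It assumes librosa and scipy are available; the phoneme file names, model order, and excitation model are illustrative.

```python
import numpy as np
import librosa
from scipy.signal import lfilter

def encode_phoneme(wave, order=12):
    """Replace a phoneme waveform with LPC coefficients and an excitation gain."""
    a = librosa.lpc(wave, order=order)           # prediction-error filter, a[0] == 1
    residual = lfilter(a, [1.0], wave)           # inverse-filter to get the excitation
    gain = np.sqrt(np.mean(residual ** 2))       # keep only the RMS of the excitation
    return a, gain

def decode_phoneme(a, gain, n_samples, voiced=True, f0=220, sr=16000):
    """Re-synthesise a phoneme from its LPC filter and a simple excitation model."""
    if voiced:                                   # impulse train at the chosen pitch
        excitation = np.zeros(n_samples)
        excitation[::max(1, int(sr / f0))] = 1.0
    else:                                        # white noise for unvoiced sounds
        excitation = np.random.randn(n_samples)
    excitation *= gain / (np.sqrt(np.mean(excitation ** 2)) + 1e-12)
    return lfilter([1.0], a, excitation)         # all-pole synthesis filter

# Hypothetical usage: concatenate two stored phonemes into one utterance.
# ph1, sr = librosa.load("phoneme_k.wav", sr=16000)   # placeholder file names
# ph2, _  = librosa.load("phoneme_ae.wav", sr=16000)
# coded   = [encode_phoneme(p) for p in (ph1, ph2)]
# speech  = np.concatenate([decode_phoneme(a, g, 1600) for a, g in coded])
```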
Abstract:
Granulomatous lesions are frequently found in infectious diseases; they can involve the larynx and pharynx and cause varying degrees of dysphonia and dysphagia. There is still no systematic review analyzing the effectiveness of speech therapy in systemic granulomatous diseases. Research strategy: A systematic review was performed according to the Cochrane guideline, considering for inclusion RCTs and quasi-RCTs on the effectiveness of speech-language therapy to treat dysphagia and dysphonia symptoms in systemic granulomatous diseases of the larynx and pharynx. Selection criteria: The outcomes planned to be measured in this review were swallowing impairment, frequency of chest infections, and voice and swallowing symptoms. Data analysis: We identified 1,140 citations from all electronic databases. After an initial screening we selected only 9 titles to be retrieved in full text. After full reading, no RCT was found, and we therefore only described the existing 2 case series studies. Results: No randomized controlled trials were found in the literature. Therefore, two studies, both case series, were included for narrative analysis only. Conclusion: There is no evidence from high-quality studies about the effectiveness of speech-language therapy in patients with granulomatous diseases of the larynx and pharynx. Investigators could rely on the outcomes suggested in this review to design their own clinical trials.
Abstract:
The fields of Rhetoric and Communication usually assume a competent speaker who is able to speak well with conscious intent; what happens, however, when intent and comprehension are intact but communicative facilities are impaired (e.g., by stroke or traumatic brain injury)? What might a focus on communicative success be able to tell us in those instances? This project considers this question by examining communication disorders through identifying and analyzing patterns of (dis)fluent speech between 10 aphasic and 10 non-aphasic adults. The analysis in this report is centered on a collection of data provided by the AphasiaBank database. The database's collection protocol guides aphasic and non-aphasic participants through a series of language assessments, and for my re-analysis of the database's transcripts I consider what communicative success is and how it is demonstrated during a re-telling of the Cinderella narrative. I conducted a thorough examination of a set of participant transcripts to understand the contexts in which speech errors occur and how (dis)fluencies may follow from aphasic and non-aphasic participants' speech patterns. An inductive mixed-methods approach, informed by grounded theory and by qualitative and linguistic analyses of the transcripts, served as a means to balance the classification of data, providing a foundation for all sampling decisions. A close examination of the transcripts and the codes of the AphasiaBank database suggests that while the coding is abundant and detailed, further levels of coding and analysis may be needed to reveal underlying similarities and differences in aphasic vs. non-aphasic linguistic behavior. Through four successive levels of increasingly detailed analysis, I found that patterns of repair by aphasics and non-aphasics differed primarily in degree rather than kind. This finding may have therapeutic impact, in reassuring aphasics that they are on the right track to achieving communicative fluency.
Abstract:
We present a novel approach using both sustained vowels and connected speech to detect obstructive sleep apnea (OSA) cases within a homogeneous group of speakers. The proposed scheme is based on state-of-the-art GMM-based classifiers and specifically acknowledges the way in which acoustic models are trained on standard databases, as well as the complexity of the resulting models and their adaptation to specific data. Our experimental database contains a suitable number of utterances and sustained speech from healthy (i.e., control) and OSA Spanish speakers. Finally, a 25.1% relative reduction in classification error is achieved when fusing the continuous- and sustained-speech classifiers. Index Terms: obstructive sleep apnea (OSA), Gaussian mixture models (GMMs), background model (BM), classifier fusion.
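A minimal sketch of the kind of GMM likelihood-ratio classification and score fusion described above, using scikit-learn. The feature matrices, component counts, fusion weight, and threshold are hypothetical placeholders, not the paper's actual configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmm(features, n_components=8, seed=0):
    """Fit one GMM per class on frame-level acoustic features (rows = frames)."""
    return GaussianMixture(n_components=n_components, covariance_type="diag",
                           random_state=seed).fit(features)

def llr_score(gmm_osa, gmm_control, utterance):
    """Average log-likelihood ratio of an utterance: positive favours OSA."""
    return np.mean(gmm_osa.score_samples(utterance) -
                   gmm_control.score_samples(utterance))

def fused_decision(score_connected, score_sustained, weight=0.5, threshold=0.0):
    """Fuse the connected-speech and sustained-vowel classifiers by a weighted sum."""
    fused = weight * score_connected + (1.0 - weight) * score_sustained
    return fused > threshold

# Hypothetical usage with random placeholder features (n_frames x n_coeffs).
rng = np.random.default_rng(0)
gmm_osa = train_gmm(rng.normal(0.3, 1.0, (500, 13)))
gmm_ctl = train_gmm(rng.normal(0.0, 1.0, (500, 13)))
test_utt = rng.normal(0.3, 1.0, (120, 13))
s_conn = llr_score(gmm_osa, gmm_ctl, test_utt)
s_sust = llr_score(gmm_osa, gmm_ctl, test_utt)   # would come from the vowel classifier
print(fused_decision(s_conn, s_sust))
```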
Abstract:
In order to obtain more human-like-sounding human-machine interfaces, we must first be able to give them expressive capabilities in the way of emotional and stylistic features, so as to closely adapt them to the intended task. If we want to replicate those features, it is not enough to merely replicate the prosodic information of fundamental frequency and speaking rhythm. The proposed additional layer is the modification of the glottal model, for which we make use of the GlottHMM parameters. This paper analyzes the viability of such an approach by verifying that the expressive nuances are captured by the aforementioned features, obtaining 95% recognition rates on styled speech and 82% on emotional speech. We then evaluate the effect of speaker bias and recording environment on the source modeling in order to quantify possible problems when analyzing multi-speaker databases. Finally, we propose a speaking-style separation for Spanish based on prosodic features and check its perceptual significance.
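A small sketch, under stated assumptions, of the kind of verification described above: train a simple classifier on per-utterance source and prosodic features and report a cross-validated recognition rate. The random feature table and style labels are placeholders; the real work uses GlottHMM parameters and labelled expressive speech.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Hypothetical feature table: one row per utterance; columns could hold f0
# statistics, speaking rate, and GlottHMM-style glottal source parameters.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 4, size=200)          # four speaking styles (placeholder labels)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)  # recognition rate analogous to the reported figures
print(f"cross-validated style recognition: {scores.mean():.2f}")
```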
Abstract:
THEME: congenital hypothyroidism is a metabolic disorder with serious consequences for untreated individuals, and even children who undergo treatment may present developmental disorders. The Programa Nacional de Triagem Neonatal (National Neonatal Screening Program), established by the Ministério da Saúde, provides for longitudinal follow-up of individuals by a multidisciplinary team. However, speech-language pathology is not included in this team. Therefore, considering the occurrence of communication disorders in these individuals, a literature survey was carried out in the Lilacs, MedLine and PubMed databases, covering the period from 1987 to 2007, regarding impairments in developmental abilities resulting from congenital hypothyroidism. AIM: to verify, in the scientific literature, the presence of developmental disorders in individuals with congenital hypothyroidism and to reflect on the importance of speech-language intervention, together with a specialized multidisciplinary team, in their follow-up. CONCLUSION: the literature reports impairments in developmental abilities (motor, cognitive, linguistic and self-care) and highlights that children with congenital hypothyroidism are at risk for impairments in linguistic development and therefore require longitudinal follow-up of communicative development. The importance of the speech-language pathologist in the Neonatal Screening Programs accredited by the Ministério da Saúde becomes evident. The need for investigations into the other metabolic disorders covered by these programs is also emphasized, in which the speech-language pathologist can act to prevent, habilitate and rehabilitate communication disorders, contributing to teamwork and promoting health in this population.
Abstract:
The goal of this paper is to study and propose a new technique for noise reduction used during the reconstruction of speech signals, particularly for biomedical applications. The proposed method is based on Kalman filtering in the time domain combined with spectral subtraction. Comparison with a discrete Kalman filter in the frequency domain shows better performance of the proposed technique. The performance is evaluated using the segmental signal-to-noise ratio and the Itakura-Saito distance. Results have shown that the Kalman filter in the time domain combined with spectral subtraction is more robust and efficient, improving the Itakura-Saito distance by up to four times.
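A minimal numpy/scipy sketch of two building blocks mentioned above, plain spectral subtraction and the segmental SNR used for evaluation. It is not the paper's combined time-domain Kalman formulation; the noise-estimation strategy, frame lengths, and spectral floor are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, sr, noise_frames=10, floor=0.02):
    """Subtract a noise magnitude estimate (from the first frames) in the STFT domain."""
    f, t, X = stft(noisy, fs=sr, nperseg=512)
    noise_mag = np.mean(np.abs(X[:, :noise_frames]), axis=1, keepdims=True)
    mag = np.maximum(np.abs(X) - noise_mag, floor * np.abs(X))   # spectral floor
    _, clean = istft(mag * np.exp(1j * np.angle(X)), fs=sr, nperseg=512)
    return clean

def segmental_snr(clean, enhanced, frame=256):
    """Average per-frame SNR (dB) between the reference and the enhanced signal."""
    n = min(len(clean), len(enhanced)) // frame * frame
    c = clean[:n].reshape(-1, frame)
    e = enhanced[:n].reshape(-1, frame)
    snr = 10 * np.log10(np.sum(c ** 2, axis=1) /
                        (np.sum((c - e) ** 2, axis=1) + 1e-12) + 1e-12)
    return np.mean(np.clip(snr, -10, 35))        # usual clipping for segmental SNR
```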
Abstract:
The canonical representation of speech constitutes a perfect-reconstruction (PR) analysis-synthesis system. Its parameters are the autoregressive (AR) model coefficients, the pitch period, and the voiced and unvoiced components of the excitation represented as transform coefficients. Each set of parameters may be operated on independently. A time-frequency unvoiced excitation (TFUNEX) model is proposed that has high time resolution and selective frequency resolution. Improved time-frequency fit is obtained by using, for antialiasing cancellation, the clustering of pitch-synchronous transform tracks defined in the modulation transform domain. The TFUNEX model delivers high-quality speech while compressing the unvoiced excitation representation about 13 times relative to its raw transform-coefficient representation for wideband speech.
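A minimal sketch of the perfect-reconstruction property that the canonical representation relies on: inverse-filtering a frame with its AR coefficients yields the excitation, and re-filtering that excitation through the all-pole synthesis filter recovers the frame up to numerical precision. The TFUNEX modelling of the unvoiced excitation itself is not reproduced here; the frame length and model order are illustrative.

```python
import numpy as np
import librosa
from scipy.signal import lfilter

def analyse_frame(frame, order=16):
    """AR analysis: return the prediction-error filter and the excitation signal."""
    a = librosa.lpc(frame, order=order)
    excitation = lfilter(a, [1.0], frame)        # e[n] = A(z) * s[n]
    return a, excitation

def synthesise_frame(a, excitation):
    """AR synthesis: pass the (possibly modified) excitation through 1/A(z)."""
    return lfilter([1.0], a, excitation)

# Perfect-reconstruction check on a random frame (placeholder for a speech frame).
frame = np.random.default_rng(0).normal(size=1024)
a, e = analyse_frame(frame)
print(np.allclose(synthesise_frame(a, e), frame))   # True: analysis-synthesis is PR
```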
Abstract:
The primary objective of this study was to assess the lingual kinematic strategies used by younger and older adults to increase rate of speech. It was hypothesised that the strategies used by the older adults would differ from those of the young adults, either as a direct result of, or in response to a need to compensate for, age-related changes in the tongue. Electromagnetic articulography was used to examine the tongue movements of eight young (M = 26.7 years) and eight older (M = 67.1 years) females during repetitions of /ta/ and /ka/ at a controlled moderate rate and then as fast as possible. The younger and older adults were found to significantly reduce consonant durations and increase syllable repetition rate by similar proportions. To achieve these reduced durations, both groups appeared to use the same strategy, that of reducing the distances travelled by the tongue. Further comparisons at each rate, however, suggested a speed-accuracy trade-off and increased speech monitoring in the older adults. The results may assist in differentiating articulatory changes associated with normal aging from pathological changes found in disorders that affect the older population.
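A small sketch, under stated assumptions, of the kinematic measures referred to above: given a sampled 2-D tongue-sensor trajectory for one gesture, compute movement duration, path distance, and peak speed. The sampling rate and the placeholder trajectory are illustrative, not the study's articulography data.

```python
import numpy as np

def movement_measures(xy, sr=200.0):
    """Duration (s), path distance, and peak speed of one articulatory movement.

    xy : (n_samples, 2) array of sensor positions in mm; sr : samples per second.
    """
    step = np.diff(xy, axis=0)                    # per-sample displacement
    dist = np.linalg.norm(step, axis=1)
    speed = dist * sr                             # mm/s
    return len(xy) / sr, dist.sum(), speed.max()

# Placeholder trajectory: a half-ellipse approximating a /ta/ closing gesture.
t = np.linspace(0, np.pi, 60)
traj = np.column_stack([8 * np.cos(t), 4 * np.sin(t)])   # mm
duration, distance, peak = movement_measures(traj)
print(f"{duration*1000:.0f} ms, {distance:.1f} mm, peak {peak:.0f} mm/s")
```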
Abstract:
Spectral peak resolution was investigated in normal hearing (NH), hearing impaired (HI), and cochlear implant (CI) listeners. The task involved discriminating between two rippled noise stimuli in which the frequency positions of the log-spaced peaks and valleys were interchanged. The ripple spacing was varied adaptively from 0.13 to 11.31 ripples/octave, and the minimum ripple spacing at which a reversal in peak and trough positions could be detected was determined as the spectral peak resolution threshold for each listener. Spectral peak resolution was best, on average, in NH listeners, poorest in CI listeners, and intermediate for HI listeners. There was a significant relationship between spectral peak resolution and both vowel and consonant recognition in quiet across the three listener groups. The results indicate that the degree of spectral peak resolution required for accurate vowel and consonant recognition in quiet backgrounds is around 4 ripples/octave, and that spectral peak resolution poorer than around 1–2 ripples/octave may result in highly degraded speech recognition. These results suggest that efforts to improve spectral peak resolution for HI and CI users may lead to improved speech recognition.
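A minimal numpy sketch of a log-spaced rippled-noise stimulus like those described above: broadband noise whose spectral envelope is sinusoidal on a log-frequency axis, with a standard and a phase-reversed ("inverted") version whose peaks and troughs are interchanged. The ripple density, bandwidth, and depth are illustrative parameters, not the exact stimuli of the study.

```python
import numpy as np

def rippled_noise(ripples_per_octave, inverted=False, sr=22050, dur=0.5,
                  f_lo=100.0, f_hi=5000.0, depth_db=30.0, seed=0):
    """Generate noise with a log-spaced sinusoidal spectral ripple."""
    n = int(sr * dur)
    spec = np.fft.rfft(np.random.default_rng(seed).normal(size=n))
    f = np.fft.rfftfreq(n, 1.0 / sr)
    band = (f >= f_lo) & (f <= f_hi)
    phase = np.pi if inverted else 0.0            # swap peak and trough positions
    ripple_db = (depth_db / 2.0) * np.sin(2 * np.pi * ripples_per_octave *
                                          np.log2(f[band] / f_lo) + phase)
    envelope = np.zeros_like(f)
    envelope[band] = 10 ** (ripple_db / 20.0)
    return np.fft.irfft(spec * envelope, n)

standard = rippled_noise(2.0)
inverted = rippled_noise(2.0, inverted=True)      # the discrimination target
```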
Abstract:
The purpose of this study was to explore the potential advantages, both theoretical and applied, of preserving low-frequency acoustic hearing in cochlear implant patients. Several hypotheses are presented that predict that residual low-frequency acoustic hearing along with electric stimulation for high frequencies will provide an advantage over traditional long-electrode cochlear implants for the recognition of speech in competing backgrounds. A simulation experiment in normal-hearing subjects demonstrated a clear advantage for preserving low-frequency residual acoustic hearing for speech recognition in a background of other talkers, but not in steady noise. Three subjects with an implanted "short-electrode" cochlear implant and preserved low-frequency acoustic hearing were also tested on speech recognition in the same competing backgrounds and compared to a larger group of traditional cochlear implant users. Each of the three short-electrode subjects performed better than any of the traditional long-electrode implant subjects for speech recognition in a background of other talkers, but not in steady noise, in general agreement with the simulation studies. When compared to a subgroup of traditional implant users matched according to speech recognition ability in quiet, the short-electrode patients showed a 9-dB advantage in the multitalker background. These experiments provide strong preliminary support for retaining residual low-frequency acoustic hearing in cochlear implant patients. The results are consistent with the idea that better perception of voice pitch, which can aid in separating voices in a background of other talkers, was responsible for this advantage.
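A compact sketch of the kind of combined electric-acoustic simulation used in the normal-hearing experiment above: the signal below a crossover frequency is kept as low-pass "acoustic" hearing, while the region above it is replaced by a few noise-vocoded channels. The crossover frequency, channel count, and filter settings are illustrative assumptions, not the study's exact processing.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def eas_simulation(speech, sr, crossover=500.0, n_channels=4, f_max=6000.0):
    """Low-pass 'acoustic' portion plus noise-vocoded high-frequency channels."""
    acoustic = sosfilt(butter(4, crossover, "low", fs=sr, output="sos"), speech)
    edges = np.geomspace(crossover, f_max, n_channels + 1)   # log-spaced bands
    rng = np.random.default_rng(0)
    vocoded = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], "bandpass", fs=sr, output="sos")
        env = np.abs(hilbert(sosfilt(sos, speech)))          # channel envelope
        carrier = sosfilt(sos, rng.normal(size=len(speech))) # band-limited noise
        vocoded += env * carrier
    return acoustic + vocoded
```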
Abstract:
The purpose of the present study was to examine the benefits of providing audible speech to listeners with sensorineural hearing loss when the speech is presented in a background noise. Previous studies have shown that when listeners have a severe hearing loss in the higher frequencies, providing audible speech (in a quiet background) to these higher frequencies usually results in no improvement in speech recognition. In the present experiments, speech was presented in a background of multitalker babble to listeners with various severities of hearing loss. The signal was low-pass filtered at numerous cutoff frequencies and speech recognition was measured as additional high-frequency speech information was provided to the hearing-impaired listeners. It was found in all cases, regardless of hearing loss or frequency range, that providing audible speech resulted in an increase in recognition score. The change in recognition as the cutoff frequency was increased, along with the amount of audible speech information in each condition (articulation index), was used to calculate the "efficiency" of providing audible speech. Efficiencies were positive for all degrees of hearing loss. However, the gains in recognition were small, and the maximum score obtained by any listener was low, due to the noise background. An analysis of error patterns showed that, due to the limited speech audibility in a noise background, even severely impaired listeners used additional speech audibility in the high frequencies to improve their perception of the "easier" features of speech, including voicing.
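As a worked illustration of the efficiency measure described above (the change in recognition score divided by the change in the articulation index as the audible bandwidth grows), using made-up numbers rather than the study's data:

```python
import numpy as np

# Hypothetical recognition scores (proportion correct) and articulation index
# values as the low-pass cutoff frequency is raised in a babble background.
scores = np.array([0.20, 0.28, 0.34, 0.37])
ai     = np.array([0.10, 0.22, 0.33, 0.40])

efficiency = np.diff(scores) / np.diff(ai)     # change in score per unit change in AI
print(efficiency)                              # positive values: added audibility helps
```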
Impact of Commercial Search Engines and International Databases on Engineering Teaching and Research
Abstract:
For the last three decades, the engineering higher education and professional environments have been completely transformed by the "electronic/digital information revolution" that has included the introduction of the personal computer, the development of email and the world wide web, and broadband Internet connections at home. Herein the writer compares the performance of several digital tools with traditional library resources. While new specialised search engines and open-access digital repositories may fill a gap between conventional search engines and traditional references, they should not be confused with real libraries and international scientific databases that encompass textbooks and peer-reviewed scholarly works. An absence of listing in some Internet search listings, databases and repositories is not an indication of standing. Researchers, engineers and academics should remember these key differences when assessing the quality of bibliographic "research" based solely upon Internet searches.
Abstract:
The purpose of this paper is to provide a cross-linguistic survey of the variation of coding strategies that are available for the grammatical distinction between direct and indirect speech representation with a particular focus on the expression of indirect reported speech. Cross-linguistic data from a sample of 42 languages will be provided to illustrate the range of available grammatical coding strategies.