935 resultados para Text-to-speech systems


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. RT is a sophisticated data structure that represents the peaks and valleys as well as the spatial structure of a waveform symbolically in the form of trees. An initial approximation to the RT, called Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST is then computed using FL. Quantitative analysis of the result gives RMSE of 0.56 and 0.71 for peak and valley respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1 - -10, was obtained for intelligibility and naturalness respectively.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study investigated the effects of word prediction and text-to-speech on the narrative composition writing skills of 6, fifth-grade Hispanic boys with specific learning disabilities (SLD). A multiple baseline design across subjects was used to explore the efficacy of word prediction and text-to-speech alone and in combination on four dependent variables: writing fluency (words per minute), syntax (T-units), spelling accuracy, and overall organization (holistic scoring rubric). Data were collected and analyzed during baseline, assistive technology interventions, and at 2-, 4-, and 6-week maintenance probes. ^ Participants were equally divided into Cohorts A and B, and two separate but related studies were conducted. Throughout all phases of the study, participants wrote narrative compositions for 15-minute sessions. During baseline, participants used word processing only. During the assistive technology intervention condition, Cohort A participants used word prediction followed by word prediction with text-to-speech. Concurrently, Cohort B participants used text-to-speech followed by text-to-speech with word prediction. ^ The results of this study indicate that word prediction alone or in combination with text-to-speech has a positive effect on the narrative writing compositions of students with SLD. Overall, participants in Cohorts A and B wrote more words, more T-units, and spelled more words correctly. A sign test indicated that these perceived effects were not likely due to chance. Additionally, the quality of writing improved as measured by holistic rubric scores. When participants in Cohort B used text-to-speech alone, with the exception of spelling accuracy, inconsequential results were observed on all dependent variables. ^ This study demonstrated that word prediction alone or in combination assists students with SLD to write longer, improved-quality, narrative compositions. These results suggest that word prediction or word prediction with text-to-speech be considered as a writing support to facilitate the production of a first draft of a narrative composition. However, caution should be given to the use of text-to-speech alone as its effectiveness has not been established. Recommendations for future research include investigating the use of these technologies in other phases of the writing process, with other student populations, and with other writing styles. Further, these technologies should be investigated while integrated into classroom composition instruction. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, a linguistically rule-based grapheme-to-phone (G2P) transcription algorithm is described for European Portuguese. A complete set of phonological and phonetic transcription rules regarding the European Portuguese standard variety is presented. This algorithm was implemented and tested by using online newspaper articles. The obtained experimental results gave rise to 98.80% of accuracy rate. Future developments in order to increase this value are foreseen. Our purpose with this work is to develop a module/ tool that can improve synthetic speech naturalness in European Portuguese. Other applications of this system can be expected like language teaching/learning. These results, together with our perspectives of future improvements, have proved the dramatic importance of linguistic knowledge on the development of Text-to-Speech systems (TTS).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a module for the prediction of emotions in text chats in Spanish, oriented to its use in specific-domain text-to-speech systems. A general overview of the system is given, and the results of some evaluations carried out with two corpora of real chat messages are described. These results seem to indicate that this system offers a performance similar to other systems described in the literature, for a more complex task than other systems (identification of emotions and emotional intensity in the chat domain).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sistemas de reconhecimento e síntese de voz são constituídos por módulos que dependem da língua e, enquanto existem muitos recursos públicos para alguns idiomas (p.e. Inglês e Japonês), os recursos para Português Brasileiro (PB) ainda são escassos. Outro aspecto é que, para um grande número de tarefas, a taxa de erro dos sistemas de reconhecimento de voz atuais ainda é elevada, quando comparada à obtida por seres humanos. Assim, apesar do sucesso das cadeias escondidas de Markov (HMM), é necessária a pesquisa por novos métodos. Este trabalho tem como motivação esses dois fatos e se divide em duas partes. A primeira descreve o desenvolvimento de recursos e ferramentas livres para reconhecimento e síntese de voz em PB, consistindo de bases de dados de áudio e texto, um dicionário fonético, um conversor grafema-fone, um separador silábico e modelos acústico e de linguagem. Todos os recursos construídos encontram-se publicamente disponíveis e, junto com uma interface de programação proposta, têm sido usados para o desenvolvimento de várias novas aplicações em tempo-real, incluindo um módulo de reconhecimento de voz para a suíte de aplicativos para escritório OpenOffice.org. São apresentados testes de desempenho dos sistemas desenvolvidos. Os recursos aqui produzidos e disponibilizados facilitam a adoção da tecnologia de voz para PB por outros grupos de pesquisa, desenvolvedores e pela indústria. A segunda parte do trabalho apresenta um novo método para reavaliar (rescoring) o resultado do reconhecimento baseado em HMMs, o qual é organizado em uma estrutura de dados do tipo lattice. Mais especificamente, o sistema utiliza classificadores discriminativos que buscam diminuir a confusão entre pares de fones. Para cada um desses problemas binários, são usadas técnicas de seleção automática de parâmetros para escolher a representaçãao paramétrica mais adequada para o problema em questão.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Current text-to-speech systems are developed using studio-recorded speech in a neutral style or based on acted emotions. However, the proliferation of media sharing sites would allow developing a new generation of speech-based systems which could cope with spontaneous and styled speech. This paper proposes an architecture to deal with realistic recordings and carries out some experiments on unsupervised speaker diarization. In order to maximize the speaker purity of the clusters while keeping a high speaker coverage, the paper evaluates the F-measure of a diarization module, achieving high scores (>85%) especially when the clusters are longer than 30 seconds, even for the more spontaneous and expressive styles (such as talk shows or sports).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents parts-of-speech tagging as a first step towards an autonomous text-to-scene conversion system. It categorizes some freely available taggers, according to the techniques used by each in order to automatically identify word-classes. In addition, the performance of each identified tagger is verified experimentally. The SUSANNE corpus is used for testing and reveals the complexity of working with different tagsets, resulting in substantially lower accuracies in our tests than in those reported by the developers of each tagger. The taggers are then grouped to form a voting system to attempt to raise accuracies, but in no cases do the combined results improve upon the individual accuracies. Additionally a new metric, agreement, is tentatively proposed as an indication of confidence in the output of a group of taggers where such output cannot be validated.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Concept mapping involves determining relevant concepts from a free-text input, where concepts are defined in an external reference ontology. This is an important process that underpins many applications for clinical information reporting, derivation of phenotypic descriptions, and a number of state-of-the-art medical information retrieval methods. Concept mapping can be cast into an information retrieval (IR) problem: free-text mentions are treated as queries and concepts from a reference ontology as the documents to be indexed and retrieved. This paper presents an empirical investigation applying general-purpose IR techniques for concept mapping in the medical domain. A dataset used for evaluating medical information extraction is adapted to measure the effectiveness of the considered IR approaches. Standard IR approaches used here are contrasted with the effectiveness of two established benchmark methods specifically developed for medical concept mapping. The empirical findings show that the IR approaches are comparable with one benchmark method but well below the best benchmark.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Search is now going beyond looking for factual information, and people wish to search for the opinions of others to help them in their own decision-making. Sentiment expressions or opinion expressions are used by users to express their opinion and embody important pieces of information, particularly in online commerce. The main problem that the present dissertation addresses is how to model text to find meaningful words that express a sentiment. In this context, I investigate the viability of automatically generating a sentiment lexicon for opinion retrieval and sentiment classification applications. For this research objective we propose to capture sentiment words that are derived from online users’ reviews. In this approach, we tackle a major challenge in sentiment analysis which is the detection of words that express subjective preference and domain-specific sentiment words such as jargon. To this aim we present a fully generative method that automatically learns a domain-specific lexicon and is fully independent of external sources. Sentiment lexicons can be applied in a broad set of applications, however popular recommendation algorithms have somehow been disconnected from sentiment analysis. Therefore, we present a study that explores the viability of applying sentiment analysis techniques to infer ratings in a recommendation algorithm. Furthermore, entities’ reputation is intrinsically associated with sentiment words that have a positive or negative relation with those entities. Hence, is provided a study that observes the viability of using a domain-specific lexicon to compute entities reputation. Finally, a recommendation system algorithm is improved with the use of sentiment-based ratings and entities reputation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper reviews a study to determine if an auditory approach to speech correction can be of beneift to hearing impaired children who have become visually oriented.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Using the formalism of the Ruelle response theory, we study how the invariant measure of an Axiom A dynamical system changes as a result of adding noise, and describe how the stochastic perturbation can be used to explore the properties of the underlying deterministic dynamics. We first find the expression for the change in the expectation value of a general observable when a white noise forcing is introduced in the system, both in the additive and in the multiplicative case. We also show that the difference between the expectation value of the power spectrum of an observable in the stochastically perturbed case and of the same observable in the unperturbed case is equal to the variance of the noise times the square of the modulus of the linear susceptibility describing the frequency-dependent response of the system to perturbations with the same spatial patterns as the considered stochastic forcing. This provides a conceptual bridge between the change in the fluctuation properties of the system due to the presence of noise and the response of the unperturbed system to deterministic forcings. Using Kramers-Kronig theory, it is then possible to derive the real and imaginary part of the susceptibility and thus deduce the Green function of the system for any desired observable. We then extend our results to rather general patterns of random forcing, from the case of several white noise forcings, to noise terms with memory, up to the case of a space-time random field. Explicit formulas are provided for each relevant case analysed. As a general result, we find, using an argument of positive-definiteness, that the power spectrum of the stochastically perturbed system is larger at all frequencies than the power spectrum of the unperturbed system. We provide an example of application of our results by considering the spatially extended chaotic Lorenz 96 model. These results clarify the property of stochastic stability of SRB measures in Axiom A flows, provide tools for analysing stochastic parameterisations and related closure ansatz to be implemented in modelling studies, and introduce new ways to study the response of a system to external perturbations. Taking into account the chaotic hypothesis, we expect that our results have practical relevance for a more general class of system than those belonging to Axiom A.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Traditional Text-To-Speech (TTS) systems have been developed using especially-designed non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, for processing this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization and music and noise detection which improve the precision and overall quality of the segmentation.