871 resultados para Speech enhancement
Resumo:
This paper describes certain findings of intonation and intensity study of emotive speech with the minimal use of signal processing algorithms. This study was based on six basic emotions and the neutral, elicited from 1660 English utterances obtained from the speech recordings of six Indian women. The correctness of the emotional content was verified through perceptual listening tests. Marked similarity was noted among pitch contours of like-worded, positive valence emotions, though no such similarity was observed among the four negative valence emotional expressions. The intensity patterns were also studied. The results of the study were validated using arbitrary television recordings for four emotions. The findings are useful to technical researchers, social psychologists and to the common man interested in the dynamics of vocal expression of emotions
Resumo:
Speech is the primary, most prominent and convenient means of communication in audible language. Through speech, people can express their thoughts, feelings or perceptions by the articulation of words. Human speech is a complex signal which is non stationary in nature. It consists of immensely rich information about the words spoken, accent, attitude of the speaker, expression, intention, sex, emotion as well as style. The main objective of Automatic Speech Recognition (ASR) is to identify whatever people speak by means of computer algorithms. This enables people to communicate with a computer in a natural spoken language. Automatic recognition of speech by machines has been one of the most exciting, significant and challenging areas of research in the field of signal processing over the past five to six decades. Despite the developments and intensive research done in this area, the performance of ASR is still lower than that of speech recognition by humans and is yet to achieve a completely reliable performance level. The main objective of this thesis is to develop an efficient speech recognition system for recognising speaker independent isolated words in Malayalam.
Resumo:
Sketches are commonly used in the early stages of design. Our previous system allows users to sketch mechanical systems that the computer interprets. However, some parts of the mechanical system might be too hard or too complicated to express in the sketch. Adding speech recognition to create a multimodal system would move us toward our goal of creating a more natural user interface. This thesis examines the relationship between the verbal and sketch input, particularly how to segment and align the two inputs. Toward this end, subjects were recorded while they sketched and talked. These recordings were transcribed, and a set of rules to perform segmentation and alignment was created. These rules represent the knowledge that the computer needs to perform segmentation and alignment. The rules successfully interpreted the 24 data sets that they were given.
Resumo:
We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency.
Resumo:
We present MikeTalk, a text-to-audiovisual speech synthesizer which converts input text into an audiovisual speech stream. MikeTalk is built using visemes, which are a small set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject which is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images may be generated. A complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer is exploited to determine which viseme transitions to use, and the rate at which the morphing process should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream, and hence give the impression of a photorealistic talking face.
Resumo:
abstract With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system, called Mary 101. Two types of experiments were performed: a) distinguishing visually between real and synthetic image- sequences of the same utterances, ("Turing tests") and b) gauging visual speech recognition by comparing lip-reading performance of the real and synthetic image-sequences of the same utterances ("Intelligibility tests"). Subjects that were presented randomly with either real or synthetic image-sequences could not tell the synthetic from the real sequences above chance level. The same subjects when asked to lip-read the utterances from the same image-sequences recognized speech from real image-sequences significantly better than from synthetic ones. However, performance for both, real and synthetic, were at levels suggested in the literature on lip-reading. We conclude from the two experiments that the animation of Mary 101 is adequate for providing a percept of a talking head. However, additional effort is required to improve the animation for lip-reading purposes like rehabilitation and language learning. In addition, these two tasks could be considered as explicit and implicit perceptual discrimination tasks. In the explicit task (a), each stimulus is classified directly as a synthetic or real image-sequence by detecting a possible difference between the synthetic and the real image-sequences. The implicit perceptual discrimination task (b) consists of a comparison between visual recognition of speech of real and synthetic image-sequences. Our results suggest that implicit perceptual discrimination is a more sensitive method for discrimination between synthetic and real image-sequences than explicit perceptual discrimination.
Resumo:
We enhance photographs shot in dark environments by combining a picture taken with the available light and one taken with the flash. We preserve the ambiance of the original lighting and insert the sharpness from the flash image. We use the bilateral filter to decompose the images into detail and large scale. We reconstruct the image using the large scale of the available lighting and the detail of the flash. We detect and correct flash shadows. This combines the advantages of available illumination and flash photography.
Resumo:
--
Resumo:
We propose and estimate a financial distress model that explicitly accounts for the interactions or spill-over effects between financial institutions, through the use of a spatial continuity matrix that is build from financial network data of inter bank transactions. Such setup of the financial distress model allows for the empirical validation of the importance of network externalities in determining financial distress, in addition to institution specific and macroeconomic covariates. The relevance of such specification is that it incorporates simultaneously micro-prudential factors (Basel 2) as well as macro-prudential and systemic factors (Basel 3) as determinants of financial distress. Results indicate network externalities are an important determinant of financial health of a financial institutions. The parameter that measures the effect of network externalities is both economically and statistical significant and its inclusion as a risk factor reduces the importance of the firm specific variables such as the size or degree of leverage of the financial institution. In addition we analyze the policy implications of the network factor model for capital requirements and deposit insurance pricing.
Resumo:
Resumen tomado de la revista
Resumo:
Resumen en español
Resumo:
Resumen basado en el de la publicación. Resumen en español
Resumo:
Este recurso describe cómo una proporción significativa de niños en edad escolar tienen dificultades en el habla y cómo éstas afectan negativamente en su aprendizaje, tanto en entornos especializados como en generales. En él se esbozan las principales áreas de dificultad para los alumnos, y sugiere cómo los profesores pueden hacer que el programa sea más accesible para facilitar el aprendizaje. Se tratan el lenguaje expresivo, el lenguaje receptivo, el uso social del lenguaje y dificultades en el desarrollo de la coordinación, así como temas específicos como el plan de estudios de inglés, matemáticas y ciencias. A lo largo de la publicación hay información e ideas para apoyar a estos alumnos, y una amplia selección de sugerencias de buenas prácticas. Se incluye un programa de habilidades motoras, rimas para la producción del habla, trabajo de memoria, y páginas fotocopiables de diccionario.
Resumo:
Este manual presenta las ideas actuales sobre la relación entre lo hablado y lo escrito y sus dificultades lingüísticas. Proporciona perspectivas clínicas y educativas sobre la evaluación y gestión de lectura y ortografía de los niños. Comienza con una introducción teórica y sigue con la vinculación entre teoría y práctica. Está dirigido a los profesionales en los campos de educación, intervención y terapia del lenguaje, y psicología.