314 results for Speech act
Abstract:
Human listeners can identify vowels regardless of speaker size, although the sound waves for an adult and a child speaking the 'same' vowel differ enormously. The differences are mainly due to differences in vocal tract length (VTL) and glottal pulse rate (GPR), both of which are related to body size. Automatic speech recognition machines are notoriously bad at understanding children if they have been trained on the speech of an adult. In this paper, we propose that the auditory system adapts its analysis of speech sounds, dynamically and automatically, to the GPR and VTL of the speaker on a syllable-to-syllable basis. We illustrate how this rapid adaptation might be performed with the aid of a computational version of the auditory image model, and we propose that an auditory preprocessor of this form would improve the robustness of speech recognisers.
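As a rough illustration of why VTL matters, the sketch below uses the textbook uniform-tube approximation of the vocal tract, in which a tube closed at one end resonates at F_n = (2n - 1)c / (4L). This is not the auditory image model described in the abstract; the function name `tube_formants`, the speed-of-sound constant, and the two tract lengths (17.5 cm and 12 cm) are illustrative assumptions only.

```python
# Uniform-tube approximation of vocal tract resonances.
# Assumption: this simple acoustic model stands in for a real vocal tract;
# it is NOT the auditory image model referred to in the abstract.

SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in warm, moist air


def tube_formants(vtl_cm, n_formants=3):
    """Resonances (Hz) of a tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * vtl_cm)
        for n in range(1, n_formants + 1)
    ]


# Hypothetical tract lengths chosen purely for illustration.
adult = tube_formants(17.5)
child = tube_formants(12.0)

# Every resonance is scaled by the same factor, the ratio of the tract lengths.
scale = [c / a for a, c in zip(adult, child)]

print("adult formants (Hz):", adult)
print("child formants (Hz):", child)
print("scale factors:", scale)
```

Under this approximation a shorter tract shifts all resonances upward by one common factor, which is the kind of speaker-size scaling a normalising preprocessor would need to undo.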
Abstract:
Peripheral nerve damage is a problem encountered after trauma and during surgery, and the development of synthetic polymer conduits may offer a promising alternative to autografts. In order to improve the performance of the polymer to be used for nerve conduits, poly-ε-caprolactone (PCL) films were chemically functionalized with RGD moieties, using a previously developed chemical reaction. In vitro cultures of dissociated dorsal root ganglion (DRG) neurons provide a valid model to study different factors affecting axonal growth. In this work, DRG neurons were cultured on RGD-functionalized PCL films. Adult adipose-derived stem cells differentiated to Schwann cells (dASCs) were initially cultured on the functionalized PCL films, resulting in improved attachment and proliferation. dASCs were also co-cultured with DRG neurons on treated and untreated PCL to assess the stimulation of neurite outgrowth by dASCs. Neuron response was generally poor on untreated PCL films, but long neurites were observed in the presence of dASCs or RGD moieties. The two factors acted synergistically: in combination they enhanced neurite outgrowth even further. Finally, in order to better understand the extracellular matrix (ECM)-cell interaction, a β1 integrin blocking experiment was carried out. Neurite outgrowth was not affected by the specific antibody blocking, showing that β1 integrin function can be compensated by other molecules present on the cell membrane. Copyright © 2013 John Wiley & Sons, Ltd.
Abstract:
This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state, which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies comparing the output of different systems. © 2013 IEEE.
Abstract:
Large margin criteria and discriminative models are two effective improvements for HMM-based speech recognition. This paper proposes a large margin trained log-linear model with kernels for continuous speech recognition (CSR). To avoid explicit computation in the high-dimensional feature space and to achieve nonlinear decision boundaries, a kernel-based training and decoding framework is proposed. To make the system robust to noise, a kernel adaptation scheme is also presented. Previous work in this area is extended in two directions. First, whereas most kernels for CSR focus on measuring the similarity between two observation sequences, the proposed joint kernels define a similarity between two observation-label sequence pairs at the sentence level. Second, this paper addresses how to efficiently employ kernels in large margin training and decoding with lattices. To the best of our knowledge, this is the first attempt at using large margin kernel-based log-linear models for CSR. The model is evaluated on a noise-corrupted continuous digit task: AURORA 2.0. © 2013 IEEE.
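For orientation, a generic kernelized log-linear model over observation-label sequence pairs can be written as below. This is a standard textbook form under stated assumptions, not necessarily the exact formulation used in the paper: the training pairs $(O_i, w_i)$, the weights $\beta_i$, and the joint kernel $k$ are notation introduced here for illustration.

```latex
% Generic kernelized log-linear posterior (illustrative notation, not the paper's own).
% O: observation sequence, w: label (word) sequence,
% (O_i, w_i): training pairs, \beta_i: learned weights, k: joint kernel.
\[
P(w \mid O) =
  \frac{\exp\Big(\sum_{i} \beta_i \, k\big((O_i, w_i), (O, w)\big)\Big)}
       {\sum_{w'} \exp\Big(\sum_{i} \beta_i \, k\big((O_i, w_i), (O, w')\big)\Big)}
\]
```

In this form the feature map never has to be evaluated explicitly, since only kernel values between sequence pairs are needed. A large margin criterion would then typically require the score of the reference transcription to exceed that of each competing hypothesis by some margin.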
Abstract:
This paper presents an overview of the Text-to-Speech synthesis system developed at the Institute for Language and Speech Processing (ILSP). It focuses on the key issues regarding the design of the system components. The system currently fully supports three languages (Greek, English, Bulgarian) and is designed to be as language- and speaker-independent as possible. Experimental results are also presented, showing that the system produces high-quality synthetic speech in terms of naturalness and intelligibility. The system was recently ranked among the top three systems worldwide in terms of achieved quality for the English language at the international Blizzard Challenge 2013 workshop. © 2014 Springer International Publishing.