904 results for Audio-Visual Automatic Speech Recognition
Abstract:
Computer speech synthesis has reached a high level of performance, with increasingly sophisticated models of linguistic structure, low error rates in text analysis, and high intelligibility in synthesis from phonemic input. Mass market applications are beginning to appear. However, the results are still not good enough for the ubiquitous application that such technology will eventually have. A number of alternative directions of current research aim at the ultimate goal of fully natural synthetic speech. One especially promising trend is the systematic optimization of large synthesis systems with respect to formal criteria of evaluation. Speech recognition has progressed rapidly in the past decade through such approaches, and it seems likely that their application in synthesis will produce similar improvements.
Abstract:
The integration of speech recognition with natural language understanding raises issues of how to adapt natural language processing to the characteristics of spoken language; how to cope with errorful recognition output, including the use of natural language information to reduce recognition errors; and how to use information from the speech signal, beyond just the sequence of words, as an aid to understanding. This paper reviews current research addressing these questions in the Spoken Language Program sponsored by the Advanced Research Projects Agency (ARPA). I begin by reviewing some of the ways that spontaneous spoken language differs from standard written language and discuss methods of coping with the difficulties of spontaneous speech. I then look at how systems cope with errors in speech recognition and at attempts to use natural language information to reduce recognition errors. Finally, I discuss how prosodic information in the speech signal might be used to improve understanding.
Abstract:
Research in speech recognition and synthesis over the past several decades has brought speech technology to a point where it is being used in "real-world" applications. However, despite the progress, the perception remains that the current technology is not flexible enough to allow easy voice communication with machines. The focus of speech research is now on producing systems that are accurate and robust but that do not impose unnecessary constraints on the user. This chapter takes a critical look at the shortcomings of the current speech recognition and synthesis algorithms, discusses the technical challenges facing research, and examines the new directions that research in speech recognition and synthesis must take in order to form the basis of new solutions suitable for supporting a wide range of applications.
Abstract:
Cognitive theories have shown that human thought is embodied; that is, we access reality through our senses and cannot escape them. To understand and handle abstract concepts, we use metaphorical projections based on bodily sensations. Hence the ubiquity of metaphor in everyday language. Although this claim has been widely tested through analysis of the verbal corpus in different languages, there is hardly any research on the audiovisual corpus. If primary metaphors are part of our cognitive unconscious, are inherent to human beings, and are a consequence of the nature of the brain, they must also generate visual metaphors. In this article, a series of examples is analysed and discussed to test this.
Abstract:
Bibliography: p. 41.
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
This paper presents a corpus-based descriptive analysis of the most prevalent transfer effects and connected speech processes observed in a comparison of 11 Vietnamese English speakers (6 females, 5 males) and 12 Australian English speakers (6 males, 6 females) over 24 grammatical paraphrase items. The phonetic processes are segmentally labelled in terms of IPA diacritic features using the EMU speech database system, with the aim of labelling departures from native-speaker pronunciation. Prosodic features were analysed using the ToBI framework. The results show many phonetic and prosodic processes that make non-native speakers' speech distinct from that of native speakers. The corpus-based methodology of analysing foreign accent may have implications for the evaluation of non-native accent, accented speech recognition and computer-assisted pronunciation learning.
Abstract:
Automatic Term Recognition (ATR) is a fundamental processing step preceding more complex tasks such as semantic search and ontology learning. Of the many methodologies available in the literature, only a few are able to handle both single and multi-word terms. In this paper we present a comparison of five such algorithms and propose a combined approach using a voting mechanism. We evaluated the six approaches using two different corpora and show how the voting algorithm performs best on one corpus (a collection of texts from Wikipedia) and less well using the Genia corpus (a standard life science corpus). This indicates that the choice and design of the corpus have a major impact on the evaluation of term recognition algorithms. Our experiments also showed that single-word terms can be equally important and occupy a fairly large proportion in certain domains. As a result, algorithms that ignore single-word terms may cause problems for tasks built on top of ATR. Effective ATR systems also need to take into account both the unstructured text and the structured aspects, which means information extraction techniques need to be integrated into the term recognition process.
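The abstract above does not say how its voting mechanism is defined, so the following is only a minimal sketch of one plausible scheme, not the authors' method. It assumes each ATR algorithm returns a best-first list of candidate terms, gives a term one vote from every algorithm that ranks it in its top `top_k`, and accepts terms with at least `min_votes` votes. The function name `vote_terms`, both parameters, and the sample rankings are hypothetical.

```python
from collections import Counter


def vote_terms(rankings, top_k=3, min_votes=2):
    """Combine ranked candidate-term lists by simple voting.

    rankings: list of lists, each ordered best-first by one ATR algorithm.
    A term gets one vote from every algorithm that ranks it in its top_k.
    Terms with at least min_votes votes are returned, most votes first.
    """
    votes = Counter()
    for ranked in rankings:
        # dict.fromkeys drops repeated terms while preserving rank order.
        unique_ranked = list(dict.fromkeys(ranked))
        for term in unique_ranked[:top_k]:
            votes[term] += 1
    accepted = [term for term, count in votes.items() if count >= min_votes]
    # Sort by vote count (descending), then alphabetically for determinism.
    return sorted(accepted, key=lambda term: (-votes[term], term))


# Hypothetical outputs of three ATR algorithms (illustrative data only).
# Single-word and multi-word candidates are treated identically.
rankings = [
    ["gene expression", "protein", "cell line", "study"],
    ["protein", "gene expression", "result", "cell line"],
    ["cell line", "protein", "transcription factor", "gene expression"],
]
print(vote_terms(rankings))
```

Because single-word candidates such as "protein" compete on the same footing as multi-word ones, a scheme of this kind does not discard single-word terms, which the abstract identifies as a risk for downstream tasks.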
Abstract:
This paper discusses the first of three studies which collectively represent a convergence of two ongoing research agendas: (1) the empirically-based comparison of the effects of evaluation environment on mobile usability evaluation results; and (2) the effect of environment - in this case lobster fishing boats - on achievable speech-recognition accuracy. We describe, in detail, our study and outline our results to date based on preliminary analysis. Broadly speaking, the potential for effective use of speech for data collection and vessel control looks very promising - surprisingly so! We outline our ongoing analysis and further work.
Abstract:
In contrast to Muslim traditions and customs, US government and society seem to invest in the media to forge discourses on the Western way of life. In addition, the media create idealized images of the woman, the hero, the father, and the family, along with an everyday speech invoking repeated and widespread moral values, including "justice" and "freedom", in opposition to "terror". In this research we analysed the TV series Homeland, using as theoretical support Cultural Studies, particularly the concept of Social Representation by Denise Jodelet, the analytical tools created by Michel Foucault on power devices, and feminist studies by Teresa de Lauretis. We examine how correlated forces operate, and how representations of womanhood, sexuality and nationality are built and reiterated in speech, creating patterns of behaviour for men and women. By spreading images of the "good" man, the "good" wife, and the "hero", the audio-visual product creates and reproduces the family, the society and the nation considered exemplary.
Abstract:
This article argues that sonic technologies such as telephones, voice recorders and phonographs, alongside more (audio)visual ones such as flickering fluorescent lights, videos and television sets, are crucial to the world of Twin Peaks. They constitute this world as both a communications network with portals to the unknown and an accumulation of recordings of ghosted voices and entities, perhaps finding its ultimate expression in the backwards reprocessed speech in the Black Lodge. This lodge can be understood as a space in which there are nothing but recordings, albeit now on a cosmic, spiritual and demonic level. Using a media archaeological approach to these devices in the series, this paper will argue that they were already operating by a media archaeological logic, generating the world of Twin Peaks as a haunted archive of sonic and other mediations.
Abstract:
This text presents an introduction to Asperger syndrome and the characteristics that distinguish it, in order to better understand what this Pervasive Developmental Disorder (PDD) consists of. It also aims to identify the communication and language tools best suited to the subject's teaching and learning, with emphasis on visual, audiovisual and artistic resources as learning tools for social inclusion in any sphere of society (schools, secondary schools, associations, universities or public administrations).
Abstract:
This thesis undertakes to reconceptualize our new audio-visual environment and our experience of it. In the digital era, with the generalized dissemination of moving images, we delimit a category of images that we regard as the most likely to have an impact on human development. We call them synchrono-photo-temporalized sound-images (images-sons synchrono-photo-temporalisées). More specifically, we seek to bring to light their power to affect and control by showing that they have a definite influence on the process of individuation, an influence greatly facilitated by the structural isotopy that exists between the stream of consciousness and their own flow. Drawing on the research of Bernard Stiegler, we also note the important role that attention and memory play in the process of individuation. Taken together, our reflection makes us realize the extent to which Quebec's current education system fails in its task of civic education by not providing adequate teaching about moving images.