873 results for Audio-Visual Automatic Speech Recognition


Relevance: 100.00%

Abstract:

This research studies the audiovisual dimension of architecture as an intersensory approach to the apprehension and design of space. By highlighting the complexity of the relationship between people and their environment, it argues for new methodologies and tools that take this complexity into account and support the development of the architectural project. The research is driven by the conviction that the rapid and profound changes that characterize our times in every sphere (social, economic, political and so on) inevitably bring new ways of knowing and experiencing space, and therefore new lines of research. The growing value placed on subjective and sensory aspects in all fields of knowledge, the development of technologies that have completely changed our relationships with one another and with our surroundings, the new capacities for analyzing, recording, preserving and manipulating data and, last but not least, the democratic and global availability of knowledge through the Internet all call for a different approach to making, conceiving and experiencing architecture.
The research presents a critical analysis of the state of the art, building new networks of relationships among disciplines that make it possible to frame the audiovisual dimension as a new line of research within architecture, and showing the need for transversal, interdisciplinary analysis. Particular attention is paid to the evolution of sound and its qualitative approach to architecture, showing that sound, with its capacity to introduce time and dynamic aspects (movement, the presence of the body and so on), is not simply another sensory channel in the apprehension of space: its interaction with the visual generates an indissociable space-time that is specific to each moment and place.
From this starting point, a methodological review was carried out aimed at using the walk through a space as an analysis tool for studying the relationship between space, action and audiovisual perception. To that end, data on the morphology of the space are crossed with data on individual perceptual experience and with data on the collective uses of the space. Video is used not only to represent reality but also as an instrument of analysis that makes it possible to collect data (audio recordings, video, observations and so on), isolate, study, classify and order them, and finally restore them through editing. A first in situ experiment was carried out, which served to explore the application of the method, raising new questions and opening lines of analysis for further research.

Relevance: 100.00%

Abstract:

For more than 20 years, many research groups have worked on techniques for automatic facial expression recognition. In recent years, advances in digital image processing have made it possible to detect the faces present in an image quickly and have produced algorithms for feature extraction and expression classification. This project surveys the state of the art in automatic emotion recognition in order to identify the existing methods for facial analysis and emotion recognition. To compare these methods and future ones, a modular, extensible and user-friendly tool is implemented.
The tool integrates a feature extraction method that obtains 19 facial feature points. These points are passed to two expression classification methods: one based on comparing the displacements of the facial points, and one based on detecting specific facial movements called Action Units. The Cohn-Kanade+ and JAFFE databases, both freely available to the scientific community, are used for training and evaluating the system. The methods are then evaluated with different parameters and databases and with a varying number of emotions. Finally, conclusions are drawn from the work and its evaluation, and necessary improvements and future research are proposed.
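
The displacement-based classifier mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how displacements are compared, so the nearest-template rule with Euclidean distance used here is an assumption, and the function names, the three-point toy landmarks (instead of 19) and the template values are all hypothetical.

```python
from math import dist


def displacement_vector(neutral, expressive):
    """Flatten the per-point (dx, dy) displacements between two landmark sets."""
    if len(neutral) != len(expressive):
        raise ValueError("landmark sets must have the same number of points")
    vector = []
    for (x0, y0), (x1, y1) in zip(neutral, expressive):
        vector.extend((x1 - x0, y1 - y0))
    return vector


def classify(displacement, templates):
    """Return the label whose template displacement is closest (Euclidean)."""
    return min(templates, key=lambda label: dist(displacement, templates[label]))


# Toy data: 3 landmarks instead of 19, all coordinates invented for illustration.
neutral = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
templates = {
    "happiness": displacement_vector(neutral, [(-0.1, 0.1), (1.1, 0.1), (0.5, 1.0)]),
    "surprise": displacement_vector(neutral, [(0.0, 0.0), (1.0, 0.0), (0.5, 1.3)]),
}
observed = [(-0.08, 0.09), (1.12, 0.1), (0.5, 1.02)]
label = classify(displacement_vector(neutral, observed), templates)
print(label)
```

A real system would normalize the landmarks for face size and pose before computing displacements; that step is omitted here.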

Relevance: 100.00%

Abstract:

This final-year project (PFC) aims to develop a home automation (domotic) manager based on the voice dictation capability of the WhatsApp social network. The manager is intended not only to dispel the harmful idea that home automation is expensive and useless, but also to help people with disabilities improve their quality of life. With a simple voice command sent through the WhatsApp application on their mobile phone, these users can switch on or off every home automation device installed in their home: “activar lámpara” (turn on the lamp), “encender Horno” (switch on the oven), “abrir Puerta” (open the door) and so on, all at very low cost and using open-source technologies. The main goal of the project is to help people with disabilities achieve a better quality of life by making them independent in household tasks, since the home itself will do the work. Accessibility is therefore the most important goal, and making the service accessible to everyone requires it to be cheap and easy to learn. WhatsApp is chosen as the interpreter because, being widely used in Spain, it requires no training and because it offers voice dictation; open-source technologies are chosen because most of them are free, so that the only cost is the hardware. The choice of WhatsApp is justified by its reach: 900 million users were registered in September 2015. This figure, which also owes something to the recent acquisition by Facebook, means that WhatsApp meets the first accessibility requirement of the service presented here.
A freely released WhatsApp API has existed for almost five years, and the open-source community has used it to build its own clients and message-sending applications on top of the WhatsApp infrastructure. The company does not openly approve of this, but the release of the API was legal and so is its use. The company nevertheless reserves the right to block accounts that use its infrastructure fraudulently.
The open-source technologies used are a Linux distribution (Raspbian) and the programming languages PHP, Python and Bash scripting, all backed by a community that provides support and scalability. For this reason a Raspberry Pi is used as the central home automation manager. In its first version, the manager provides control of general and personal electric lighting, control of all kinds of household appliances, access control for the main entrance door, and control of audiovisual media.
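
The step from a dictated message to a device action can be sketched as follows. This is only an illustration under stated assumptions: it uses the English command forms quoted in the abstract (“On Lamp”, “ON Oven”, “Open Door”), the vocabulary tables and function names are hypothetical, and receiving messages from WhatsApp and driving real hardware are deliberately left out because the abstract gives no details of either.

```python
import re

# Hypothetical vocabulary; the real project's command set is not given in the abstract.
ACTION_WORDS = {"on": "on", "off": "off", "open": "open", "close": "close"}
DEVICE_WORDS = {"lamp", "oven", "door"}


def parse_command(text):
    """Return (device, action) for a dictated message, or None if not understood."""
    words = re.findall(r"[a-z]+", text.lower())
    action = next((ACTION_WORDS[w] for w in words if w in ACTION_WORDS), None)
    device = next((w for w in words if w in DEVICE_WORDS), None)
    if action is None or device is None:
        return None
    return device, action


def apply_command(state, text):
    """Update the in-memory device state; return False for unrecognized messages."""
    parsed = parse_command(text)
    if parsed is None:
        return False
    device, action = parsed
    state[device] = action
    return True


# The dictionary stands in for the physical devices controlled by the manager.
home = {"lamp": "off", "oven": "off", "door": "close"}
for message in ("On Lamp", "ON Oven", "Open Door", "hello"):
    apply_command(home, message)
print(home)
```

Keeping parsing separate from actuation means the same parser could be reused whichever messaging channel delivers the text.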

Relevance: 100.00%

Abstract:

The scientific bases for human-machine communication by voice are in the fields of psychology, linguistics, acoustics, signal processing, computer science, and integrated circuit technology. The purpose of this paper is to highlight the basic scientific and technological issues in human-machine communication by voice and to point out areas of future research opportunity. The discussion is organized around the following major issues in implementing human-machine voice communication systems: (i) hardware/software implementation of the system, (ii) speech synthesis for voice output, (iii) speech recognition and understanding for voice input, and (iv) usability factors related to how humans interact with machines.

Relevance: 100.00%

Abstract:

Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent real-time speech recognition, and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words, and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and performance with future language technology.

Relevance: 100.00%

Abstract:

The Colloquium on Human-Machine Communication by Voice highlighted the global technical community's focus on the problems and promise of voice-processing technology, particularly, speech recognition and speech synthesis. Clearly, there are many areas in both the research and development of these technologies that can be advanced significantly. However, it is also true that there are many applications of these technologies that are capable of commercialization now. Early successful commercialization of new technology is vital to ensure continuing interest in its development. This paper addresses efforts to commercialize speech technologies in two markets: telecommunications and aids for the handicapped.

Relevance: 100.00%

Abstract:

This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will require a varying mix of advances in speech technology and in integration of the technology into applications environments. Applications that are described include (1) speech recognition and synthesis for mobile command and control; (2) speech processing for a portable multifunction soldier's computer; (3) speech- and language-based technology for naval combat team tactical training; (4) speech technology for command and control on a carrier flight deck; (5) control of auxiliary systems, and alert and warning generation, in fighter aircraft and helicopters; and (6) voice check-in, report entry, and communication for law enforcement agents or special forces. A phased approach for transfer of the technology into applications is advocated, where integration of applications systems is pursued in parallel with advanced research to meet future needs.

Relevance: 100.00%

Abstract:

The deployment of systems for human-to-machine communication by voice requires overcoming a variety of obstacles that affect the speech-processing technologies. Problems encountered in the field might include variation in speaking style, acoustic noise, ambiguity of language, or confusion on the part of the speaker. The diversity of these practical problems encountered in the "real world" leads to the perceived gap between laboratory and "real-world" performance. To answer the question "What applications can speech technology support today?" the concept of the "degree of difficulty" of an application is introduced. The degree of difficulty depends not only on the demands placed on the speech recognition and speech synthesis technologies but also on the expectations of the user of the system. Experience has shown that deployment of effective speech communication systems requires an iterative process. This paper discusses general deployment principles, which are illustrated by several examples of human-machine communication systems.

Relevance: 100.00%

Abstract:

This paper describes the state of the art in applications of voice-processing technologies. In the first part, technologies concerning the implementation of speech recognition and synthesis algorithms are described. Hardware technologies such as microprocessors and DSPs (digital signal processors) are discussed. The software development environment, a key technology in developing applications software ranging from DSP software to support software, is also described. In the second part, the state of the art of algorithms from the standpoint of applications is discussed. Several issues concerning evaluation of speech recognition/synthesis algorithms are covered, as well as issues concerning the robustness of algorithms in adverse conditions.

Relevance: 100.00%

Abstract:

This talk, which was the keynote address of the NAS Colloquium on Human-Machine Communication by Voice, discusses the past, present, and future of human-machine communications, especially speech recognition and speech synthesis. Progress in these technologies is reviewed in the context of the general progress in computer and communications technologies.

Relevance: 100.00%

Abstract:

Hearing loss in the elderly leads to difficulty in speech perception. The test commonly used in speech audiometry is the measurement of the maximum speech recognition index (IR-Max) at a single speech presentation level. However, the most appropriate procedure would be to perform the test at several levels, since the percentage of correct responses depends on the speech level at the time of testing and is related to the degree and configuration of the hearing loss. Imprecision in obtaining the IR-Max may lead to an erroneous diagnostic hypothesis and to failure of the intervention for the hearing loss. Objective: To verify the effect of speech presentation level on the speech recognition test in elderly people with sensorineural hearing loss and different audiometric configurations. Methods: Participants were 64 elderly people, 120 ears (61 female and 59 male), aged 60 to 88 years, divided into groups: G1, 23 ears with a flat (horizontal) configuration; G2, 55 ears with a sloping (descending) configuration; G3, 42 ears with an abrupt configuration. Inclusion criteria were: mild to severe sensorineural hearing loss; no use of a hearing aid (AASI), or use for less than two months; and absence of cognitive impairment. The following procedures were performed: measurement of the speech recognition threshold (LRF), of the speech recognition index (IRF) at several levels, and of the most comfortable level (MCL) and uncomfortable level (UCL) for speech. Lists of 11 monosyllables were used to shorten the test. Statistical analysis consisted of analysis of variance (ANOVA) and Tukey's test. Results: The sloping configuration was the most frequent. Individuals with a flat configuration showed the highest mean speech recognition score.
Considering the total evaluated, 27.27% of individuals with a flat configuration reached the IR-Max at the MCL, as did 38.18% with a sloping configuration and 26.19% with an abrupt configuration. The IR-Max was found at the UCL in 40.90% of individuals with a flat configuration, 45.45% with a sloping configuration and 28.20% with an abrupt configuration. The highest and lowest mean scores, respectively, were found at: G1, 30 and 40 dB SL (sensation level, dBNS in the original); G2, 50 and 10 dB SL; G3, 45 and 10 dB SL. No single speech level can be used for all types of audiometric configuration; however, the sensation levels that yielded the highest mean scores were: G1, 20 to 30 dB SL; G2, 20 to 50 dB SL; G3, 45 dB SL. The MCL and the UCL-5 dB for speech were not effective for determining the IR-Max. Conclusions: Presentation level influenced monosyllable speech recognition performance in elderly people with sensorineural hearing loss in all audiometric configurations. Moderate hearing loss and the sloping audiometric configuration were the most frequent in this population, followed by the abrupt and flat configurations.
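
The idea of testing at several presentation levels and taking the best score can be made concrete with a short sketch. It assumes the recognition index is simply the percentage of correctly repeated items in an 11-monosyllable list, which is the usual definition but is not spelled out in the abstract; the function names and the per-level counts are invented for illustration and are not data from the study.

```python
def recognition_index(correct, total=11):
    """Percentage of correctly repeated monosyllables in one list."""
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return 100.0 * correct / total


def max_recognition(correct_by_level, total=11):
    """Return (best index, sorted levels at which that index was reached)."""
    indices = {
        level: recognition_index(correct, total)
        for level, correct in correct_by_level.items()
    }
    best = max(indices.values())
    levels = sorted(level for level, value in indices.items() if value == best)
    return best, levels


# Hypothetical ear: presentation level in dB SL -> words correct out of 11.
correct_by_level = {10: 5, 20: 8, 30: 10, 40: 9}
best, levels = max_recognition(correct_by_level)
print(f"IR-Max = {best:.1f}% at {levels} dB SL")
```

Testing at a single level, for example only 40 dB SL in this toy case, would report 9 of 11 instead of the maximum of 10 of 11, which is the kind of underestimate the abstract warns about.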

Relevance: 100.00%

Abstract:

Television today has undergone numerous technological innovations in multimedia transmission, audiovisual quality and diversity of features. Nevertheless, it essentially retains its characteristic of delivering information to the population almost instantaneously. The current digital television environment is characterized by the coexistence of numerous devices capable of offering a television experience, including personal computers, smartphones, tablets and other consumer electronics. This scenario also includes the availability of numerous data transport networks, such as terrestrial broadcasting, satellite, cable and broadband networks. This diverse scenario, in terms of devices and networks, is called hybrid digital television, in which the viewer's interaction with the various devices stands out. These scenarios in turn motivate the development of technologies that improve pervasiveness and the means by which applications can be supported on different platforms. This work proposes interoperable environments involving interactive digital television and other consumer electronics, for which studies and experiments were carried out to observe different synchronization and communication techniques between interactivity platforms for hybrid digital television. The results point to the feasibility of interoperable scenarios involving the use of markers as well as TCP/IP network resources and services, taking into account the efficiency and effectiveness of the different methods. It is concluded that the results may motivate the development of differentiated scenarios involving interactive digital television and second-screen devices, which increases interactivity and the forms of entertainment.

Relevance: 100.00%

Abstract:

From a gender perspective, policy actions that protect and publicize work-family balance should promote the sharing of responsibilities between the sexes. Alongside political action and specific measures, the equal opportunities project needs a long-term strategy based on education for equality. This article sets out the methodology of a study based on these premises. It presents and explains the protocol used to analyze the audiovisual advertising campaigns on work-family reconciliation broadcast by the Woman’s Institute. The evaluation of the actions focuses on their effectiveness from the point of view of mass media. The article provides some data that illustrate the proposed study. Finally, it considers the difficulties posed by the available sources of information.

Relevance: 100.00%

Abstract:

In this paper we introduce a probabilistic approach to support visual supervision and gesture recognition. Task knowledge is both geometric and visual in nature, and it is encoded in parametric eigenspaces. Learning processes for computing modal subspaces (eigenspaces) are the core of the tracking and recognition of gestures and tasks. We describe the overall architecture of the system and detail the learning processes and gesture design. Finally, we show experimental results of tracking and recognition in blocks-world-like assembly tasks and in general human gestures.

Relevance: 100.00%

Abstract:

Aim: Unilateral loss of the posterior visual cortex causes cortical blindness contralateral to the lesion, known as homonymous hemianopia (HH). It is accompanied in particular by problems of visual exploration in the blind hemifield due to deficient oculomotor strategies, which have been the target of compensation therapies. This loss of vision can, however, be accompanied by unconscious visual perception, called blindsight. Our hypothesis is that blindsight is mediated by the extrastriate retino-collicular pathway, which recruits the superior colliculus (SC), a multisensory structure. Our program aims to evaluate the impact of multisensory (audiovisual) training on the unconscious visual performance and oculomotor strategies of people with hemianopia. We thereby seek to demonstrate the involvement of the SC in blindsight and the relevance of multisensory compensation as a rehabilitation therapy. Method: Our participant, ML, who has a right HH, completed audiovisual integration training over a period of 10 days. We evaluated visual performance in localization and detection, as well as oculomotor strategies, according to three main comparisons: (1) between the normal hemifield and the blind hemifield; (2) between the visual condition and the audiovisual conditions; (3) between the pre-training, post-training and 3-month post-training sessions. Results: We showed that (1) saccade and fixation characteristics are deficient in the blind hemifield; (2) saccadic strategies differ according to eccentricity and stimulation condition; (3) long-term saccadic adaptation is possible in the blind hemifield if the right frame of reference is considered; (4) the improvement in eye movements is related to blindsight.
Conclusion(s): Multisensory training leads to improved visual performance for unperceived targets, in both localization and detection, which is possibly induced by the development of oculomotor performance.