903 results for hand-drawn visual language recognition


Abstract:

When the illumination of a visual scene changes, the quantity of light reflected from objects is altered. Despite this, the perceived lightness of the objects generally remains constant. This perceptual lightness constancy is thought to be important behaviorally for object recognition. Here we show that interactions from outside the classical receptive fields of neurons in primary visual cortex modulate neural responses in a way that makes them immune to changes in illumination, as is perception. This finding is consistent with the hypothesis that the responses of neurons in primary visual cortex carry information about surface lightness in addition to information about form. It also suggests that lightness constancy, which is sometimes thought to involve “higher-level” processes, is manifest at the first stage of visual cortical processing.

Abstract:

Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimating the probability of an index string requires a model of how any given utterance segment (e.g., a word) produces indices. Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. This paper describes the Baum algorithm, which obtains the values of these parameters from speech data by successive reestimation. The recognizer seeks the most probable utterance that could have caused the observed acoustic index string. Maximizing that probability amounts to maximizing the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even with a moderate vocabulary, an exhaustive search over utterances is impossible. One practical algorithm [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] is described that, given the index string, has a high likelihood of finding the most probable utterance.
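Both the Baum reestimation step and the Viterbi search can be illustrated on a toy discrete HMM. This is a minimal sketch, not the paper's formulation: it assumes outputs are emitted by states rather than by transitions (a common textbook variant), uses unscaled probabilities in the forward/backward passes (so it only suits short sequences), and the names `forward_backward`, `baum_welch_step` and `viterbi` are our own. It also omits the language-model factor; in a full recognizer, the log language-model probability of each hypothesized utterance would be added to its acoustic score.

```python
import math

def forward_backward(obs, pi, A, B):
    """Unscaled forward/backward passes for a discrete HMM (toy lengths only)."""
    N, T = len(pi), len(obs)
    # alpha[t][i] = P(o_0..o_t, state_t = i)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][j] * A[j][i] for j in range(N)) * B[i][obs[t]]
                      for i in range(N)])
    # beta[t][i] = P(o_{t+1}..o_{T-1} | state_t = i)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return alpha, beta

def baum_welch_step(obs, pi, A, B):
    """One reestimation of (pi, A, B); needs len(obs) >= 2.

    Returns the new parameters and P(obs | old parameters).
    """
    N, T, M = len(pi), len(obs), len(B[0])
    alpha, beta = forward_backward(obs, pi, A, B)
    lik = sum(alpha[T - 1])
    # gamma[t][i]: posterior of being in state i at time t
    gamma = [[alpha[t][i] * beta[t][i] / lik for i in range(N)] for t in range(T)]
    # xi[t][i][j]: posterior of the transition i -> j between t and t+1
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / lik
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(x[i][j] for x in xi) / sum(g[i] for g in gamma[:-1]) for j in range(N)]
             for i in range(N)]
    new_B = [[sum(g[i] for g, o in zip(gamma, obs) if o == k) / sum(g[i] for g in gamma)
              for k in range(M)] for i in range(N)]
    return new_pi, new_A, new_B, lik

def viterbi(obs, pi, A, B):
    """Most probable state path for obs, computed in log space."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")
    N = len(pi)
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(N)]
    back = []
    for o in obs[1:]:
        # best predecessor of each state i at this step
        prev = [max(range(N), key=lambda j: delta[j] + log(A[j][i])) for i in range(N)]
        delta = [delta[prev[i]] + log(A[prev[i]][i]) + log(B[i][o]) for i in range(N)]
        back.append(prev)
    state = max(range(N), key=lambda i: delta[i])
    best = delta[state]
    path = [state]
    for prev in reversed(back):  # follow back-pointers from the last step
        state = prev[state]
        path.append(state)
    return best, path[::-1]
```

Each call to `baum_welch_step` returns the likelihood under the parameters it was given; iterating the step never decreases that likelihood, which is the property that makes successive reestimation converge.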

Abstract:

This paper provides an overview of the colloquium's discussion session on natural language understanding, which followed presentations by M. Bates [Bates, M. (1995) Proc. Natl. Acad. Sci. USA 92, 9977-9982] and R. C. Moore [Moore, R. C. (1995) Proc. Natl. Acad. Sci. USA 92, 9983-9988]. The paper reviews the dual role of language processing in providing understanding of the spoken input and an additional source of constraint in the recognition process. To date, language processing has successfully provided understanding but has provided only limited (and computationally expensive) constraint. As a result, most current systems use a loosely coupled, unidirectional interface, such as N-best or a word network, with natural language constraints as a postprocess, to filter or re-sort the recognizer output. However, the level of discourse context provides significant constraint on what people can talk about and how things can be referred to; when the system becomes an active participant, it can influence this order. But sources of discourse constraint have not been extensively explored, in part because these effects can only be seen by studying systems in the context of their use in interactive problem solving. This paper argues that we need to study interactive systems to understand what kinds of applications are appropriate for the current state of technology and how the technology can move from the laboratory toward real applications.
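The loosely coupled, unidirectional interface described above can be sketched as follows: the recognizer emits an N-best list, and natural language processing runs afterwards as a filter and re-sorter. This is a hypothetical illustration; `parses`, `nl_score` and `weight` are placeholder names rather than any system's API, and falling back to the recognizer's own ranking when nothing parses is an assumed design choice.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (word string, recognizer log score)

def nl_postprocess(nbest: List[Hypothesis],
                   parses: Callable[[str], bool],
                   nl_score: Callable[[str], float],
                   weight: float = 1.0) -> List[Hypothesis]:
    """Use NL constraints as a postprocess on an N-best list.

    Hypotheses the parser rejects are filtered out; the rest are re-sorted by
    recognizer score plus a weighted NL score. If nothing parses, the
    recognizer's own ranking is returned unchanged (assumed fallback).
    """
    kept = [(words, score + weight * nl_score(words))
            for words, score in nbest if parses(words)]
    if not kept:
        return sorted(nbest, key=lambda h: h[1], reverse=True)
    return sorted(kept, key=lambda h: h[1], reverse=True)
```

Because information flows only from recognizer to NL component, the parser can reject or reorder hypotheses but cannot steer the search toward hypotheses the recognizer never proposed, which is the limitation the discussion points to.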

Abstract:

The integration of speech recognition with natural language understanding raises issues of how to adapt natural language processing to the characteristics of spoken language; how to cope with errorful recognition output, including the use of natural language information to reduce recognition errors; and how to use information from the speech signal, beyond just the sequence of words, as an aid to understanding. This paper reviews current research addressing these questions in the Spoken Language Program sponsored by the Advanced Research Projects Agency (ARPA). I begin by reviewing some of the ways that spontaneous spoken language differs from standard written language and discuss methods of coping with the difficulties of spontaneous speech. I then look at how systems cope with errors in speech recognition and at attempts to use natural language information to reduce recognition errors. Finally, I discuss how prosodic information in the speech signal might be used to improve understanding.

Abstract:

Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities.

Abstract:

Mental imagery and visual memory have been regarded as distinct components in the encoding of information, associated with different working-memory processes. Experimental evidence shows, for example, that performance on memory tasks based on generating mental images (visual imagery) suffers interference from dynamic visual noise (DVN), whereas the same effect is not observed on visual memory tasks based on visual perception (visual memory). Although several lines of evidence show that imagery and visual memory tasks rely on different cognitive processes, this does not rule out the possibility that they also share common processes, or that some experimental results pointing to differences between the two tasks stem from methodological differences between the paradigms used to study them. Our aim was to equate visual mental imagery and visual memory tasks by means of recognition tasks using the spatial retro-cue paradigm. Sequences of Roman letters in visual form (visual memory task) and acoustic form (visual mental imagery task) were presented at four different spatial locations. In the first and second experiments, we analyzed the time course of retrieval for both the imagery process and the memory process. In the third experiment, we compared the structure of the representations of the two components by presenting DVN during the generation and retrieval stages. Our results show no differences in the storage of visual information during the proposed period, but DVN affects the efficiency of the retrieval process, that is, response time, with the visual mental image representation being more susceptible to the noise. However, the time course of retrieval differs between the two components, particularly for imagery, which requires more time to retrieve information than memory does.
The data support the relevance of the retro-cue paradigm, indicating that spatial attention is required for spatially organized representations, regardless of whether they are seen or imagined.

Abstract:

A large number of companies currently offer content-analysis and data-mining services for social networks, with the aim of performing opinion analysis and reputation management. A high percentage of small and medium-sized enterprises (SMEs) offer solutions specific to an industrial sector or domain. However, acquiring the basic technology needed to offer such services is too complex and represents an excessive extra cost for their limited resources. The goal of the European project OpeNER is to reuse and develop components and resources for linguistic processing that provide the technology needed for industrial and/or academic use.

Abstract:

Tactile sensors play an important role in robotic manipulation for performing dexterous and complex tasks. This paper presents a novel control framework for dexterous manipulation with multi-fingered robotic hands using feedback from tactile and visual sensors. The framework permits the definition of new visual controllers that make the object's motion track a path, taking into account both the dynamics model of the robot hand and the grasping force of the fingertips under a hybrid control scheme. In addition, the proposed general method employs optimal control to obtain the desired behaviour in the joint space of the fingers, based on a specified cost function that determines how the control effort is distributed over the joints of the robotic hand. Finally, the authors present experimental verification on a real robotic manipulation system for some of the controllers derived from the framework.
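The abstract says a cost function determines how control effort is distributed over the hand's joints. One standard way to express that idea, assumed here and not taken from the paper, is a weighted minimum-effort solution: minimize the sum of w_i * u_i^2 subject to a linear task constraint a · u = b. For a single constraint and diagonal weights, the optimum has the closed form u = W^-1 a^T b / (a W^-1 a^T), so joints with larger weights receive less effort. The function name and the scalar constraint are hypothetical simplifications of a real multi-finger controller.

```python
from typing import List

def min_effort_torques(a: List[float], b: float, weights: List[float]) -> List[float]:
    """Minimize sum(w_i * u_i**2) subject to sum(a_i * u_i) == b.

    Closed form for one linear constraint and a diagonal, positive cost W:
        u = W^-1 a^T b / (a W^-1 a^T)
    A larger weight w_i makes joint i "expensive", so it gets a smaller share.
    """
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    inv_w_a = [ai / wi for ai, wi in zip(a, weights)]    # W^-1 a^T
    denom = sum(ai * vi for ai, vi in zip(a, inv_w_a))   # a W^-1 a^T
    if denom == 0:
        raise ValueError("constraint row must be nonzero")
    return [vi * b / denom for vi in inv_w_a]
```

With equal weights the effort is shared evenly (`min_effort_torques([1, 1, 1], 3.0, [1, 1, 1])` gives `[1.0, 1.0, 1.0]`); raising one joint's weight shifts effort onto the others while the constraint stays satisfied.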

Abstract:

This paper addresses the problem of the automatic recognition and classification of temporal expressions and events in human language. Efficacy in these tasks is crucial if the broader task of temporal information processing is to be successfully performed. We analyze whether the application of semantic knowledge to these tasks improves the performance of current approaches. We therefore present and evaluate a data-driven approach as part of a system: TIPSem. Our approach uses lexical semantics and semantic roles as additional information to extend classical approaches, which are principally based on morphosyntax. The results obtained for English show that semantic knowledge aids in temporal expression and event recognition, achieving error reductions of 59% and 21%, respectively, while in classification the contribution is limited. From the analysis of the results it may be concluded that the application of semantic knowledge leads to more general models and aids in the recognition of temporal entities that are ambiguous at shallower language analysis levels. We also discovered that lexical semantics and semantic roles have complementary advantages, and that it is useful to combine them. Finally, we carried out the same analysis for Spanish. The results obtained show comparable advantages. This supports the hypothesis that applying the proposed semantic knowledge may be useful for different languages.
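An "error reduction" figure is usually a relative one: the fraction of the baseline's error that the new system removes. The abstract does not state which score TIPSem's error is derived from, so this sketch assumes error = 1 - score (for example 1 - F1), and the numbers in the usage note are invented for illustration.

```python
def relative_error_reduction(baseline_score: float, new_score: float) -> float:
    """Fraction of the baseline's error removed, with error = 1 - score.

    Scores are assumed to lie in [0, 1] (e.g. F1 or accuracy).
    """
    baseline_error = 1.0 - baseline_score
    if baseline_error == 0:
        raise ValueError("baseline has no error to reduce")
    new_error = 1.0 - new_score
    return (baseline_error - new_error) / baseline_error
```

For instance, moving a hypothetical score from 0.80 to 0.90 halves the error (0.20 to 0.10), a relative reduction of 0.5, even though the absolute score rises by only 10 points.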

Abstract:

International conference presentations represent one of the biggest challenges for academics using English as a Lingua Franca (ELF). This paper aims to begin exploring the multimodal academic discourse of oral presentations, including the verbal, written, non-verbal material (NVM) and body language modes. It offers a Systemic Functional Linguistic (SFL) and multimodal framework of presentations to enhance mixed-disciplinary ELF academics' awareness of what needs to be taken into account to communicate effectively at conferences. The model is also used to establish evaluation criteria for the presenters' talks and to carry out a multimodal discourse analysis of four well-rated 20-min talks given in a workshop setting, two from the technical sciences and two from the social sciences. The findings from the analysis and interviews indicate that: (a) a greater awareness of the mode affordances and their combinations can lead to improved performances; (b) higher reliance on the visual modes can compensate for verbal deficiencies; and (c) effective speakers tend to use a variety of modes that often overlap but work together to convey specific meanings. However, firm conclusions cannot be drawn on the basis of workshop presentations, and further studies on the multimodal analysis of ‘real conferences’ within specific disciplines are encouraged.

Abstract:

In this project, we propose the implementation of a 3D object recognition system optimized to operate under demanding time constraints. The system must be robust so that objects can be recognized properly in poor lighting conditions and cluttered scenes with significant levels of occlusion. An important requirement must be met: the system must exhibit reasonable performance running on a low-power mobile GPU computing platform (NVIDIA Jetson TK1) so that it can be integrated into mobile robotics systems, ambient intelligence or ambient assisted living applications. The acquisition system is based on the color and depth (RGB-D) data streams provided by low-cost 3D sensors such as Microsoft Kinect or PrimeSense Carmine. The range of algorithms and applications to be implemented and integrated will be quite broad, from acquisition, outlier removal and filtering of the input data, through segmentation and characterization of regions of interest in the scene, to object recognition and pose estimation themselves. Furthermore, to validate the proposed system, we will create a 3D object dataset. It will consist of a set of 3D models reconstructed from common household objects, as well as a handful of test scenes in which those objects appear. The scenes will be characterized by different levels of occlusion, diverse distances from the elements to the sensor and variations in the pose of the target objects. Creating this dataset also requires developing 3D data acquisition and 3D object reconstruction applications. The resulting system has many possible applications, ranging from mobile robot navigation and semantic scene labeling to human-computer interaction (HCI) systems based on visual information.
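Outlier removal is one of the preprocessing steps the pipeline lists. The abstract does not say which filter the project uses; a common choice for RGB-D point clouds is statistical outlier removal (the approach behind PCL's `StatisticalOutlierRemoval` filter), sketched here in brute-force pure Python. The function name and the parameters `k` and `std_ratio` are our own; a system targeting the Jetson TK1 would use a k-d tree or a GPU implementation instead of this O(n²) loop.

```python
import math
from typing import List, Sequence

def remove_statistical_outliers(points: List[Sequence[float]],
                                k: int = 4,
                                std_ratio: float = 1.0) -> List[Sequence[float]]:
    """Drop points whose mean distance to their k nearest neighbours exceeds
    mean + std_ratio * std, where the statistics are taken over the whole cloud."""
    if len(points) <= k:
        return list(points)
    mean_dists = []
    for i, p in enumerate(points):
        # brute-force k-nearest-neighbour distances (excluding the point itself)
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dists.append(sum(dists[:k]) / k)
    mu = sum(mean_dists) / len(mean_dists)
    sigma = math.sqrt(sum((d - mu) ** 2 for d in mean_dists) / len(mean_dists))
    threshold = mu + std_ratio * sigma
    return [p for p, d in zip(points, mean_dists) if d <= threshold]
```

Lowering `std_ratio` removes points more aggressively; in cluttered, occluded scenes it trades noise suppression against losing thin object structure.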

Abstract:

Although frequently discarded and despised in the 20th century, translation now seems to find wider acceptance within the Second Language Teaching (SLT) field. However, it still has a long way to go before recovering its due place in the L2 classroom. The aim of this paper is to suggest a number of translation (and interpreting)-based activities covering the different competence levels, thus showing that communicative content and translation can perfectly go hand in hand so that old, unjustified prejudices can be superseded once and for all.

Abstract:

This thesis explores the role of multimodality in language learners’ comprehension, and more specifically, the effects on students’ audio-visual comprehension when different orchestrations of modes appear in the viewing of vodcasts. Firstly, I describe the state of the art in its three main areas of concern, namely the evolution of meaning-making, Information and Communication Technology (ICT), and audio-visual comprehension. One of the most important contributions in the theoretical overview is the suggested integrative model of audio-visual comprehension, which attempts to explain how students process information received from different inputs. Secondly, I present a study based on the following research questions: ‘Which modes are orchestrated throughout the vodcasts?’, ‘Are there any multimodal ensembles that are more beneficial for students’ audio-visual comprehension?’, and ‘What are the students’ attitudes towards audio-visual (e.g., vodcasts) compared to traditional audio (e.g., audio tracks) comprehension activities?’. Along with these research questions, I have formulated two hypotheses: audio-visual comprehension improves when there is a greater number of orchestrated modes, and students have a more positive attitude towards vodcasts than traditional audios when carrying out comprehension activities. The study includes a multimodal discourse analysis, audio-visual comprehension tests, and students’ questionnaires. The multimodal discourse analysis of two British Council language learning vodcasts, entitled English is GREAT and Camden Fashion, using ELAN as the multimodal annotation tool, shows a variety of multimodal ensembles of two, three and four modes. The audio-visual comprehension tests were given to 40 Spanish students learning English as a foreign language after they watched the vodcasts. These comprehension tests contain questions related to specific orchestrations of modes appearing in the vodcasts.
The statistical analysis of the test results, using repeated-measures ANOVA, reveals that students obtain better audio-visual comprehension results when the multimodal ensembles consist of a greater number of orchestrated modes. Finally, the data compiled from the questionnaires show that students have a more positive attitude towards vodcasts than towards traditional audio listening activities. The results from the audio-visual comprehension tests and questionnaires confirm the two hypotheses of this study.
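The repeated-measures design means each student is tested under every modal-ensemble condition, so variation between students can be separated from the error term. A minimal sketch of the one-way repeated-measures F statistic follows, assuming a complete students × conditions score table with no missing cells; the abstract does not give the actual design details, and the function name is our own.

```python
from typing import List, Tuple

def rm_anova_f(data: List[List[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA.

    data[i][j] is subject i's score under condition j.
    Returns (F, df_conditions, df_error).
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)  # removed from the error term
    ss_error = ss_total - ss_cond - ss_subj                  # condition x subject residual
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error
```

A p-value can then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, df_cond, df_error)`. This sketch does not check sphericity, which a full analysis would test and, if violated, correct for (e.g. Greenhouse-Geisser).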

Abstract:

This article analyses the way in which the subject English Language V of the degree English Studies (English Language and Literature) combines the development of the five skills (listening, speaking, reading, writing and interacting) with the use of multimodal activities and resources in the teaching-learning process, so that students increase their motivation and acquire different social competences useful for the labour market, such as communication, cooperation, leadership or conflict management. This study highlights the use of multimodal materials (texts, videos, etc.) on social topics to introduce cultural aspects into a language subject and to examine in depth the different social competences university students can acquire when they work with them. The study was guided by the following research questions: how can multimodal texts and resources contribute to the development of the five skills in a foreign language classroom? What are the main social competences that students acquire when the teaching-learning process is multimodal? The results of a survey conducted at the end of the 2015-2016 academic year point out the main competences that university students develop through multimodal teaching. For its framework of analysis, the study draws on the main principles of visual grammar (Kress & van Leeuwen, 2006), through which students learn how to analyse the main aspects of multimodal texts. The analysis of the different multimodal activities described in the article and the survey reveal that multimodality is useful for developing critical thinking, for bringing cultural aspects into the classroom and for working on social competences. This article will explain the successes and challenges of using multimodal texts with social content so that students can acquire social competences while learning content. Moreover, the implications of using multimodal resources in a language classroom to develop multiliteracies will be examined.

Abstract:

Doctoral thesis, Artistic Studies (Theatre Studies), Universidade de Lisboa, Faculdade de Letras, 2016