907 resultados para audio-visual information
Resumo:
People possess different sensory modalities to detect, interpret, and efficiently act upon various events in a complex and dynamic environment (Fetsch, DeAngelis, & Angelaki, 2013). Much empirical work has been done to understand the interplay of modalities (e.g., audio-visual interactions; see Calvert, Spence, & Stein, 2004). On the one hand, integration of multimodal input as a functional principle of the brain enables versatile and coherent perception of the environment (Lewkowicz & Ghazanfar, 2009). On the other hand, sensory integration does not necessarily mean that input from all modalities is weighted equally (Ernst, 2008). Rather, when two or more modalities are stimulated concurrently, one modality often dominates the other. Studies 1 and 2 of the dissertation addressed the developmental trajectory of sensory dominance. In both studies, 6-year-olds, 9-year-olds, and adults were tested in order to examine sensory (audio-visual) dominance across age groups. In Study 3, sensory dominance was put into an applied context by examining verbal and visual overshadowing effects among 4- to 6-year-olds performing a face recognition task. The results of Studies 1 and 2 support the default auditory dominance in young children proposed by Napolitano and Sloutsky (2004), which persists up to 6 years of age. For 9-year-olds, results on privileged modality processing were inconsistent: visual dominance was revealed in Study 1, whereas privileged auditory processing was revealed in Study 2. Among adults, visual dominance was observed in Study 1, in line with preceding studies (see Spence, Parise, & Chen, 2012). No sensory dominance was revealed in Study 2 for adults. Potential explanations are discussed. Study 3 addressed verbal and visual overshadowing effects in 4- to 6-year-olds.
The aim was to examine whether verbalization (i.e., verbally describing a previously seen face) or visualization (i.e., drawing the seen face) might affect later face recognition. No effect of visualization on recognition accuracy was revealed. Instead of a verbal overshadowing effect, a verbal facilitation effect occurred. Moreover, verbal intelligence was a significant predictor of recognition accuracy in the verbalization group but not in the control group. This suggests that strengthening verbal intelligence in children can pay off in non-verbal domains as well, which might have educational implications.
Resumo:
This thesis explores the role of multimodality in language learners’ comprehension and, more specifically, the effects on students’ audio-visual comprehension when different orchestrations of modes appear while viewing vodcasts. Firstly, I describe the state of the art in its three main areas of concern, namely the evolution of meaning-making, Information and Communication Technology (ICT), and audio-visual comprehension. One of the most important contributions of the theoretical overview is the suggested integrative model of audio-visual comprehension, which attempts to explain how students process information received from different inputs. Secondly, I present a study based on the following research questions: ‘Which modes are orchestrated throughout the vodcasts?’, ‘Are there any multimodal ensembles that are more beneficial for students’ audio-visual comprehension?’, and ‘What are the students’ attitudes towards audio-visual (e.g., vodcasts) compared to traditional audio (e.g., audio tracks) comprehension activities?’. Along with these research questions, I have formulated two hypotheses: audio-visual comprehension improves when there is a greater number of orchestrated modes, and students have a more positive attitude towards vodcasts than towards traditional audio tracks when carrying out comprehension activities. The study includes a multimodal discourse analysis, audio-visual comprehension tests, and student questionnaires. The multimodal discourse analysis of two British Council language-learning vodcasts, entitled English is GREAT and Camden Fashion, using ELAN as the multimodal annotation tool, shows that there is a variety of multimodal ensembles of two, three and four modes. The audio-visual comprehension tests were given to 40 Spanish students learning English as a foreign language after they had viewed the vodcasts. These comprehension tests contain questions related to specific orchestrations of modes appearing in the vodcasts.
The statistical analysis of the test results, using repeated-measures ANOVA, reveals that students obtain better audio-visual comprehension results when the multimodal ensembles comprise a greater number of orchestrated modes. Finally, the data compiled from the questionnaires indicate that students have a more positive attitude towards vodcasts than towards traditional audio listening activities. Results from the audio-visual comprehension tests and questionnaires support the two hypotheses of this study.
Resumo:
In this report we summarize the state of the art of speech emotion recognition from the signal processing point of view. On the basis of multi-corpus experiments with machine-learning classifiers, we observe that existing approaches to supervised machine learning lead to database-dependent classifiers, which cannot be applied to multi-language speech emotion recognition without additional training because they discriminate the emotion classes according to the language used for training. As there are experimental results showing that humans can perform language-independent categorisation, we drew a parallel between machine recognition and the cognitive process and tried to discover the sources of these divergent results. The analysis suggests that the main difference is that speech perception allows extraction of language-independent features, although language-dependent features are incorporated at all levels of the speech signal and play a strong discriminative role in human perception. Based on several results in related domains, we suggest that, in addition, the cognitive process of emotion recognition is based on categorisation, assisted by a hierarchical structure of emotional categories that exists in the cognitive space of all humans. We propose a strategy for developing language-independent machine emotion recognition, based on the identification of language-independent speech features and the use of additional information from visual (expression) features.
Resumo:
No journalistic research and audiovisual production project exists in Cuenca that investigates, compiles, and presents information on the traditional professions that have been handed down over time and are gradually being lost, on the way to disappearing completely. This project can, to a certain extent, be considered innovative, since it brings together two areas: audiovisual communication and journalistic writing. They are brought together in order to present relevant information, through a final visual and written product, that shows how these professions are practised by different human actors, along with their contexts and processes, with the aim of serving as a cultural research resource at the local and national level.
Resumo:
This work presents a hybrid coordinated manoeuvre for docking an autonomous surface vehicle (ASV) with an autonomous underwater vehicle (AUV). The control manoeuvre uses visual information to estimate the AUV's position and attitude relative to the ASV and steers the ASV to dock with the AUV. The AUV is assumed to be at the surface with only a small fraction of its volume visible. The system is implemented on the autonomous surface vehicle ROAZ, developed by LSA-ISEP to perform missions in river environments and to test autonomous AUV docking capabilities and coordinated missions involving multiple AUVs and ASVs. Information from a low-cost embedded robotics vision system (LSAVision) is fused with inertial navigation sensor data in an extended Kalman filter and used to determine the AUV's position and orientation relative to the surface vehicle. The real-time vision processing system is described, and results from an operational scenario are presented.
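To make the fusion step in this abstract more concrete, the following is a minimal single-state sketch of an extended Kalman filter, written under assumptions that are not taken from the paper: the state is only the horizontal distance to the AUV, the inertial sensors are assumed to supply a closing speed, and the camera is assumed to measure the depression angle to the AUV from a known mounting height. The function name `ekf_step` and all parameters are hypothetical; the actual ROAZ filter estimates full relative position and orientation.

```python
import math

def ekf_step(x, p, v_close, dt, angle_meas, cam_height, q, r):
    """One predict/update cycle of a scalar EKF (illustrative assumptions only).

    x, p        : prior estimate of horizontal distance (m) and its variance
    v_close     : closing speed from inertial sensors (m/s), assumed known
    angle_meas  : camera depression angle to the target (rad), assumed model
    cam_height  : camera height above the water line (m), assumed known
    q, r        : process and measurement noise variances
    """
    # Predict: the distance shrinks as the ASV approaches the AUV.
    x_pred = x - v_close * dt
    p_pred = p + q

    # Nonlinear measurement model h(x) = atan2(cam_height, x) and its Jacobian.
    z_pred = math.atan2(cam_height, x_pred)
    h_jac = -cam_height / (x_pred ** 2 + cam_height ** 2)

    # Update: standard Kalman gain using the linearised measurement model.
    s = h_jac * p_pred * h_jac + r
    k = p_pred * h_jac / s
    x_new = x_pred + k * (angle_meas - z_pred)
    p_new = (1.0 - k * h_jac) * p_pred
    return x_new, p_new
```

Calling `ekf_step` once per camera frame, feeding each returned `(x, p)` pair into the next call, gives a running distance estimate whose variance shrinks as measurements accumulate.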
Resumo:
In the early nineties, Mark Weiser wrote a series of seminal papers that introduced the concept of Ubiquitous Computing. According to Weiser, computers require too much attention from users, drawing their focus away from the tasks at hand. Instead of being the centre of attention, computers should be so natural that they vanish into the human environment, becoming not only truly pervasive but also effectively invisible and unobtrusive to the user. This calls not only for smaller, cheaper, low-power computers, but also for equally convenient display solutions that can be harmoniously integrated into our surroundings. With the advent of Printed Electronics, new ways to link the physical and the digital worlds have become available. By combining common printing techniques such as inkjet printing with electro-optical functional inks, it is becoming possible not only to mass-produce extremely thin, flexible and cost-effective electronic circuits but also to introduce electronic functionalities into products where they were previously unavailable. Indeed, Printed Electronics is enabling the creation of novel sensing and display elements for interactive devices, free of form-factor constraints. At the same time, the rise in the availability and affordability of digital fabrication technologies, namely 3D printers, to the average consumer is fostering a new industrial (digital) revolution and the democratisation of innovation. Nowadays, end-users are already able to custom-design and manufacture their own physical products on demand, according to their own needs. In the future, they will be able to fabricate interactive digital devices with user-specific form and functionality from the comfort of their homes.
This thesis explores how task-specific, low-computation, interactive devices capable of presenting dynamic visual information can be created using Printed Electronics technologies, following an approach based on the ideals behind Personal Fabrication. The focus is on the use of printed electrochromic displays as a medium for delivering dynamic digital information. Several approaches are highlighted and categorised according to the architecture of the displays. Furthermore, a pictorial computation model based on extended cellular automata principles is used to programme dynamic simulation models into matrix-based electrochromic displays. Envisaged applications include the modelling of physical, chemical, biological, and environmental phenomena.
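As a rough illustration of how a cellular automaton can drive a matrix display, the sketch below updates a small on/off pixel matrix with the classic Game of Life rule. This is a stand-in chosen for familiarity, not the extended cellular automata model used in the thesis; the function name `life_step` and the fixed, always-off border are assumptions made for the example.

```python
def life_step(grid):
    """Return the next frame of a 0/1 pixel matrix under Conway's Game of Life.

    Cells outside the matrix are treated as permanently off (an assumption
    that suits a display with a fixed number of pixels).
    """
    rows, cols = len(grid), len(grid[0])

    def live_neighbours(r, c):
        # Count switched-on pixels among the up-to-eight surrounding cells.
        return sum(
            grid[rr][cc]
            for rr in range(max(r - 1, 0), min(r + 2, rows))
            for cc in range(max(c - 1, 0), min(c + 2, cols))
            if (rr, cc) != (r, c)
        )

    new_grid = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            n = live_neighbours(r, c)
            alive = grid[r][c] == 1
            # A pixel is on in the next frame if it has three live neighbours,
            # or if it is already on and has exactly two.
            new_row.append(1 if n == 3 or (alive and n == 2) else 0)
        new_grid.append(new_row)
    return new_grid
```

Each call produces one display frame from the previous one, so the whole simulation needs only local neighbour counts per pixel, which is in keeping with the low-computation devices the abstract describes.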
Resumo:
In the Morris water maze (MWM) task, proprioceptive information is likely to be inaccurate because of movement inertia. Hence, in this condition, dynamic visual cues that convey linear and angular acceleration would play a critical role in spatial navigation. To investigate this assumption, we compared rats' spatial performance in the MWM and in the homing hole board (HB) tasks using a 1.5 Hz stroboscopic illumination. In the MWM, rats trained in the stroboscopic condition needed more time than those trained in a continuous light condition to reach the hidden platform. They also showed little accuracy during the probe trial. In the HB task, in contrast, place learning remained unaffected by the stroboscopic light condition. The deficit in the MWM was therefore complete, affecting both escape latency and discrimination of the reinforced area, and task-specific. This dissociation confirms that dynamic visual information is crucial to spatial navigation in the MWM, whereas spatial navigation on solid ground is mediated by multisensory integration and is thus less dependent on visual information.
Resumo:
The purpose of this project was to identify, in a subject group of engineers and technicians (N = 62), a preferred mode of representation for facilitating correct recall of information from complex graphics. The modes of representation were black and white (b&w) block, b&w icon, color block, and color icon. The researcher's test instrument included twelve complex graphics (six b&w and six color, three per mode). Each graphics presentation was followed by two multiple-choice questions. Recall performance was better with b&w block mode graphics and color icon mode graphics. A standardized test, the Group Embedded Figures Test (GEFT), was used to identify a cognitive style preference (field dependence). Although engineers and technicians in the sample were strongly field-independent, they were not significantly more field-independent than the normative group in the Witkin, Oltman, Raskin, and Karp study (1971). Tests were also employed to look for any significant difference in cognitive style preference due to gender. None was found. Implications of the project results for the design of visuals and their use in technical training are discussed.
Resumo:
Natural stimuli projected onto our retinas provide us with rich visual information. This information varies along "low-level" properties such as luminance, contrast, and spatial frequency. While part of this information reaches our awareness, another part is processed in the brain without our being aware of it. The properties of the information that influence brain activity and behaviour consciously versus non-consciously nevertheless remain poorly understood. This question was examined in the last two articles of this thesis, using the psychophysical techniques developed in the first two articles. The first article presents the SHINE toolbox (spectrum, histogram, and intensity normalization and equalization), developed to allow control of low-level image properties in MATLAB. The second article describes and validates the so-called spatial frequency bubbles technique, which was used throughout the studies of this thesis to reveal the spatial frequencies used in various face perception tasks. This technique offers the advantages of high spatial-frequency resolution and low experimental bias. The third and fourth articles address the processing of spatial frequencies as a function of awareness. In the third article, the spatial frequency bubbles method was used with masked repetition priming in order to identify the spatial frequencies correlated with observers' behavioural responses during the perception of the gender of faces presented consciously versus non-consciously. The results show that the same spatial frequencies significantly influence response times in both awareness conditions, but in opposite directions.
In the last article, the spatial frequency bubbles method was combined with intracranial recordings and Continuous Flash Suppression (Tsuchiya & Koch, 2005) in order to map the spatial frequencies that modulate the activation of specific brain structures (the insula and the amygdala) during conscious versus non-conscious perception of emotional facial expressions. In both regions, the results show that non-conscious perception occurs more rapidly and relies more on low spatial frequencies than conscious perception. The contribution of this thesis is therefore twofold. On the one hand, methodological contributions to visual perception research are made through the introduction of the SHINE toolbox and the spatial frequency bubbles technique. On the other hand, insights into the "correlates of consciousness" are provided using two different approaches.
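The kind of low-level control mentioned for the SHINE toolbox can be illustrated with a simplified sketch that equates mean luminance and contrast (standard deviation) across a set of grayscale images. This is not the toolbox's MATLAB code or API; the function name `match_luminance`, the list-of-lists image format, and the choice of the across-image average as the target are all assumptions made for the example.

```python
from statistics import fmean, pstdev

def match_luminance(images):
    """Rescale each image so all share the same mean and standard deviation.

    images: list of grayscale images, each a list of rows of pixel values.
    The target mean and standard deviation are the averages across images
    (an assumed convention for this sketch).
    """
    flat = [[p for row in img for p in row] for img in images]
    means = [fmean(f) for f in flat]
    stds = [pstdev(f) for f in flat]
    target_mean = fmean(means)
    target_std = fmean(stds)

    matched = []
    for img, m, s in zip(images, means, stds):
        # A uniform image has no contrast to rescale; map it to the target mean.
        scale = target_std / s if s > 0 else 0.0
        matched.append(
            [[(p - m) * scale + target_mean for p in row] for row in img]
        )
    return matched
```

After matching, differences between stimuli can no longer be attributed to overall brightness or contrast, which is the motivation for controlling such properties before a perception experiment. Values may fall outside the displayable range after rescaling, which this sketch does not handle.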