194 results for Visual Information

in Queensland University of Technology - ePrints Archive


Relevance: 100.00%

Abstract:

Recovering position from sensor information is an important problem in mobile robotics, known as localisation. Localisation requires a map or some other description of the environment to provide the robot with a context in which to interpret sensor data. The mobile robot system under discussion uses an artificial neural representation of position. Building a geometrical map of the environment with a single camera and artificial neural networks is difficult; it is simpler to learn position as a function of the visual input. Usually when learning images, an intermediate representation is employed. An appropriate starting point for a biologically plausible image representation is the complex cells of the visual cortex, which have invariance properties that appear useful for localisation. The effectiveness of two different complex cell models for localisation is evaluated. Finally, the ability of a simple neural network with single-shot learning to recognise these representations and localise a robot is examined.
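The abstract does not name the two complex cell models that were compared; a standard starting point is the Gabor energy model, in which the squared outputs of a quadrature pair of Gabor filters give a response invariant to local phase. A minimal sketch of that representation follows (filter parameters and the stand-in camera frame are illustrative assumptions, not values from the paper):

```python
# Gabor energy model of a complex cell: a hedged sketch, not the paper's code.
import numpy as np
from scipy.signal import convolve2d

def gabor_pair(size=21, wavelength=8.0, theta=0.0, sigma=4.0):
    """Return an even (cosine) and odd (sine) Gabor quadrature pair."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)          # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))  # Gaussian window
    even = envelope * np.cos(2 * np.pi * xr / wavelength)
    odd = envelope * np.sin(2 * np.pi * xr / wavelength)
    return even, odd

def complex_cell_energy(image, **kwargs):
    """Phase-invariant energy: sum of squared quadrature filter outputs."""
    even, odd = gabor_pair(**kwargs)
    r_even = convolve2d(image, even, mode='same', boundary='symm')
    r_odd = convolve2d(image, odd, mode='same', boundary='symm')
    return r_even**2 + r_odd**2

# Responses at several orientations could form the input to the localising network.
frame = np.random.rand(64, 64)                          # stand-in camera frame
features = np.stack([complex_cell_energy(frame, theta=t)
                     for t in np.linspace(0, np.pi, 4, endpoint=False)])
```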

Relevance: 100.00%

Abstract:

A pressing concern within the literature on anticipatory perceptual-motor behaviour is the lack of clarity on the applicability of data observed under video-simulation task constraints to actual performance, in which actions are coupled to perception, as captured under in-situ experimental conditions. We developed an in-situ experimental paradigm that manipulated the duration of anticipatory visual information from a penalty taker's actions to examine experienced goalkeepers' vulnerability to deception in the penalty kick in association football. Irrespective of the penalty taker's kick strategy, goalkeepers initiated movement responses earlier across consecutively earlier presentation points. Overall goalkeeping performance was better in non-deception trials than in deception trials. In deception trials, the kinematic information presented up until the penalty taker initiated his/her kicking action had a negative effect on goalkeepers' performance. It is concluded that goalkeepers are likely to benefit from not anticipating a penalty taker's performance outcome from run-up information, waiting instead for the later information that emerges just before the initiation of the penalty taker's kicking action.

Relevance: 100.00%

Abstract:

The integration of separate, yet complementary, cortical pathways appears to play a role in visual perception and action when intercepting objects. The ventral system is responsible for object recognition and identification, while the dorsal system facilitates continuous regulation of action. This dual-system model implies that empirically manipulating different visual information sources during performance of an interceptive action might lead to the emergence of distinct gaze and movement pattern profiles. To test this idea, we recorded hand kinematics and eye movements of participants as they attempted to catch balls projected from a novel apparatus that synchronised or de-synchronised accompanying video images of a throwing action and ball trajectory. Results revealed that ball catching performance was less successful when patterns of hand movements and gaze behaviours were constrained by the absence of advance perceptual information from the thrower's actions. Under these task constraints, participants began tracking the ball later, followed less of its trajectory, and adapted their actions by initiating movements later and moving the hand faster. There were no performance differences when the throwing action image and ball speed were synchronised or de-synchronised, since hand movements were closely linked to information from the ball trajectory. Results are interpreted relative to the two-visual-system hypothesis, demonstrating that accurate interception requires integration of advance visual information from the kinematics of the throwing action and from the ball flight trajectory.

Relevance: 100.00%

Abstract:

This article content-analyzes music in tourism TV commercials from 95 regions and countries to identify its general acoustic characteristics. The objective is to offer a general guideline for the postproduction of tourism TV commercials. It is found that tourism TV commercials tend to be produced at a faster tempo, with beats per minute close to 120, which is rarely found in general TV commercials. To compensate for the faster tempo (increased aural information load), fewer scenes (longer duration per scene) were edited into the footage. Production recommendations and future research directions are presented.
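The abstract does not describe the measurement procedure; as a minimal sketch, a soundtrack's tempo could be estimated with the librosa library as below (the file name is a hypothetical placeholder):

```python
# Hedged sketch: estimate a commercial soundtrack's tempo in BPM with librosa.
import numpy as np
import librosa

y, sr = librosa.load('commercial.wav')        # hypothetical input file
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
tempo = float(np.atleast_1d(tempo)[0])        # scalar across librosa versions
print(f'estimated tempo: {tempo:.1f} BPM')    # the article reports ~120 BPM as typical
```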

Relevance: 100.00%

Abstract:

To identify and categorize complex stimuli such as familiar objects or speech, the human brain integrates information that is abstracted at multiple levels from its sensory inputs. Using cross-modal priming for spoken words and sounds, this functional magnetic resonance imaging study identified 3 distinct classes of visuoauditory incongruency effects, selective for 1) spoken words in the left superior temporal sulcus (STS), 2) environmental sounds in the left angular gyrus (AG), and 3) both words and sounds in the lateral and medial prefrontal cortices (IFS/mPFC). From a cognitive perspective, these incongruency effects suggest that prior visual information influences the neural processes underlying speech and sound recognition at multiple levels, with the STS being involved in phonological, the AG in semantic, and the mPFC/IFS in higher conceptual processing. In terms of neural mechanisms, effective connectivity analyses (dynamic causal modeling) suggest that these incongruency effects may emerge via greater bottom-up effects from early auditory regions to intermediate multisensory integration areas (i.e., the STS and AG). This is consistent with a predictive coding perspective on hierarchical Bayesian inference in the cortex, where the domain of the prediction error (phonological vs. semantic) determines its regional expression (middle temporal gyrus/STS vs. AG/intraparietal sulcus).

Relevance: 100.00%

Abstract:

Purpose: This study investigated the impact of simulated hyperopic anisometropia and sustained near work on performance of academic-related measures in children. Methods: Participants included 16 children (mean age: 11.1 ± 0.8 years) with minimal refractive error. Academic-related outcome measures included a reading test (Neale Analysis of Reading Ability), visual information processing tests (Coding and Symbol Search subtests from the Wechsler Intelligence Scale for Children) and a reading-related eye movement test (Developmental Eye Movement test). Performance was assessed with and without 0.75 D of imposed monocular hyperopic defocus (administered in a randomised order), before and after 20 minutes of sustained near work. Unilateral hyperopic defocus was systematically assigned to either the dominant or non-dominant sighting eye to evaluate the impact of ocular dominance on any performance decrements. Results: Simulated hyperopic anisometropia and sustained near work both independently reduced performance on all of the outcome measures (p<0.001). A significant interaction was also observed between simulated anisometropia and near work (p<0.05), with the greatest decrement in performance observed during simulated anisometropia in combination with sustained near work. Laterality of the refractive error simulation (ocular dominance) did not significantly influence the outcome measures (p>0.05). A reduction of up to 12% in performance was observed across the range of academic-related measures following sustained near work undertaken during the anisometropic simulation. Conclusion: Simulated hyperopic anisometropia significantly impaired academic-related performance, particularly in combination with sustained near work. The impact of uncorrected habitual anisometropia on academic-related performance in children requires further investigation.
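The design implied by the abstract is a 2 x 2 repeated-measures analysis (simulated defocus x sustained near work). A minimal sketch of such an analysis with statsmodels, run on synthetic data with hypothetical column names rather than the study's data:

```python
# Hedged sketch of a 2x2 repeated-measures ANOVA; data and effect sizes are synthetic.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = []
for child in range(16):                        # 16 children, as in the study
    for defocus in ('none', 'hyperopic'):
        for nearwork in ('before', 'after'):
            penalty = 5 * (defocus == 'hyperopic') + 5 * (nearwork == 'after')
            rows.append({'child': child, 'defocus': defocus, 'nearwork': nearwork,
                         'score': 100 - penalty + rng.normal(0, 3)})
df = pd.DataFrame(rows)

# Reports main effects of each factor and the defocus x near-work interaction.
print(AnovaRM(df, depvar='score', subject='child',
              within=['defocus', 'nearwork']).fit())
```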

Relevance: 100.00%

Abstract:

Spoken term detection (STD) is the task of looking up a spoken term in a large volume of speech segments. In order to provide fast search, speech segments are first indexed into an intermediate representation using speech recognition engines, which provide multiple hypotheses for each speech segment. Approximate matching techniques are usually applied at the search stage to compensate for the poor performance of automatic speech recognition engines during indexing. Recently, using visual information in addition to audio information has been shown to improve phone recognition performance, particularly in noisy environments. In this paper, we make use of visual information, in the form of the speaker's lip movements, in the indexing stage and investigate its effect on STD performance. In particular, we investigate whether gains in phone recognition accuracy carry through the approximate matching stage to provide similar gains in the final audio-visual STD system over a traditional audio-only approach. We also investigate the effect of using visual information on STD performance in different noise environments.
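As a concrete illustration of the approximate matching stage, the sketch below finds the best alignment of a query phone sequence inside an indexed phone transcript using plain edit distance; the phone strings are hypothetical and the actual system's cost model is not specified in the abstract:

```python
# Hedged sketch: approximate substring matching of a phone query against an index.
def min_edit_distance_substring(query, transcript):
    """Smallest edit distance between `query` and any substring of `transcript`."""
    m, n = len(query), len(transcript)
    prev = [0] * (n + 1)                     # row of zeros: a match may start anywhere
    for i in range(1, m + 1):
        curr = [i] + [0] * n                 # cost of matching query[:i] to nothing
        for j in range(1, n + 1):
            sub = prev[j - 1] + (query[i - 1] != transcript[j - 1])
            curr[j] = min(sub, prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    return min(prev)                         # best cost over all end positions

# Hypothetical 1-best phone hypotheses from the indexing recogniser
transcript = ['sil', 'h', 'ax', 'l', 'ow', 'w', 'er', 'l', 'd', 'sil']
query = ['w', 'ao', 'l', 'd']                # slightly misrecognised query term
print(min_edit_distance_substring(query, transcript))  # low cost => candidate hit
```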

Relevance: 100.00%

Abstract:

The Rapid Visual Information Processing (RVIP) task, a serial discrimination task in which performance is believed to reflect sustained attention capabilities, is widely used in behavioural research and increasingly in neuroimaging studies. To date, functional neuroimaging research into the RVIP has used block analyses, reflecting the sustained processing involved in the task but not necessarily the transient processes associated with individual trial performance. Furthermore, this research has been limited to young cohorts. This study assessed the behavioural and functional magnetic resonance imaging (fMRI) outcomes of the RVIP task using both block and event-related analyses in a healthy middle-aged cohort (mean age = 53.56 years, n = 16). The results show that the version of the RVIP used here is sensitive to changes in attentional demand, with participants achieving a 43% hit rate in the experimental task compared with 96% accuracy in the control task. As shown by previous research, the block analysis revealed an increase in activation in a network of frontal, parietal, occipital and cerebellar regions. The event-related analysis showed a similar network of activation, seemingly omitting regions involved in the processing of the task (as shown in the block analysis), such as occipital areas and the thalamus, providing an indication of a network of regions involved in correct trial performance. Frontal (superior and inferior frontal gyri), parietal (precuneus, inferior parietal lobe) and cerebellar regions were active in both the block and event-related analyses, suggesting their importance in sustained attention/vigilance. These networks and the differences between them are discussed in detail, as are implications for future research in middle-aged cohorts.
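For readers unfamiliar with the paradigm, the sketch below generates an RVIP-style digit stream and scores hits. Versions of the task differ in the exact target rule, so a run of three consecutive odd or three consecutive even digits stands in here, and all timing (stimulus rate, response window) is omitted:

```python
# Hedged sketch of RVIP-style stimulus generation and scoring.
import random

def rvip_stream(length=100, seed=1):
    rng = random.Random(seed)
    return [rng.randint(1, 9) for _ in range(length)]

def target_positions(stream):
    """Indices where the third digit of an all-odd or all-even run appears."""
    hits = []
    for i in range(2, len(stream)):
        triple = stream[i - 2:i + 1]
        if all(d % 2 == 0 for d in triple) or all(d % 2 == 1 for d in triple):
            hits.append(i)
    return hits

stream = rvip_stream()
targets = set(target_positions(stream))
# Simulate a participant detecting 43% of targets, the hit rate reported above.
responses = {i for i in targets if random.random() < 0.43}
print(f'hit rate: {len(responses) / len(targets):.2f}')
```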

Relevance: 70.00%

Abstract:

Acoustically, car cabins are extremely noisy and, as a consequence, existing audio-only speech recognition systems, used for voice-based control of vehicle functions such as GPS-based navigation, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (e.g., lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio-only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio-Visual Speech Recognition (AVSR). Research in AVSR has been ongoing for the past twenty-five years, with notable progress being made. However, the practical deployment of AVSR systems in real-world applications has not yet emerged, mainly because most research to date has neglected to address variabilities in the visual domain, such as illumination and viewpoint, in the design of the visual front end of the AVSR system. In this paper we present an AVSR system for a real-world car environment using the AVICAR database [1], a publicly available in-car database, and we show that using visual speech in conjunction with the audio modality improves the robustness and effectiveness of voice-only recognition systems in car cabin environments.

Relevance: 70.00%

Abstract:

Whilst a variety of studies have appeared over the last decade addressing the gap between the potential promised by computers and the reality experienced in the classroom by teachers and students, few have specifically addressed the situation as it pertains to the visual arts classroom. The aim of this study was to explore the reality of classroom computer use for three high school visual arts teachers and to determine how computer technology might enrich visual arts teaching and learning. An action research approach was employed to enable the researcher to understand the situation from the teachers' points of view while contributing to their professional practice. The wider social context surrounding this study is characterised by an increase in visual communications brought about by rapid advances in computer technology. The powerful combination of visual imagery and computer technology is illustrated by continuing developments in the print, film and television industries. In particular, the recent growth of interactive multimedia epitomises this combination and is significant to this study as it represents a new form of publishing of great interest to educators and artists alike. In this social context, visual arts education has a significant role to play. By cultivating a critical awareness of the implications of technology use and promoting a creative approach to the application of computer technology within the visual arts, visual arts education is in a position to provide an essential service to students who will leave high school to participate in a visual information age as both consumers and producers.

Relevance: 70.00%

Abstract:

This paper investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based on multi-stream hidden Markov models (MSHMMs), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been performed successfully for the task of speech recognition. The use of temporal lip information for speaker identification has been explored previously (T.J. Wark et al., 1998); however, that work was restricted to output fusion via single-stream HMMs. We present an extension to this previous work and show that an MSHMM is a valid structure for multi-modal speaker identification.
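In an MSHMM the streams are combined by weighting each stream's emission log-likelihood with a stream exponent. The real model applies this per state inside decoding, but a minimal utterance-level sketch conveys the idea (the weights and scores below are illustrative assumptions, not values from the paper):

```python
# Hedged sketch of multi-stream score fusion: log b(o) = la*log b_a + lv*log b_v.
def fused_score(audio_loglik, visual_loglik, lambda_audio=0.7):
    """Combine per-stream log-likelihoods with exponent weights summing to 1."""
    return lambda_audio * audio_loglik + (1.0 - lambda_audio) * visual_loglik

# Hypothetical log-likelihoods from independent audio and lip models per speaker
audio = {'spk1': -410.2, 'spk2': -398.7}
visual = {'spk1': -120.5, 'spk2': -131.0}
scores = {spk: fused_score(audio[spk], visual[spk]) for spk in audio}
print(max(scores, key=scores.get))           # identity with the best fused score
```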

Relevance: 70.00%

Abstract:

Visual activity detection based on lip movements can be used to overcome the poor performance of voice activity detection based solely on the audio domain, particularly in noisy acoustic conditions. However, most research on visual voice activity detection (VVAD) has neglected to address variabilities in the visual domain, such as viewpoint variation. In this paper we investigate the effectiveness of visual information from the speaker's frontal and profile views (i.e., left and right side views) for the task of VVAD. As far as we are aware, our work constitutes the first real attempt to study this problem. We describe our visual front-end approach and the Gaussian mixture model (GMM) based VVAD framework, and report experimental results using the freely available CUAVE database. The experimental results show that VVAD is indeed possible from profile views, and we give a quantitative comparison of VVAD based on frontal and profile views. The results presented are useful in the development of multi-modal Human Machine Interaction (HMI) using a single camera, where the speaker's face may not always be frontal.
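A minimal sketch of the GMM back end described above: one mixture is trained on lip-region features from speaking frames and one on non-speaking frames, and each test frame is classified by log-likelihood ratio. The features here are synthetic stand-ins; the real front end extracts them from the frontal or profile lip region:

```python
# Hedged sketch of GMM-based VVAD with synthetic lip-region features.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
speaking = rng.normal(1.0, 1.0, size=(500, 20))       # training features: speech
silent = rng.normal(-1.0, 1.0, size=(500, 20))        # training features: non-speech

gmm_speech = GaussianMixture(n_components=8, covariance_type='diag',
                             random_state=0).fit(speaking)
gmm_silence = GaussianMixture(n_components=8, covariance_type='diag',
                              random_state=0).fit(silent)

def vvad(frames, threshold=0.0):
    """Per-frame speech/non-speech decision from the log-likelihood ratio."""
    llr = gmm_speech.score_samples(frames) - gmm_silence.score_samples(frames)
    return llr > threshold

test_frames = rng.normal(1.0, 1.0, size=(10, 20))
print(vvad(test_frames))                              # True where speech is detected
```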

Relevance: 70.00%

Abstract:

This article presents a visual servoing system that enables a Micro Unmanned Aerial Vehicle (MUAV) to follow a 3D moving object. The presented control strategy is based only on the visual information given by an adaptive, colour-based tracking method. A visual fuzzy system has been developed for servoing the camera mounted on the rotary-wing MUAV, which also considers the vehicle's own dynamics. The system is focused on continuously following an aerial moving target, maintaining a fixed safe distance from it and keeping it centred on the image plane. The algorithm is validated in real flights in outdoor scenarios, showing the robustness of the proposed system against wind perturbations, and illumination and weather changes, among others. The obtained results indicate that the proposed algorithm is suitable for complex control tasks, such as object following and pursuit and flying in formation, as well as for indoor navigation.
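The fuzzy rules themselves are not given in the abstract; as an illustration of the underlying image-based servoing idea, the sketch below turns the tracked object's pixel error (and apparent size, as a distance proxy) into velocity commands, with a simple proportional law standing in for the fuzzy controller (gains and camera geometry are assumptions):

```python
# Hedged sketch of image-based following: centre the target and hold distance.
def servo_command(target_px, target_area, image_size=(640, 480),
                  k_yaw=0.002, k_alt=0.002, k_fwd=0.5, desired_area=5000.0):
    """Map target pixel position and apparent area to MUAV velocity commands."""
    cx, cy = image_size[0] / 2.0, image_size[1] / 2.0
    ex, ey = target_px[0] - cx, target_px[1] - cy   # pixel error from image centre
    yaw_rate = -k_yaw * ex                          # turn to re-centre horizontally
    climb_rate = -k_alt * ey                        # climb/descend to re-centre vertically
    forward = k_fwd * (1.0 - target_area / desired_area)  # approach if target looks small
    return yaw_rate, climb_rate, forward

# Target seen right of centre, above centre, and smaller than the safe-distance size
print(servo_command((420, 180), target_area=2500.0))
```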