316 results for Visual servoing
Abstract:
Insensitivity to visual noise is important for audio-visual speech recognition (AVSR). Visual noise can take a number of forms, such as varying frame rate, occlusion, lighting, or speaker variability. The use of a high-dimensional secondary classifier on the word likelihood scores from both the audio and video modalities is investigated for the purpose of adaptive fusion. Preliminary results demonstrate performance above the catastrophic fusion boundary for our confidence measure, irrespective of the type of visual noise presented. Our experiments were restricted to small-vocabulary applications.
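For illustration, a minimal sketch of the general idea of a secondary classifier operating on the concatenated word likelihood scores of both modalities is given below. The classifier type, feature shapes, and synthetic scores are assumptions for the example, not the paper's exact high-dimensional classifier or confidence measure.

```python
# Minimal sketch of fusing audio and video word log-likelihood scores with a
# secondary classifier. Names, shapes, and the synthetic scores below are
# illustrative assumptions, not the paper's exact setup or confidence measure.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_words, n_train, n_test = 10, 200, 50

# Stand-ins for per-word log-likelihoods from single-modality HMM decoders.
audio_ll_train = rng.normal(size=(n_train, n_words))
video_ll_train = rng.normal(size=(n_train, n_words))
labels_train = rng.integers(0, n_words, size=n_train)

# The secondary classifier operates on the concatenated likelihood vectors,
# letting it learn which modality to trust under different noise conditions.
X_train = np.hstack([audio_ll_train, video_ll_train])
fusion_clf = SVC(kernel="rbf").fit(X_train, labels_train)

audio_ll_test = rng.normal(size=(n_test, n_words))
video_ll_test = rng.normal(size=(n_test, n_words))
predicted_words = fusion_clf.predict(np.hstack([audio_ll_test, video_ll_test]))
print(predicted_words[:10])
```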
Abstract:
The use of visual features in the form of lip movements to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, whether this technique can outperform speech recognition that incorporates well-known acoustic enhancement techniques, such as spectral subtraction or multi-channel beamforming, is not known. This is an important question to answer, especially in an automotive environment, for the design of an efficient human-vehicle computer interface. We perform a variety of speech recognition experiments on a challenging automotive speech dataset, and the results show that synchronous HMM-based audio-visual fusion can outperform traditional single-channel as well as multi-channel acoustic speech enhancement techniques. We also show that further improvement in recognition performance can be obtained by fusing speech-enhanced audio with the visual modality, demonstrating the complementary nature of the two robust speech recognition approaches.
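As a point of reference, the sketch below shows the standard stream-weighted log-likelihood combination commonly used in synchronous audio-visual HMMs; the weight value and decision rule are illustrative and may differ from the fusion scheme used in the paper.

```python
# Sketch of the standard stream-weighted score combination used in synchronous
# audio-visual HMMs: a per-stream weight on each modality's emission
# log-likelihood. The weight and decision rule here are illustrative only.
import numpy as np

def fused_log_likelihoods(audio_ll, video_ll, audio_weight=0.7):
    """Combine per-word (or per-state) log-likelihoods from two streams.

    audio_ll, video_ll: arrays of shape (n_words,).
    audio_weight: stream weight in [0, 1]; lower values trust video more,
    which is typically appropriate in high acoustic noise.
    """
    return audio_weight * audio_ll + (1.0 - audio_weight) * video_ll

audio_ll = np.array([-120.3, -98.7, -110.2])   # scores for 3 candidate words
video_ll = np.array([-80.1, -95.4, -70.9])
best_word = int(np.argmax(fused_log_likelihoods(audio_ll, video_ll, 0.5)))
print(best_word)
```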
Abstract:
Micro aerial vehicles (MAVs) are a rapidly growing area of research and development in robotics. For autonomous robot operation, localization has typically been calculated using GPS, external camera arrays, or onboard range or vision sensing. In cluttered indoor or outdoor environments, onboard sensing is the only viable option. In this paper we present an appearance-based approach to visual SLAM on a flying MAV using only low-quality vision. Our approach consists of a visual place recognition algorithm that operates on 1000-pixel images, a lightweight visual odometry algorithm, and a visual expectation algorithm that improves both the recall of place sequences and the precision with which they are recalled as the robot flies along a similar path. Using data gathered from outdoor datasets, we show that the system is able to perform visual recognition with low-quality, intermittent visual sensory data. By combining the visual algorithms with the RatSLAM system, we also demonstrate how the algorithms enable successful SLAM.
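A minimal sketch of the kind of lightweight appearance-based matching that works on roughly 1000-pixel images is given below: patch-normalised images compared with the sum of absolute differences (SAD). The image size, patch size, and threshold are illustrative assumptions rather than the system's tuned values.

```python
# Sketch of lightweight appearance-based place matching on very small
# (roughly 1000-pixel) grayscale images: patch-normalise, then compare with
# the sum of absolute differences (SAD). Sizes and threshold are illustrative.
import numpy as np

def patch_normalise(img, patch=8):
    """Normalise each patch to zero mean / unit variance to reduce lighting effects."""
    out = img.astype(float).copy()
    for r in range(0, img.shape[0], patch):
        for c in range(0, img.shape[1], patch):
            block = out[r:r + patch, c:c + patch]
            out[r:r + patch, c:c + patch] = (block - block.mean()) / (block.std() + 1e-6)
    return out

def best_match(current, templates, threshold=0.6):
    """Return the index of the closest stored template, or None if too different."""
    cur = patch_normalise(current)
    sads = [np.mean(np.abs(cur - patch_normalise(t))) for t in templates]
    idx = int(np.argmin(sads))
    return idx if sads[idx] < threshold else None

rng = np.random.default_rng(1)
templates = [rng.integers(0, 256, (32, 32)) for _ in range(5)]   # stored places
query = templates[2] + rng.normal(0, 10, (32, 32))               # noisy revisit
print(best_match(query, templates))
```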
Abstract:
Diabetes is an increasingly prevalent disease worldwide. Early management of its complications can prevent morbidity and mortality in this population. Peripheral neuropathy, a significant complication of diabetes, is the major cause of foot ulceration and amputation in diabetes. Delay in attending to complications of the disease contributes to significant medical expenses for diabetic patients and the community. Early structural changes to the neural components of the retina have been demonstrated to occur prior to the clinically visible retinal vasculature complication of diabetic retinopathy. Additionally, visual function loss has been shown to exist before the ophthalmoscopic manifestations of vasculature damage. The purpose of this thesis was to evaluate the relationship between diabetic peripheral neuropathy and both retinal structure and visual function. The key question was whether diabetic peripheral neuropathy is the potential underlying factor responsible for retinal anatomical change and visual function loss in people with diabetes. This study was conducted on a cohort with type 2 diabetes. Retinal nerve fibre layer (RNFL) thickness was assessed by means of Optical Coherence Tomography (OCT). Visual function was assessed using two different methods: Standard Automated Perimetry (SAP) and flicker perimetry, both performed within the central 30 degrees of fixation. The level of diabetic peripheral neuropathy (DPN) was assessed using two techniques, Quantitative Sensory Testing (QST) and the Neuropathy Disability Score (NDS). These techniques are known to be capable of detecting DPN at very early stages. NDS has also been shown to be a gold standard for detecting 'risk of foot ulceration'. Findings reported in this thesis showed that RNFL thickness, particularly in the inferior quadrant, has a significant association with severity of DPN when the condition is assessed using NDS. More specifically, it was observed that inferior RNFL thickness has the ability to differentiate individuals who are at higher risk of foot ulceration from those at lower risk, indicating that RNFL thickness can predict late-stage DPN. Investigating the association between RNFL and QST did not show any meaningful interaction, which indicates that RNFL thickness for this cohort was not as predictive of neuropathy status as NDS. In both of these studies, control participants did not have different results from the type 2 cohort without DPN, suggesting that RNFL thickness is not a marker for diagnosing DPN at early stages. The latter finding also indicated that diabetes per se is unlikely to affect RNFL thickness. Visual function as measured by SAP and flicker perimetry was found to be associated with severity of peripheral neuropathy as measured by NDS. These findings were also capable of differentiating individuals at higher risk of foot ulceration; however, visual function also proved not to be a marker for early diagnosis of DPN. It was found that neither SAP nor flicker sensitivity has a meaningful association with DPN when neuropathy status was measured using QST. Importantly, diabetic retinopathy did not explain any of the findings in these experiments. The work described here is valuable as no other research to date has investigated the association between diabetic peripheral neuropathy and either retinal structure or visual function.
Abstract:
Visual detection of lip movement activity can be used to overcome the poor performance of voice activity detection based solely on the audio domain, particularly in noisy acoustic conditions. However, most of the research conducted in visual voice activity detection (VVAD) has neglected variabilities in the visual domain such as viewpoint variation. In this paper we investigate the effectiveness of the visual information from the speaker's frontal and profile views (i.e., left and right side views) for the task of VVAD. As far as we are aware, our work constitutes the first real attempt to study this problem. We describe our visual front-end approach and the Gaussian mixture model (GMM)-based VVAD framework, and report experimental results using the freely available CUAVE database. The experimental results show that VVAD is indeed possible from profile views, and we give a quantitative comparison of VVAD based on frontal and profile views. The results presented are useful in the development of multi-modal Human Machine Interaction (HMI) using a single camera, where the speaker's face may not always be frontal.
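The sketch below illustrates one common form of GMM-based VVAD, classifying lip-region feature vectors by the log-likelihood ratio of a "speaking" and a "silent" mixture model; the feature dimensionality and data are synthetic placeholders rather than the CUAVE features used in the paper.

```python
# Sketch of a GMM-based visual voice activity detector: one mixture model for
# "speaking" lip-region features and one for "silent", classified by the
# log-likelihood ratio. Features and data here are synthetic placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
dim = 20                                     # e.g. DCT features of the mouth ROI
speech_feats = rng.normal(1.0, 1.0, (500, dim))
silence_feats = rng.normal(0.0, 1.0, (500, dim))

gmm_speech = GaussianMixture(n_components=4, random_state=0).fit(speech_feats)
gmm_silence = GaussianMixture(n_components=4, random_state=0).fit(silence_feats)

def is_speaking(frame_features, threshold=0.0):
    """Frame-level decision from the log-likelihood ratio of the two GMMs."""
    llr = gmm_speech.score_samples(frame_features) - gmm_silence.score_samples(frame_features)
    return llr > threshold

test_frames = rng.normal(1.0, 1.0, (10, dim))
print(is_speaking(test_frames))
```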
Abstract:
In this paper, we present a method for the recovery of position and absolute attitude (including pitch, roll and yaw) using a novel fusion of monocular visual odometry and GPS measurements, in a manner similar to a classic loosely-coupled GPS/INS error-state navigation filter. The proposed filter does not require additional restrictions or assumptions such as platform-specific dynamics, map-matching, feature-tracking, visual loop-closing, a gravity vector, or additional sensors such as an IMU or magnetic compass. An observability analysis of the proposed filter is performed, showing that the scale factor, position and attitude errors are fully observable under acceleration that is non-parallel to the velocity vector in the navigation frame. The observability properties of the proposed filter are demonstrated using numerical simulations. We conclude the article with an implementation of the proposed filter using real flight data collected from a Cessna 172 equipped with a downward-looking camera and GPS, showing the feasibility of the algorithm in real-world conditions.
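To illustrate the loosely-coupled idea, the sketch below fuses unscaled visual odometry displacements with GPS fixes in a small EKF whose state includes the unknown scale factor. It is a deliberately simplified stand-in (planar, no attitude states), not the paper's full error-state filter.

```python
# Minimal sketch of loosely coupled fusion of monocular visual odometry (VO)
# with GPS in an EKF whose state is 2-D position plus the unknown VO scale
# factor. Shows how GPS corrections make scale observable; noise values and
# the trajectory are illustrative.
import numpy as np

x = np.array([0.0, 0.0, 1.5])        # [px, py, scale] -- initial scale guess
P = np.diag([1.0, 1.0, 4.0])         # state covariance
Q = np.diag([0.01, 0.01, 1e-4])      # process noise
R = np.diag([4.0, 4.0])              # GPS position noise (m^2)
H = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)

def predict(x, P, d_vo):
    """Propagate with an unscaled VO displacement d_vo (2-vector)."""
    x_new = np.array([x[0] + x[2] * d_vo[0], x[1] + x[2] * d_vo[1], x[2]])
    F = np.array([[1, 0, d_vo[0]], [0, 1, d_vo[1]], [0, 0, 1]], dtype=float)
    return x_new, F @ P @ F.T + Q

def update(x, P, z_gps):
    """Correct with a GPS position fix z_gps (2-vector)."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z_gps - H @ x)
    return x, (np.eye(3) - K @ H) @ P

rng = np.random.default_rng(0)
true_scale, pos = 2.0, np.zeros(2)
for k in range(200):
    d = np.array([np.cos(0.05 * k), np.sin(0.05 * k)])   # curving (accelerating) path
    pos += true_scale * d
    x, P = predict(x, P, d)
    x, P = update(x, P, pos + rng.normal(0, 2.0, 2))
print("estimated scale:", round(x[2], 2))                 # should move toward 2.0
```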
Abstract:
We modified a commercial Hartmann-Shack aberrometer and used it to measure ocular aberrations across the central 42° horizontal × 32° vertical visual field of five young emmetropic subjects. Some Zernike aberration coefficients showed field distributions similar to the field dependence predicted by Seidel theory (astigmatism, oblique astigmatism, horizontal coma, vertical coma), but defocus did not demonstrate such similarity.
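As an illustration of how such field dependence can be tested, the sketch below fits a second-order polynomial in field angle to a coefficient map by least squares, since Seidel theory predicts quadratic growth of astigmatism with field angle; the grid and data are synthetic placeholders for real aberrometer measurements.

```python
# Sketch of checking whether a Zernike coefficient's variation across the
# visual field follows a quadratic (Seidel-like) field dependence: fit a
# second-order polynomial in field angle by least squares. Data are synthetic.
import numpy as np

tx, ty = np.meshgrid(np.linspace(-21, 21, 9), np.linspace(-16, 16, 7))  # field angles, deg
tx, ty = tx.ravel(), ty.ravel()
coeff = 0.002 * (tx**2 - ty**2) + np.random.default_rng(0).normal(0, 0.05, tx.size)

# Design matrix for c(tx, ty) = a0 + a1*tx + a2*ty + a3*tx^2 + a4*tx*ty + a5*ty^2
A = np.column_stack([np.ones_like(tx), tx, ty, tx**2, tx * ty, ty**2])
params, residuals, *_ = np.linalg.lstsq(A, coeff, rcond=None)
r2 = 1 - np.sum((coeff - A @ params) ** 2) / np.sum((coeff - coeff.mean()) ** 2)
print("quadratic-fit R^2:", round(r2, 3))   # high R^2 suggests Seidel-like behaviour
```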
Abstract:
To address issues of divisive ideologies in the Mathematics Education community and to subsequently advance educational practice, an alternative theoretical framework and operational model are proposed, representing a consilience of constructivist learning theories whilst acknowledging the objective but improvable nature of domain knowledge. Based upon Popper's three-world model of knowledge, the proposed theory supports the differentiation and explicit modelling of both shared domain knowledge and idiosyncratic personal understanding using a visual nomenclature. The visual nomenclature embodies Piaget's notion of reflective abstraction and so may support an individual's experience-based transformation of personal understanding with regard to shared domain knowledge. Using the operational model and visual nomenclature, seminal literature regarding early-number counting and addition was analysed and described. Exemplars of the resultant visual artefacts demonstrate the proposed theory's viability as a tool with which to characterise the reflective-abstraction-based organisation of a domain's shared knowledge. Utilising such a description of knowledge, future research needs to consider the refinement of the operational model and visual nomenclature to include the analysis, description and scaffolded transformation of personal understanding. A detailed model of knowledge and understanding may then underpin the future development of educational software tools such as computer-mediated teaching and learning environments.
Abstract:
Early-number is a rich fabric of interconnected ideas that is often misunderstood and thus taught in ways that do not lead to rich understanding. In this presentation, a visual language is used to describe the organisation of this domain of knowledge. This visual language is based upon Piaget’s notion of reflective abstraction (Dubinsky, 1991; Piaget, 1977/2001), and thus captures the epistemological associations that link the problems, concepts and representations of the domain. The constructs of this visual language are introduced and then applied to the early-number domain. The introduction to this visual language may prompt reflection upon its suitability and significance to the description of other domains of knowledge. Through such a process of analysis and description, the visual language may serve as a scaffold for enhancing pedagogical content knowledge and thus ultimately improve learning outcomes.
Abstract:
In this chapter, Felicity McArdle provides a framework for planning, implementing and assessing quality experiences in the visual arts. She describes the importance of embedding Indigenous perspectives and knowledge in the curriculum and how to build a repertoire of resources and practical ideas to assist children to become effective communicators through the use of symbol systems for meaning-making.
Abstract:
Objective: The current study evaluated part of the Multifactorial Model of Driving Safety to elucidate the relative importance of cognitive function and a limited range of standard measures of visual function in relation to the Capacity to Drive Safely. The Capacity to Drive Safely was operationalized using three validated screening measures for older drivers: an adaptation of the well-validated Useful Field of View (UFOV) and two newer measures, namely a Hazard Perception Test (HPT) and a Hazard Change Detection Task (HCDT). Method: Community-dwelling drivers (n = 297) aged 65–96 were assessed using a battery of measures of cognitive and visual function. Results: Factor analysis of these predictor variables yielded factors including Executive/Speed, Vision (measured by visual acuity and contrast sensitivity), Spatial, Visual Closure, and Working Memory. Cognitive and Vision factors explained 83–95% of age-related variance in the Capacity to Drive Safely. Spatial and Working Memory were associated with UFOV, HPT and HCDT; Executive/Speed was associated with UFOV and HCDT; and Vision was associated with HPT. Conclusion: The Capacity to Drive Safely declines with chronological age, and this decline is associated with age-related declines in several higher-order cognitive abilities involving manipulation and storage of visuospatial information under speeded conditions. There are also age-independent effects of cognitive function and vision that determine driving safety.
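For readers unfamiliar with the reduction step, the sketch below shows the kind of factor analysis that condenses a battery of cognitive and visual measures into a few factors; the variable names, factor count, and rotation are illustrative assumptions, not the study's exact analysis.

```python
# Sketch of reducing a battery of cognitive and visual test scores to a small
# number of factors with factor analysis. The data, number of factors, and
# rotation are illustrative placeholders for the study's actual battery.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_drivers, n_measures = 297, 12          # e.g. speeded, visual, spatial, memory tests
scores = rng.normal(size=(n_drivers, n_measures))

fa = FactorAnalysis(n_components=5, rotation="varimax", random_state=0)
factors = fa.fit_transform(StandardScaler().fit_transform(scores))

print(factors.shape)           # (297, 5) factor scores per driver
print(fa.components_.round(2)) # loadings: which measures define each factor
```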
Abstract:
In this paper we present a novel algorithm for localization during navigation that performs matching over local image sequences. Instead of calculating the single location most likely to correspond to the current visual scene, the approach finds candidate matching locations within every section (subroute) of all learned routes. Through this approach, we reduce the demands upon the image processing front-end, requiring it only to pick the best matching image from within a short local image sequence, rather than globally. We applied this algorithm to a challenging downhill mountain biking visual dataset in which there was significant perceptual and environmental change between repeated traverses of the environment, and compared performance against the feature-based algorithm FAB-MAP. The results demonstrate the potential for localization using visual sequences, even when there are no visual features that can be reliably detected.
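A minimal sketch of the sequence-based matching idea is shown below: each candidate location is scored by summing image differences along a short aligned segment of the learned route. The sequence length, speed assumption, and difference measure are illustrative choices, not the algorithm's exact formulation.

```python
# Sketch of localisation by matching over local image sequences rather than
# single frames: score each candidate location by summing image differences
# along a short aligned segment of the learned route.
import numpy as np

def sequence_scores(diff_matrix, seq_len=10):
    """diff_matrix[i, j]: difference between current image i and learned image j.

    For each learned-route position j, return the summed difference of
    aligning the last `seq_len` current images against images j-seq_len+1..j,
    assuming roughly matched speed between traverses.
    """
    n_cur, n_learned = diff_matrix.shape
    scores = np.full(n_learned, np.inf)
    for j in range(seq_len - 1, n_learned):
        rows = np.arange(n_cur - seq_len, n_cur)
        cols = np.arange(j - seq_len + 1, j + 1)
        scores[j] = diff_matrix[rows, cols].sum()
    return scores

rng = np.random.default_rng(0)
D = rng.random((30, 200))
D[np.arange(20, 30), np.arange(120, 130)] *= 0.1    # planted matching subroute
best = int(np.argmin(sequence_scores(D)))
print("best matching learned-route index:", best)    # expect ~129
```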
Abstract:
In this paper, we present a new algorithm for boosting visual template recall performance through a process of visual expectation. Visual expectation dynamically modifies the recognition thresholds of learnt visual templates based on recently matched templates, improving the recall of sequences of familiar places while keeping precision high, without any feedback from a mapping back-end. We demonstrate the performance benefits of visual expectation using two 17-kilometre datasets gathered in an outdoor environment at two times separated by three weeks. The visual expectation algorithm provides up to a 100% improvement in recall. We also combine the visual expectation algorithm with the RatSLAM SLAM system and show how the algorithm enables successful mapping.
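The sketch below illustrates the visual expectation mechanism in its simplest form: after a template is matched, the recognition thresholds of the templates expected next along the route are relaxed. The threshold values, neighbourhood size, and reset policy are illustrative, not the paper's tuned parameters.

```python
# Sketch of the visual expectation idea: after template i is matched, relax the
# recognition thresholds of the templates expected next along the route so that
# familiar sequences are recalled more easily. Parameter values are illustrative.
import numpy as np

class VisualExpectation:
    def __init__(self, n_templates, base_threshold=0.4, boost=0.15, window=5):
        self.thresholds = np.full(n_templates, base_threshold)
        self.base, self.boost, self.window = base_threshold, boost, window

    def match(self, similarities):
        """similarities[i]: similarity of the current image to template i."""
        candidates = np.where(similarities > self.thresholds)[0]
        if candidates.size == 0:
            return None                       # novel place: learn a new template
        best = int(candidates[np.argmax(similarities[candidates])])
        # Expectation: make the next few templates along the route easier to match.
        self.thresholds[:] = self.base
        end = min(best + 1 + self.window, len(self.thresholds))
        self.thresholds[best + 1:end] = self.base - self.boost
        return best

ve = VisualExpectation(n_templates=100)
sims = np.random.default_rng(0).random(100) * 0.3
sims[42] = 0.45
print(ve.match(sims))        # matches template 42; templates 43-47 now relaxed
```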
Abstract:
Power relations and small and medium-sized enterprise strategies for capturing value in global production networks: visual effects (VFX) service firms in the Hollywood film industry, Regional Studies. This paper provides insights into the way in which non-lead firms manoeuvre in global value chains in pursuit of a larger share of revenue, and how power relations affect these manoeuvres. It examines the nature of value capture and power relations in the global supply of visual effects (VFX) services and the range of strategies VFX firms adopt to capture higher value in the global value chain. The analysis is based on a total of thirty-six interviews with informants in the industry in Australia, the United Kingdom and Canada, and a database of VFX credits for 3323 visual products from 640 VFX firms.