316 results for visual servoing
Abstract:
Achieving a robust, accurately scaled pose estimate in long-range stereo presents significant challenges. For large scene depths, triangulation from a single stereo pair is inadequate and noisy. Additionally, vibration and flexible rigs in airborne applications mean that accurate calibrations are often compromised. This paper presents a technique for accurately initializing a long-range stereo VO algorithm at large scene depth, with accurate scale, without explicitly computing structure from rigidly fixed camera pairs. A monocular pose estimate is first computed over a window of frames from a single camera; the secondary camera's frames are then added in a modified bundle adjustment to recover an accurate, metrically scaled pose estimate. To achieve this, the scale of the stereo pair is included in the optimization as an additional parameter. Results are presented on both simulated and field-gathered data from a fixed-wing UAV flying at significant altitude, where the epipolar geometry is inaccurate due to structural deformation and triangulation from a single pair is insufficient. Comparisons are made with more conventional VO techniques in which scale is not explicitly optimized, and robustness is demonstrated over repeated trials.
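To make the scale-as-parameter idea concrete, here is a minimal, self-contained sketch of a windowed refinement in which the stereo-baseline scale is one extra unknown in a least-squares problem. It uses SciPy and a deliberately simplified, rotation-free pinhole model; all names and values are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch: treating the stereo-baseline scale as one extra
# unknown in a windowed bundle adjustment (not the authors' code).
import numpy as np
from scipy.optimize import least_squares

def project(points, pose_t, f=500.0):
    """Pinhole projection of Nx3 points after translating by pose_t."""
    p = points + pose_t              # camera assumed rotation-free for brevity
    return f * p[:, :2] / p[:, 2:3]

def residuals(params, points, obs_left, obs_right, baseline_dir):
    # params = [tx, ty, tz, s]: window translation plus stereo scale s.
    t, s = params[:3], params[3]
    r_left = project(points, t) - obs_left
    r_right = project(points, t + s * baseline_dir) - obs_right
    return np.concatenate([r_left.ravel(), r_right.ravel()])

rng = np.random.default_rng(0)
pts = rng.uniform([-5, -5, 40], [5, 5, 60], (50, 3))   # distant structure
baseline = np.array([1.0, 0.0, 0.0])                    # unit baseline direction
true_t, true_s = np.array([0.2, -0.1, 0.5]), 0.30       # metric scale to recover
obs_l = project(pts, true_t)
obs_r = project(pts, true_t + true_s * baseline)

x0 = np.array([0.0, 0.0, 0.0, 1.0])                     # scale initialized at 1
sol = least_squares(residuals, x0, args=(pts, obs_l, obs_r, baseline))
print("estimated translation and scale:", sol.x)
```

Because the second camera's residuals depend on the scale, the optimizer can recover a metric baseline from image observations alone, which is the essence of the abstract's approach, even if the full method involves rotations, many poses, and structure parameters omitted here.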
Abstract:
The article presents a study which investigated the reasons why advice related to the removal of rugs or mats by older people with visual impairments had a low rate of acceptance. The researchers speculated that this may have been due to older people's need to maintain a sense of control and autonomy and to arrange their environments in a way that they decided, or to a belief that the recommended modification would not reduce the risk of falling. A telephone survey of a subsample of the participants in the Visually Impaired Persons (VIP) Trial was conducted. All 30 interviewees had rugs or mats in their homes. Of the 30 participants, 20 had moved the rugs or mats as a result of the recommendations, and 10 had not.
Abstract:
The performance of visual speech recognition (VSR) systems is significantly influenced by the accuracy of the visual front-end. Current state-of-the-art VSR systems use off-the-shelf face detectors such as Viola-Jones (VJ), which have limited reliability under changes in illumination and head pose. For a VSR system to perform well under these conditions, an accurate visual front-end is required. This is an important problem to be solved in many practical implementations of audio-visual speech recognition systems, for example in automotive environments for an efficient human-vehicle computer interface. In this paper, we re-examine the current state of the art in VSR by comparing off-the-shelf face detectors with the recently developed Fourier Lucas-Kanade (FLK) image alignment technique. A variety of image alignment and visual speech recognition experiments are performed on a clean dataset as well as on a challenging automotive audio-visual speech dataset. Our results indicate that the FLK image alignment technique can significantly outperform off-the-shelf face detectors, but requires frequent fine-tuning.
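For readers unfamiliar with template-style alignment of the Lucas-Kanade family, the sketch below aligns an image region to a reference template using OpenCV's ECC maximization. This is a related classical technique, not the paper's FLK method; the function name and termination criteria are invented for illustration.

```python
# Hedged sketch: aligning a face/mouth region to a reference template with
# OpenCV's ECC maximization, a classical Lucas-Kanade-family approach
# (not the paper's Fourier Lucas-Kanade method).
import cv2
import numpy as np

def align_to_template(template, frame):
    """Warp `frame` so it lines up with `template`.
    Both are expected as 8-bit grayscale images of identical size."""
    warp = np.eye(2, 3, dtype=np.float32)            # initial affine = identity
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-6)
    _, warp = cv2.findTransformECC(template, frame, warp,
                                   cv2.MOTION_AFFINE, criteria)
    h, w = template.shape
    return cv2.warpAffine(frame, warp, (w, h),
                          flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
```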
Abstract:
Background: Standard operating procedures state that police officers should not drive while interacting with their mobile data terminal (MDT), which provides in-vehicle information essential to police work. Such interactions do, however, occur in practice and represent a potential source of driver distraction. The MDT comprises visual output with manual input via touch screen and keyboard. This study investigated the potential for alternative input and output methods to mitigate driver distraction, with specific focus on eye movements. Method: Nineteen experienced drivers of police vehicles (one female) from the NSW Police Force completed four simulated urban drives. Three drives included a concurrent secondary task: an imitation licence-plate search using an emulated MDT. Three different interface methods were examined: Visual-Manual, Visual-Voice, and Audio-Voice ("Visual" and "Audio" = output modality; "Manual" and "Voice" = input modality). During each drive, eye movements were recorded using FaceLAB™ (Seeing Machines Ltd, Canberra, ACT). Gaze direction and glances on the MDT were assessed. Results: The Visual-Voice and Visual-Manual interfaces resulted in a significantly greater number of glances towards the MDT, and significantly more glances to the display, than Audio-Voice or Baseline. For longer-duration glances (>2 s and 1–2 s), the Visual-Manual interface resulted in significantly more fixations than Baseline or Audio-Voice. Short-duration glances (<1 s) were significantly more frequent for both Visual-Voice and Visual-Manual compared with Baseline and Audio-Voice. There were no significant differences between Baseline and Audio-Voice. Conclusion: An Audio-Voice interface has the greatest potential to decrease visual distraction to police drivers. However, it is acknowledged that audio output may have limitations for information presentation compared with visual output. The Visual-Voice interface offers an environment where the capacity to present information is sustained, while distraction to the driver is reduced (compared with Visual-Manual) by enabling adaptation of fixation behaviour.
Abstract:
The appropriateness of applying drink-driving legislation to motorcycle riding has been questioned, as there may be fundamental differences in the effects of alcohol on driving and motorcycling. It has been suggested that alcohol may redirect riders' focus from higher-order cognitive skills such as cornering, judgement, and hazard perception to more physical skills such as maintaining balance. To test this hypothesis, the effects of low doses of alcohol on balance ability were investigated in a laboratory setting. The static balance of twenty experienced and twenty novice riders was measured while they performed either no secondary task, a visual (search) task, or a cognitive (arithmetic) task following the administration of alcohol (0%, 0.02%, and 0.05% BAC). Subjective ratings of intoxication and balance impairment increased in a dose-dependent manner in both novice and experienced motorcycle riders, while a BAC of 0.05%, but not 0.02%, was associated with impairments in static balance ability. This balance impairment was exacerbated when riders performed a cognitive, but not a visual, secondary task. Likewise, 0.05% BAC was associated with impairments in novice and experienced riders' performance of a cognitive, but not a visual, secondary task, suggesting that interactive processes underlie balance and cognitive task performance. There were no observed differences between novice and experienced riders on static balance and secondary task performance, either alone or in combination. Implications for road safety and future 'drink riding' policy considerations are discussed.
Abstract:
Maternally inherited diabetes and deafness (MIDD) is a maternally inherited syndrome caused by the mitochondrial DNA (mtDNA) nucleotide mutation A3243G. It affects various organs, including the eye, with external ophthalmoparesis, ptosis, and bilateral macular pattern dystrophy [1, 2]. The prevalence of retinal involvement in MIDD is high, with 50% to 85% of patients exhibiting some macular changes [1]. Those changes, however, can vary dramatically between patients and within families, depending on the percentage of mutated retinal mtDNA, making it difficult to give predictions on an individual's visual prognosis...
Abstract:
Purpose: To design and manufacture lenses to correct peripheral refraction along the horizontal meridian, and to determine whether these resulted in noticeable improvements in visual performance. Method: Subjective refraction of a low myope was determined on the basis of best peripheral detection acuity along the horizontal visual field out to ±30°, for both horizontal and vertical gratings. Subjective refractions were compared with objective refractions obtained using a COAS-HD aberrometer. Special lenses were made to correct peripheral refraction, based on designs optimized with and without smoothing across a 3 mm diameter square aperture. Grating detection was retested with these lenses. Contrast thresholds for 1.25′ spots were determined across the field for the conditions of best correction, on-axis correction, and the special lenses. Results: The participant had high relative peripheral hyperopia, particularly in the temporal visual field (maximum 2.9 D). There were differences > 0.5 D between subjective and objective refractions at a few field angles. On-axis correction reduced peripheral detection acuity and increased peripheral contrast threshold in the peripheral visual field, relative to the best correction, by up to 0.4 and 0.5 log units, respectively. The special lenses restored most of the peripheral vision, although not fully at angles within ±10°, with the lens optimized with aperture-smoothing possibly giving better vision at some angles than the lens optimized without it. Conclusion: It is possible to design and manufacture lenses that give near-optimum peripheral visual performance to at least ±30° along one visual field meridian. The benefit of such lenses is likely to be manifest only if a subject has considerable relative peripheral refraction, for example of the order of 2 D.
Abstract:
In this paper, we present SMART (Sequence Matching Across Route Traversals): a vision-based place recognition system that uses whole-image matching techniques and odometry information to improve the precision-recall performance, latency, and general applicability of the SeqSLAM algorithm. We evaluate the system's performance on challenging day and night journeys over several kilometres at widely varying vehicle velocities from 0 to 60 km/h, compare performance to the current state-of-the-art SeqSLAM algorithm, and provide parameter studies that evaluate the effectiveness of each system component. Using 30-metre sequences, SMART achieves place recognition performance of 81% recall at 100% precision, outperforming SeqSLAM, and is robust to significant degradations in odometry.
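The core of SeqSLAM-style matching, which SMART builds on, can be illustrated in a few lines: given a matrix of image differences between query and reference frames, search straight-line trajectories through the matrix and keep the one with the lowest accumulated difference. The sketch below is a simplified reconstruction under assumed parameters (sequence length, slope set); SMART itself additionally uses odometry to index frames by travelled distance.

```python
# Minimal sketch of SeqSLAM-style sequence matching (illustrative; SMART
# additionally indexes frames by travelled distance using odometry).
import numpy as np

def sequence_match(diff_matrix, seq_len=10, slopes=(0.8, 1.0, 1.25)):
    """diff_matrix[i, j]: image difference between query i and reference j.
    Returns, for each query end-frame, the reference index whose straight
    trajectory through the matrix accumulates the smallest difference."""
    n_query, n_ref = diff_matrix.shape
    best = np.full(n_query, -1)
    for q in range(seq_len, n_query):
        scores = np.full(n_ref, np.inf)
        for r in range(seq_len, n_ref):
            for v in slopes:                     # candidate velocity ratios
                qs = np.arange(q - seq_len, q)
                rs = np.round(r - v * (q - qs)).astype(int)
                if rs.min() < 0:
                    continue
                scores[r] = min(scores[r], diff_matrix[qs, rs].sum())
        best[q] = int(np.argmin(scores))
    return best
```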
Abstract:
The ability to automate forced landings in an emergency, such as engine failure, is essential to improving the safety of Unmanned Aerial Vehicles operating in General Aviation airspace. Using active vision to detect safe landing zones below the aircraft vastly improves the reliability and safety of such systems by gathering up-to-the-minute information about the ground environment. This paper presents the Site Detection System, a methodology utilising a downward-facing camera to analyse the ground environment in both 2D and 3D, detect safe landing sites, and characterise them according to size, shape, slope, and nearby obstacles. The system fuses landing-site detection from 2D imagery with a coarse Digital Elevation Map and dense 3D reconstructions from INS-aided Structure-from-Motion to improve accuracy. Results are presented from an experimental flight, showing the precision/recall of detected landing sites against a hand-classified ground truth, and improved performance with the integration of 3D analysis from visual Structure-from-Motion.
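One of the geometric checks described above, characterising candidate sites by slope, reduces to a simple operation on a Digital Elevation Map. The following sketch flags low-slope cells; the thresholds are illustrative assumptions, and the size/shape/obstacle tests and the 2D/3D fusion are omitted.

```python
# Hedged sketch: flagging candidate landing cells on a digital elevation
# map by local slope, one of the geometric checks the paper describes.
import numpy as np

def low_slope_cells(dem, cell_size_m=1.0, max_slope_deg=5.0):
    """dem: 2D array of terrain heights in metres. Returns a boolean mask
    of cells whose local gradient is below the slope threshold."""
    gy, gx = np.gradient(dem, cell_size_m)
    slope_deg = np.degrees(np.arctan(np.hypot(gx, gy)))
    return slope_deg <= max_slope_deg
```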
Abstract:
This paper presents a long-term experiment in which a mobile robot uses adaptive spherical views to localize itself and navigate inside a non-stationary office environment. The office contains seven members of staff and experiences continuous change in its appearance over time due to their daily activities. The experiment runs as an episodic navigation task in the office over a period of eight weeks. The spherical views are stored in the nodes of a pose graph and are updated in response to changes in the environment. The updating mechanism is inspired by the concepts of long- and short-term memories. The experimental evaluation uses three performance metrics that assess the quality of both the adaptive spherical views and the navigation over time.
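The long-/short-term memory idea can be sketched as follows: a short-term estimate tracks recent observations of a place and is promoted into the stable long-term reference only after repeated confirmation. The spherical views are abstracted here as feature vectors, and all parameters are invented for illustration; this is not the paper's update rule.

```python
# Illustrative long-/short-term memory update for a stored reference view.
import numpy as np

class AdaptiveView:
    def __init__(self, features, promote_after=3, alpha=0.3, change_thresh=1.0):
        self.long_term = features.copy()   # stable appearance of the place
        self.short_term = features.copy()  # tracks recent changes
        self.evidence = 0                  # consecutive confirmations
        self.promote_after = promote_after
        self.alpha = alpha
        self.change_thresh = change_thresh

    def update(self, observed):
        # Short-term memory always follows the latest observation.
        self.short_term = (1 - self.alpha) * self.short_term + self.alpha * observed
        # A change confirmed repeatedly is promoted into long-term memory.
        if np.linalg.norm(self.short_term - self.long_term) > self.change_thresh:
            self.evidence += 1
            if self.evidence >= self.promote_after:
                self.long_term = self.short_term.copy()
                self.evidence = 0
        else:
            self.evidence = 0
```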
Abstract:
Purpose: Changes in pupil size and shape are relevant to peripheral imagery, affecting aberrations and how much light enters and/or exits the eye. The purpose of this study is to model the pattern of pupil shape across the complete horizontal visual field and to show how the pattern is influenced by refractive error. Methods: Right eyes of thirty participants were dilated with 1% cyclopentolate, and images were captured using a modified COAS-HD aberrometer alignment camera along the horizontal visual field to ±90°. A two-lens relay system enabled fixation at targets mounted on a wall 3 m from the eye. Participants placed their heads on a rotatable chin rest, and eye rotations were kept to less than 30°. Best-fit elliptical dimensions of the pupils were determined, and ratios of minimum to maximum axis diameters were plotted against visual field angle θ. Results: Participants' data were well fitted by cosine functions, with maxima at −1° to −9° in the temporal visual field and widths 9% to 15% greater than predicted by the cosine of the field angle. Mean functions were 0.99 cos[(θ + 5.3°)/1.121] (R² = 0.99) for the whole group and 0.99 cos[(θ + 6.2°)/1.126] (R² = 0.99) for the 13 emmetropes. The function peak became less temporal, and the width smaller, with increasing myopia. Conclusion: Off-axis pupil shape changes are well described by a cosine function that is both decentered by a few degrees and flatter by about 12% than the cosine of the viewing angle, with a minor influence of refractive error.
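As a check on the reported model, the decentred, flattened cosine can be fitted to pupil aspect-ratio data in a few lines with SciPy. The data below are synthetic, generated from the abstract's group-mean coefficients (0.99, 5.3°, 1.121) plus noise; the fitting code itself is generic.

```python
# Hedged sketch: fitting the decentred, flattened cosine model to
# pupil aspect-ratio data (synthetic stand-in for measurements).
import numpy as np
from scipy.optimize import curve_fit

def pupil_ratio(theta_deg, amp, decentre_deg, flatten):
    """Ratio of minor to major pupil axis as a function of field angle."""
    return amp * np.cos(np.radians((theta_deg + decentre_deg) / flatten))

theta = np.linspace(-90, 90, 37)
observed = pupil_ratio(theta, 0.99, 5.3, 1.121)      # abstract's group mean
observed += np.random.default_rng(1).normal(0, 0.01, theta.shape)

params, _ = curve_fit(pupil_ratio, theta, observed, p0=[1.0, 0.0, 1.0])
print("amplitude, decentration (deg), flattening:", params)
```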
Abstract:
In this paper we present a novel place recognition algorithm inspired by recent discoveries in human visual neuroscience. The algorithm combines intolerant but fast low-resolution whole-image matching with highly tolerant sub-image patch matching. The approach does not require prior training and works on single images (although we use a cohort normalization score to exploit temporal frame information), alleviating the need for either a velocity signal or an image sequence and differentiating it from current state-of-the-art methods. We demonstrate the algorithm on the challenging Alderley sunny day – rainy night dataset, which has previously been solved only by integrating over 320-frame-long image sequences. The system achieves 21.24% recall at 100% precision, matching drastically different day and night-time images of places while successfully rejecting match hypotheses between highly aliased images of different places. The results provide a new benchmark for single-image, condition-invariant place recognition.
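The two components named above, fast low-resolution whole-image comparison and tolerant patch matching, can be sketched as follows. Block sizes, search ranges, and the final weighting are assumptions for illustration and do not reproduce the authors' pipeline.

```python
# Illustrative two-stage matching: a cheap whole-image distance on
# heavily downsampled frames, refined by shift-tolerant patch matching.
import numpy as np

def whole_image_distance(a, b, grid=8):
    """Mean absolute difference between images reduced to grid x grid."""
    def shrink(img):
        h, w = img.shape
        img = img[:h - h % grid, :w - w % grid].astype(float)
        return img.reshape(grid, img.shape[0] // grid,
                           grid, img.shape[1] // grid).mean(axis=(1, 3))
    return float(np.abs(shrink(a) - shrink(b)).mean())

def patch_distance(a, b, patch=16, search=8, step=4):
    """For each patch of `a`, the best match within a small search window
    in `b`; tolerates modest viewpoint shift between traversals."""
    h, w = a.shape
    dists = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            ref = a[y:y + patch, x:x + patch].astype(float)
            best = np.inf
            for dy in range(-search, search + 1, step):
                for dx in range(-search, search + 1, step):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - patch and 0 <= xx <= w - patch:
                        cand = b[yy:yy + patch, xx:xx + patch].astype(float)
                        best = min(best, float(np.abs(ref - cand).mean()))
            dists.append(best)
    return float(np.mean(dists))

def place_distance(a, b, weight=0.5):
    """Assumed combination rule: equal weighting of the two cues."""
    return weight * whole_image_distance(a, b) + (1 - weight) * patch_distance(a, b)
```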
Abstract:
Ongoing innovation in digital animation and visual effects technologies has provided new opportunities for stories to be visually rendered in ways never before possible. Films featuring animation and visual effects continue to perform well at the box office, proving to be highly profitable projects. The Avengers (Whedon, 2012) holds the current record for opening-weekend sales, accruing $207,438,708 USD on its opening weekend and $623,357,910 USD gross at the time of writing. Life of Pi (Lee, 2012) has grossed $608,791,063 USD at the time of writing (Box Office Mojo, 2013). With so much creative potential and a demonstrable ability to generate large revenues, the animation and visual effects industry – otherwise known as the Post, Digital and Visual Effects (PDV) industry – has become significant to the future growth and stability of the Australian film industry as a whole.
Abstract:
Long-term autonomy in robotics requires perception systems that are resilient to unusual but realistic conditions that will eventually occur during extended missions. For example, unmanned ground vehicles (UGVs) need to be capable of operating safely in adverse and low-visibility conditions, such as at night or in the presence of smoke. The key to a resilient UGV perception system lies in the use of multiple sensor modalities, e.g., operating at different frequencies of the electromagnetic spectrum, to compensate for the limitations of a single sensor type. In this paper, visual and infrared imaging are combined in a Visual-SLAM algorithm to achieve localization. We propose to evaluate the quality of data provided by each sensor modality prior to data combination. This evaluation is used to discard low-quality data, i.e., data most likely to induce large localization errors. In this way, perceptual failures are anticipated and mitigated. An extensive experimental evaluation is conducted on data sets collected with a UGV in a range of environments and adverse conditions, including the presence of smoke (obstructing the visual camera), fire, extreme heat (saturating the infrared camera), low-light conditions (dusk), and at night with sudden variations of artificial light. A total of 240 trajectory estimates are obtained using five different variations of data sources and data combination strategies in the localization method. In particular, the proposed approach for selective data combination is compared to methods using a single sensor type or combining both modalities without preselection. We show that the proposed framework allows for camera-based localization resilient to a large range of low-visibility conditions.
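The pre-combination quality evaluation described above can be illustrated with a simple per-frame score: compute a cheap measure of image informativeness for each modality and localize only with frames that pass a threshold. The gradient-entropy measure and threshold below are stand-ins chosen for illustration, not the paper's metric.

```python
# Hedged sketch of quality-gated modality selection: score each frame and
# keep only modalities whose frames look informative enough to localize on.
import numpy as np

def gradient_entropy(img):
    """Entropy of the gradient-magnitude histogram; low values suggest a
    washed-out frame (e.g., smoke on the visual camera, heat saturation
    on the infrared camera)."""
    gy, gx = np.gradient(img.astype(float))
    hist, _ = np.histogram(np.hypot(gx, gy), bins=64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_frames(visual, infrared, threshold=2.0):
    """Keep whichever modality frames pass the quality check."""
    return [f for f in (visual, infrared) if gradient_entropy(f) > threshold]
```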
Abstract:
This work aims to contribute to the reliability and integrity of the perceptual systems of unmanned ground vehicles (UGVs). A method is proposed to evaluate the quality of sensor data prior to its use in a perception system, by applying a quality metric to heterogeneous sensor data such as visual and infrared camera images. The concept is illustrated specifically with sensor data that is evaluated prior to its use in a standard SIFT feature extraction and matching technique. The method is then evaluated using various experimental data sets collected from a UGV in challenging environmental conditions, represented by the presence of airborne dust and smoke. In the first series of experiments, a motionless vehicle observes a 'reference' scene; the method is then extended to the case of a moving vehicle by compensating for its motion. This paper shows that it is possible to anticipate the degradation of a perception algorithm by evaluating the input data prior to any actual execution of the algorithm.
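A minimal version of this gating idea, under assumed choices (variance of the Laplacian as the quality metric, OpenCV's SIFT, a ratio test), might look as follows; the paper's actual quality metric is not reproduced here.

```python
# Illustrative sketch: evaluate image quality before running SIFT matching,
# so degraded frames (dust, smoke) are rejected up front. The
# Laplacian-variance measure is an assumption, not the paper's metric.
import cv2

def usable_for_sift(gray, min_sharpness=50.0):
    """Cheap pre-check: variance of the Laplacian collapses on low-contrast,
    obscured frames, which would otherwise yield few reliable SIFT matches."""
    return cv2.Laplacian(gray, cv2.CV_64F).var() >= min_sharpness

def match_if_usable(img_a, img_b):
    if not (usable_for_sift(img_a) and usable_for_sift(img_b)):
        return []                     # anticipate failure instead of matching
    sift = cv2.SIFT_create()
    _, da = sift.detectAndCompute(img_a, None)
    _, db = sift.detectAndCompute(img_b, None)
    if da is None or db is None:
        return []
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(da, db, k=2)
    return [pair[0] for pair in matches
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]
```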