371 results for visual words


Relevance: 20.00%

Abstract:

Localisation of an AUV is challenging, and a range of inspection applications require relatively accurate positioning information with respect to submerged structures. We have developed a vision-based localisation method that uses a 3D model of the structure to be inspected. The system comprises a monocular vision system, a spotlight and a low-cost IMU. Previous methods that attempt to solve the problem in a similar way try to factor out the effects of lighting. Effects such as shading on curved surfaces or specular reflections are heavily dependent on the light direction and are difficult to deal with using existing techniques. The novelty of our method is that we explicitly model the light source. Results are shown for an implementation on a small AUV in clear water at night.
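The idea of explicitly modelling the light source can be illustrated with a minimal Lambertian shading sketch. This is an assumption for illustration only; the paper's actual spotlight model is not specified in the abstract:

```python
import numpy as np

def predicted_intensity(normal, light_dir, albedo=1.0):
    # Lambert's law: intensity falls with the angle between the surface
    # normal and the direction towards the light source
    n = np.asarray(normal, dtype=float)
    l = np.asarray(light_dir, dtype=float)
    n /= np.linalg.norm(n)
    l /= np.linalg.norm(l)
    return albedo * max(float(np.dot(n, l)), 0.0)
```

With such a forward model, the predicted shading for a hypothesised pose can be compared against the observed image instead of trying to factor lighting out.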

Relevance: 20.00%

Abstract:

Probabilistic robotics, most often applied to the problem of simultaneous localisation and mapping (SLAM), requires measures of uncertainty to accompany observations of the environment. This paper describes how uncertainty can be characterised for a vision system that locates coloured landmarks in a typical laboratory environment. The paper describes a model of the uncertainty in segmentation, the internal camera model and the mounting of the camera on the robot. It explains the implementation of the system on a laboratory robot, and provides experimental results that show the coherence of the uncertainty model.
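As a rough illustration of how pixel-level uncertainty can be propagated through a camera model into an observation covariance (the paper's own model is richer; the pinhole intrinsics here are placeholders):

```python
import numpy as np

def bearing_and_covariance(u, v, sigma_px, fx, fy, cx, cy):
    # first-order propagation of isotropic pixel noise through a
    # pinhole camera model to normalised image coordinates
    bearing = np.array([(u - cx) / fx, (v - cy) / fy])
    J = np.diag([1.0 / fx, 1.0 / fy])        # Jacobian of the mapping
    pixel_cov = (sigma_px ** 2) * np.eye(2)  # isotropic segmentation noise
    return bearing, J @ pixel_cov @ J.T
```

A SLAM filter can then consume the returned covariance directly as the measurement noise for that landmark observation.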

Relevance: 20.00%

Abstract:

This thesis addresses the problem of detecting and describing the same scene points in different wide-angle images taken by the same camera at different viewpoints. This is a core competency of many vision-based localisation tasks including visual odometry and visual place recognition. Wide-angle cameras have a large field of view that can exceed a full hemisphere, and the images they produce contain severe radial distortion. When compared to traditional narrow field of view perspective cameras, more accurate estimates of camera egomotion can be found using the images obtained with wide-angle cameras. The ability to accurately estimate camera egomotion is a fundamental primitive of visual odometry, and this is one of the reasons for the increased popularity in the use of wide-angle cameras for this task. Their large field of view also enables them to capture images of the same regions in a scene taken at very different viewpoints, and this makes them suited for visual place recognition. However, the ability to estimate the camera egomotion and recognise the same scene in two different images is dependent on the ability to reliably detect and describe the same scene points, or ‘keypoints’, in the images. Most algorithms used for this purpose are designed almost exclusively for perspective images. Applying algorithms designed for perspective images directly to wide-angle images is problematic as no account is made for the image distortion. The primary contribution of this thesis is the development of two novel keypoint detectors, and a method of keypoint description, designed for wide-angle images. Both reformulate the Scale-Invariant Feature Transform (SIFT) as an image processing operation on the sphere. As the image captured by any central projection wide-angle camera can be mapped to the sphere, applying these variants to an image on the sphere enables keypoints to be detected in a manner that is invariant to image distortion.
Each of the variants is required to find the scale-space representation of an image on the sphere, and they differ in the approaches used to do this. Extensive experiments using real and synthetically generated wide-angle images are used to validate the two new keypoint detectors and the method of keypoint description. The better of the two new keypoint detectors is applied to vision-based localisation tasks including visual odometry and visual place recognition using outdoor wide-angle image sequences. As part of this work, the effect of keypoint coordinate selection on the accuracy of egomotion estimates using the Direct Linear Transform (DLT) is investigated, and a simple weighting scheme is proposed which attempts to account for the uncertainty of keypoint positions during detection. A word reliability metric is also developed for use within a visual ‘bag of words’ approach to place recognition.
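The mapping of a central-projection image onto the sphere, the step on which both SIFT variants rest, can be sketched for the simplest (perspective) case; the intrinsic parameters are placeholders, and a real wide-angle camera needs its own distortion-aware back-projection:

```python
import numpy as np

def pixel_to_sphere(u, v, fx, fy, cx, cy):
    # back-project a pixel to a viewing ray, then normalise the ray
    # onto the unit sphere, where scale-space processing takes place
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    return ray / np.linalg.norm(ray)
```

Once every pixel has a point on the sphere, filtering can be defined on the sphere itself, so keypoint detection no longer depends on where in the distorted image the scene point falls.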

Relevance: 20.00%

Abstract:

My research investigates why nouns are learned disproportionately more frequently than other kinds of words during early language acquisition (Gentner, 1982; Gleitman et al., 2004). This question must be considered in the context of cognitive development in general. Infants have two major streams of environmental information to make meaningful: perceptual and linguistic. Perceptual information flows in from the senses and is processed into symbolic representations by the primitive language of thought (Fodor, 1975). These symbolic representations are then linked to linguistic input to enable language comprehension and ultimately production. Yet, how exactly does perceptual information become conceptualized? Although this question is difficult, there has been progress. One way that children might have an easier job is if they have structures that simplify the data. Thus, if particular sorts of perceptual information could be separated from the mass of input, then it would be easier for children to refer to those specific things when learning words (Spelke, 1990; Pylyshyn, 2003). It would be easier still if linguistic input were segmented in predictable ways (Gentner, 1982; Gleitman et al., 2004). Unfortunately, the frequency of patterns in lexical or grammatical input cannot explain the cross-cultural and cross-linguistic tendency to favor nouns over verbs and predicates. There are three examples of this failure: 1) a wide variety of nouns are uttered less frequently than a smaller number of verbs and yet are learnt far more easily (Gentner, 1982); 2) word order and morphological transparency offer no insight when you contrast the sentence structures and word inflections of different languages (Slobin, 1973); and 3) particular language teaching behaviors (e.g. pointing at objects and repeating names for them) have little impact on children's tendency to prefer concrete nouns in their first fifty words (Newport, et al., 1977).
Although the linguistic solution appears problematic, there has been increasing evidence that the early visual system does indeed segment perceptual information in specific ways before the conscious mind begins to intervene (Pylyshyn, 2003). I argue that nouns are easier to learn because their referents directly connect with innate features of the perceptual faculty. This hypothesis stems from work done on visual indexes by Zenon Pylyshyn (2001, 2003). Pylyshyn argues that the early visual system (the architecture of the "vision module") segments perceptual data into pre-conceptual proto-objects called FINSTs. FINSTs typically correspond to physical things such as Spelke objects (Spelke, 1990). Hence, before conceptualization, visual objects are picked out by the perceptual system demonstratively, like a pointing finger indicating ‘this’ or ‘that’. I suggest that this primitive system of demonstration elaborates on Gareth Evans's (1982) theory of nonconceptual content. Nouns are learnt first because their referents attract demonstrative visual indexes. This theory also explains why infants less often name stationary objects such as ‘plate’ or ‘table’, but do name things that attract the focal attention of the early visual system, i.e., small objects that move, such as ‘dog’ or ‘ball’. This view leaves open the question of how blind children learn words for visible objects and why children learn category nouns (e.g. 'dog'), rather than proper nouns (e.g. 'Fido') or higher taxonomic distinctions (e.g. 'animal').

Relevance: 20.00%

Abstract:

The cascading appearance-based (CAB) feature extraction technique has established itself as the state-of-the-art in extracting dynamic visual speech features for speech recognition. In this paper, we will focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade, we will demonstrate that the same steps taken to reduce static speaker and environmental information for the visual speech recognition application also provide similar improvements for visual speaker recognition. A further study is conducted comparing synchronous HMM (SHMM) based fusion of CAB visual features and traditional perceptual linear predictive (PLP) acoustic features to show that the higher complexity inherent in the SHMM approach does not appear to provide any improvement in the final audio-visual speaker verification system over simpler utterance-level score fusion.
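Utterance-level score fusion, the simpler baseline against which the SHMM approach is compared, can be sketched as a weighted sum of the two modality scores. The weight and the assumption of comparably scaled scores are illustrative choices, not parameters from the paper:

```python
def fuse_scores(acoustic_score, visual_score, w_acoustic=0.5):
    # linear (sum-rule) fusion of per-utterance verification scores;
    # scores are assumed to be on comparable scales (e.g. normalised LLRs)
    return w_acoustic * acoustic_score + (1.0 - w_acoustic) * visual_score
```

The fused score is then thresholded to accept or reject the claimed identity, exactly as a single-modality score would be.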

Relevance: 20.00%

Abstract:

Thomas Young (1773-1829) carried out major pioneering work in many different subjects. In 1800 he gave the Bakerian Lecture of the Royal Society on the topic of the “mechanism of the eye”: this was published in the following year (Young, 1801). Young used his own design of optometer to measure refraction and accommodation, and discovered his own astigmatism. He considered the different possible origins of accommodation and confirmed that it was due to change in shape of the lens rather than to change in shape of the cornea or an increase in axial length. However, the paper also dealt with many other aspects of visual and ophthalmic optics, such as biometric parameters, peripheral refraction, longitudinal chromatic aberration, depth-of-focus and instrument myopia. These aspects of the paper have previously received little attention. We now give detailed consideration to these and other less-familiar features of Young’s work and conclude that his studies remain relevant to many of the topics which currently engage visual scientists.

Relevance: 20.00%

Abstract:

Purpose: Flickering stimuli increase the metabolic demand of the retina, making them a sensitive perimetric stimulus for the early onset of retinal disease. We determine whether flickering stimuli are a sensitive indicator of vision deficits resulting from acute, mild systemic hypoxia when compared to standard static perimetry. Methods: Static and flicker visual perimetry were performed in 14 healthy young participants while breathing 12% oxygen (hypoxia) under photopic illumination. The hypoxia visual field data were compared with the field data measured during normoxia. Absolute sensitivities (in dB) were analysed in seven concentric rings at 1°, 3°, 6°, 10°, 15°, 22° and 30° eccentricities, and the mean defect (MD) and pattern defect (PD) were calculated. Preliminary data are reported for mesopic light levels. Results: Under photopic illumination, flicker and static visual field sensitivities at all eccentricities were not significantly different between the hypoxia and normoxia conditions. The mean defect and pattern defect were not significantly different for either test between the two oxygenation conditions. Conclusion: Although flicker stimulation increases cellular metabolism, flicker photopic visual field impairment is not detected during mild hypoxia. These findings contrast with electrophysiological flicker tests in young participants that show impairment at photopic illumination during the same levels of mild hypoxia. Potential mechanisms contributing to the difference between the visual field and electrophysiological flicker tests, including variability in perimetric data, neuronal adaptation and vascular autoregulation, are considered. The data have implications for the use of visual perimetry in the detection of ischaemic/hypoxic retinal disorders under photopic and mesopic light levels.
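The global indices mentioned above can be illustrated with one common pair of definitions. Perimeters differ in the exact formulae (weighting by eccentricity, age correction), so treat these as illustrative assumptions:

```python
import numpy as np

def mean_defect(measured_db, normal_db):
    # MD: mean deviation of measured sensitivities (dB) from normative values;
    # positive values indicate an overall loss of sensitivity
    d = np.asarray(normal_db, float) - np.asarray(measured_db, float)
    return float(d.mean())

def pattern_defect(measured_db, normal_db):
    # PD: non-uniformity of the loss, i.e. the spread of the deviations;
    # a uniform depression of the field leaves PD near zero
    d = np.asarray(normal_db, float) - np.asarray(measured_db, float)
    return float(d.std(ddof=1))
```

Under these definitions, a field that is uniformly 2 dB below normal has MD = 2 but PD = 0, which is why the two indices are reported together.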

Relevance: 20.00%

Abstract:

Of the numerous factors that play a role in fatal pedestrian collisions, the time of day, day of the week, and time of year can be significant determinants. More than 60% of all pedestrian collisions in 2007 occurred at night, despite the presumed decrease in both pedestrian and automobile exposure during the night. Although this trend is partially explained by factors such as fatigue and alcohol consumption, prior analysis of the Fatality Analysis Reporting System database suggests that pedestrian fatalities increase as light decreases after controlling for other factors. This study applies graphical cross-tabulation, a novel visual assessment approach, to explore the relationships among collision variables. The results reveal that twilight and the first hour of darkness typically see the greatest frequency of fatal pedestrian collisions. These hours are not necessarily the most risky on a per-mile-travelled basis, however, because pedestrian volumes are often still high. Additional analysis is needed to quantify the extent to which pedestrian exposure (walking/crossing activity) in these time periods plays a role in pedestrian crash involvement. Weekly patterns of fatal pedestrian collisions vary by time of year due to the seasonal changes in sunset time. In December, collisions are concentrated around twilight and the first hour of darkness throughout the week, while in June, collisions are most heavily concentrated around twilight and the first hours of darkness on Friday and Saturday. Friday and Saturday nights in June may be the most dangerous times for pedestrians. Knowing when pedestrian risk is highest is critically important for formulating effective mitigation strategies and for efficiently investing safety funds. This applied visual approach is a helpful tool for researchers intending to communicate with policy-makers and to identify relationships that can then be tested with more sophisticated statistical tools.
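A graphical cross-tabulation of this kind starts from a plain count over (weekday, hour) cells, which is then rendered as a shaded grid. A minimal sketch with made-up records (the field names and data are illustrative, not from the FARS database):

```python
from collections import Counter

def collision_cross_tab(records):
    # count fatal collisions in each (weekday, hour) cell
    return Counter((day, hour) for day, hour in records)

def peak_cell(tab):
    # the (weekday, hour) cell with the highest collision frequency
    return max(tab, key=tab.get)
```

For example, `collision_cross_tab([("Fri", 18), ("Fri", 18), ("Sat", 19)])` yields a table whose peak cell is Friday at 18:00, the kind of twilight concentration the study reports.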

Relevance: 20.00%

Abstract:

This paper proposes a generic decoupled image-based control scheme for cameras obeying the unified projection model. The scheme is based on the spherical projection model. Invariants to rotational motion are computed from this projection and used to control the translational degrees of freedom. Importantly, we form invariants which decrease the sensitivity of the interaction matrix to object depth variation. Finally, the proposed results are validated with experiments using a classical perspective camera as well as a fisheye camera mounted on a 6-DOF robotic platform.
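The unified projection model on which the scheme relies projects a 3D point through the unit sphere before applying the camera intrinsics; a minimal sketch (the mirror parameter ξ and the intrinsics are placeholders, not values from the paper):

```python
import numpy as np

def unified_project(X, xi, fx, fy, cx, cy):
    # 1) project the 3D point onto the unit sphere
    xs, ys, zs = np.asarray(X, float) / np.linalg.norm(X)
    # 2) reproject from a centre shifted by xi along the optical axis
    m = np.array([xs / (zs + xi), ys / (zs + xi)])
    # 3) apply the pinhole intrinsics
    return np.array([fx * m[0] + cx, fy * m[1] + cy])
```

Setting ξ = 0 recovers the classical perspective camera, which is why one control law formulated on the sphere can serve both the perspective and the fisheye camera in the experiments.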