Modern helmet-mounted night vision devices, such as the Thales TopOwl helmet, project imagery from intensifiers mounted on the side of the helmet onto the helmet faceplate. The increased separation of the cameras induces hyperstereopsis - the exaggeration of the stereoscopic disparities that support the perception of relative depth around the point of fixation. Increased camera separation may also affect absolute depth perception, because it increases the amount of vergence (crossing) of the eyes required for binocular fusion, and because the differential perspective from the viewpoints of the two eyes is increased. The effect of hyperstereopsis on the perception of absolute distance was investigated using a large-scale stereoscopic display system. A fronto-parallel textured surface was projected at a distance of 6 metres. Three stereoscopic viewing conditions were simulated - hyperstereopsis (four times magnification), normal stereopsis, and hypostereopsis (one quarter magnification). The apparent distance of the surface was measured relative to a grid placed in a virtual "leaf room" that provided rich monocular cues to absolute distance, such as texture gradients and linear perspective, as well as veridical stereoscopic disparity cues. The different stereoscopic viewing conditions had no differential effect on the apparent distance of the textured surface at this viewing distance.
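
As a rough illustration of why camera separation changes the vergence demand, the sketch below computes the vergence angle for a point straight ahead at 6 metres under the three viewing conditions. It is not taken from the study: the 65 mm interocular distance is an assumed typical value, and treating the stated magnification as a multiplier on effective eye separation is likewise an assumption. The function and constant names are hypothetical.

```python
import math


def vergence_angle_deg(separation_m, distance_m):
    """Vergence angle (degrees) for two viewpoints fixating a point straight ahead."""
    return math.degrees(2.0 * math.atan(separation_m / (2.0 * distance_m)))


ASSUMED_IPD_M = 0.065  # assumed interocular distance; not reported in the abstract
DISTANCE_M = 6.0       # distance of the textured surface given in the abstract

# Viewing conditions from the abstract, read here as separation multipliers (assumption).
CONDITIONS = {"hypostereopsis": 0.25, "normal": 1.0, "hyperstereopsis": 4.0}

for name, factor in CONDITIONS.items():
    angle = vergence_angle_deg(ASSUMED_IPD_M * factor, DISTANCE_M)
    print(f"{name:>16}: {angle:.3f} deg")
```

At this distance the angles are small, so the vergence demand scales almost linearly with the separation multiplier.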


Stereoscopic depth perception utilizes the disparity cues between the images that fall on the retinae of the two eyes. The purpose of this study was to determine what role aging and optical blur play in stereoscopic disparity sensitivity for real depth stimuli. Forty-six volunteers were tested ranging in age from 15 to 60 years. Crossed and uncrossed disparity thresholds were measured using white light under conditions of best optical correction. The uncrossed disparity thresholds were also measured with optical blur (from +1.0D to +5.0D added to the best correction). Stereothresholds were measured using the Frisby Stereo Test, which utilizes a four-alternative forced-choice staircase procedure. The threshold disparities measured for young adults were frequently lower than 10 arcsec, a value considerably lower than the clinical estimates commonly obtained using Random Dot Stereograms (20 arcsec) or Titmus Fly Test (40 arcsec) tests. Contrary to previous reports, disparity thresholds increased between the ages of 31 and 45 years. This finding should be taken into account in clinical evaluation of visual function of older patients. Optical blur degrades visual acuity and stereoacuity similarly under white-light conditions, indicating that both functions are affected proportionally by optical defocus.
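
To give a sense of scale for thresholds expressed in arcsec, the sketch below converts a disparity threshold into the depth interval it corresponds to, using the standard small-angle approximation (disparity ≈ interocular distance × depth interval / distance²). The 65 mm interocular distance and the 0.4 m viewing distance are assumptions chosen for illustration; neither is stated in the abstract, and the function names are hypothetical.

```python
import math

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi


def disparity_arcsec(depth_interval_m, distance_m, ipd_m=0.065):
    """Small-angle binocular disparity (arcsec) produced by a depth interval."""
    return ipd_m * depth_interval_m / distance_m ** 2 * ARCSEC_PER_RADIAN


def depth_interval_m(threshold_arcsec, distance_m, ipd_m=0.065):
    """Depth interval (metres) that produces the given disparity (inverse of the above)."""
    return threshold_arcsec / ARCSEC_PER_RADIAN * distance_m ** 2 / ipd_m


# Thresholds mentioned in the abstract, at an assumed 0.4 m viewing distance.
for threshold in (10.0, 20.0, 40.0):
    interval_mm = depth_interval_m(threshold, 0.4) * 1000.0
    print(f"{threshold:4.0f} arcsec -> {interval_mm:.3f} mm")
```

Under these assumptions, a 10 arcsec threshold corresponds to a depth interval of roughly a tenth of a millimetre.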


Visual perception relies on a two-dimensional projection of the viewed scene on the retinas of both eyes. Thus, visual depth has to be reconstructed from a number of different cues that are subsequently integrated to obtain robust depth percepts. Existing models of sensory integration are mainly based on the reliabilities of individual cues and disregard potential cue interactions. In the current study, an extended Bayesian model is proposed that takes into account both cue reliability and consistency. Four experiments were carried out to test this model's predictions. Observers had to judge visual displays of hemi-cylinders with an elliptical cross section, which were constructed to allow for an orthogonal variation of several competing depth cues. In Experiments 1 and 2, observers estimated the cylinder's depth as defined by shading, texture, and motion gradients. The degree of consistency among these cues was systematically varied. It turned out that the extended Bayesian model provided a better fit to the empirical data than the traditional model, which disregards covariations among cues. To circumvent the potentially problematic assessment of single-cue reliabilities, Experiment 3 used a multiple-observation task, which allowed for estimating perceptual weights from multiple-cue stimuli. Using the same multiple-observation task, the integration of stereoscopic disparity, shading, and texture gradients was examined in Experiment 4. It turned out that less reliable cues were downweighted in the combined percept. Moreover, a specific influence of cue consistency was revealed. Shading and disparity seemed to be processed interactively while other cue combinations could be well described by additive integration rules. These results suggest that cue combination in visual depth perception is highly flexible and depends on single-cue properties as well as on interrelations among cues.
The extension of the traditional cue combination model is defended in terms of the necessity for robust perception in ecologically valid environments and the current findings are discussed in the light of emerging computational theories and neuroscientific approaches.
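
For orientation, the traditional reliability-based model referred to above is commonly written as an inverse-variance weighted average of the single-cue estimates. The sketch below implements only that standard baseline, assuming independent Gaussian cue noise; it does not implement the extended model proposed in the study, whose form is not given in the abstract. The function name and the example numbers are hypothetical.

```python
def combine_cues(estimates, variances):
    """Reliability-weighted (inverse-variance) combination of independent cue estimates.

    Returns the combined estimate, the per-cue weights, and the combined variance.
    """
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, weights, 1.0 / total


# Two hypothetical depth estimates (arbitrary units); the second cue is noisier.
depth, weights, variance = combine_cues([10.0, 14.0], [1.0, 3.0])
print(depth, weights, variance)
```

The less reliable cue receives the smaller weight, which matches the downweighting pattern reported for Experiment 4; cue interactions of the kind reported for shading and disparity are outside what this additive rule can express.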


Perceptual learning is a training-induced improvement in performance. Mechanisms underlying the perceptual learning of depth discrimination in dynamic random dot stereograms were examined by assessing stereothresholds as a function of decorrelation. The inflection point of the decorrelation function was defined as the level of decorrelation corresponding to 1.4 times the threshold when decorrelation is 0%. In general, stereothresholds increased with increasing decorrelation. Following training, stereothresholds and standard errors of measurement decreased systematically for all tested decorrelation values. Post-training decorrelation functions were reduced by a multiplicative constant (approximately 5), exhibiting changes in stereothresholds without changes in the inflection points. Disparity energy model simulations indicate that a post-training reduction in neuronal noise is sufficient to account for the perceptual learning effects. In two subjects, learning effects were retained over a period of six months, which may have application for training stereo-deficient subjects.


Stereo vision is a method of depth perception, in which depth information is inferred from two (or more) images of a scene, taken from different perspectives. Practical applications for stereo vision include aerial photogrammetry, autonomous vehicle guidance, robotics and industrial automation. The initial motivation behind this work was to produce a stereo vision sensor for mining automation applications. For such applications, the input stereo images would consist of close range scenes of rocks. A fundamental problem faced by matching algorithms is the matching or correspondence problem. This problem involves locating corresponding points or features in two images. For this application, speed, reliability, and the ability to produce a dense depth map are of foremost importance. This work implemented a number of area-based matching algorithms to assess their suitability for this application. Area-based techniques were investigated because of their potential to yield dense depth maps, their amenability to fast hardware implementation, and their suitability to textured scenes such as rocks. In addition, two non-parametric transforms, the rank and census, were also compared. Both the rank and the census transforms were found to result in improved reliability of matching in the presence of radiometric distortion - significant since radiometric distortion is a problem which commonly arises in practice. In addition, they have low computational complexity, making them amenable to fast hardware implementation. Therefore, it was decided that matching algorithms using these transforms would be the subject of the remainder of the thesis. An analytic expression for the process of matching using the rank transform was derived from first principles. This work resulted in a number of important contributions. Firstly, the derivation process resulted in one constraint which must be satisfied for a correct match. This was termed the rank constraint.
The theoretical derivation of this constraint is in contrast to the existing matching constraints which have little theoretical basis. Experimental work with actual and contrived stereo pairs has shown that the new constraint is capable of resolving ambiguous matches, thereby improving match reliability. Secondly, a novel matching algorithm incorporating the rank constraint has been proposed. This algorithm was tested using a number of stereo pairs. In all cases, the modified algorithm resulted in an increased proportion of correct matches. Finally, the rank constraint was used to devise a new method for identifying regions of an image where the rank transform, and hence matching, are more susceptible to noise. The rank constraint was also incorporated into a new hybrid matching algorithm, where it was combined with a number of other ideas. These included the use of an image pyramid for match prediction, and a method of edge localisation to improve match accuracy in the vicinity of edges. Experimental results obtained from the new algorithm showed that the algorithm is able to remove a large proportion of invalid matches, and improve match accuracy.
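
The rank and census transforms mentioned above can be sketched in a few lines. The version below follows the usual textbook definitions (rank: the number of window pixels darker than the centre; census: a bit string of the same comparisons) and is only an illustration, not the thesis implementation; the rank constraint itself is not reproduced because the abstract does not state it. Border pixels are simply left at zero, which is an arbitrary choice made here.

```python
def rank_transform(image, radius=1):
    """Replace each interior pixel by the count of window pixels darker than it."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            centre = image[y][x]
            out[y][x] = sum(
                1
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
                if image[y + dy][x + dx] < centre
            )
    return out


def census_transform(image, radius=1):
    """Encode each interior pixel as a bit string: 1 where a neighbour is darker."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            centre, code = image[y][x], 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # the centre pixel is not compared with itself
                    code = (code << 1) | int(image[y + dy][x + dx] < centre)
            out[y][x] = code
    return out


def hamming_distance(code_a, code_b):
    """Number of differing bits; the usual matching cost for census codes."""
    return bin(code_a ^ code_b).count("1")


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Simulated radiometric distortion: a gain of 2 and an offset of 10.
distorted = [[2 * value + 10 for value in row] for row in image]
print(rank_transform(image) == rank_transform(distorted))
print(census_transform(image) == census_transform(distorted))
```

Because both transforms depend only on the ordering of intensities within the window, any strictly increasing change in brightness leaves them unchanged, which is the robustness to radiometric distortion described in the abstract.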


Stereo vision is a method of depth perception, in which depth information is inferred from two (or more) images of a scene, taken from different perspectives. Applications of stereo vision include aerial photogrammetry, autonomous vehicle guidance, robotics, industrial automation and stereomicroscopy. A key issue in stereo vision is that of image matching, or identifying corresponding points in a stereo pair. The difference in the positions of corresponding points in image coordinates is termed the parallax or disparity. When the orientation of the two cameras is known, corresponding points may be projected back to find the location of the original object point in world coordinates. Matching techniques are typically categorised according to the nature of the matching primitives they use and the matching strategy they employ. This report provides a detailed taxonomy of image matching techniques, including area based, transform based, feature based, phase based, hybrid, relaxation based, dynamic programming and object space methods. A number of area based matching metrics as well as the rank and census transforms were implemented, in order to investigate their suitability for a real-time stereo sensor for mining automation applications. The requirements of this sensor were speed, robustness, and the ability to produce a dense depth map. The Sum of Absolute Differences matching metric was the least computationally expensive; however, this metric was the most sensitive to radiometric distortion. Metrics such as the Zero Mean Sum of Absolute Differences and Normalised Cross Correlation were the most robust to this type of distortion but introduced additional computational complexity. The rank and census transforms were found to be robust to radiometric distortion, in addition to having low computational complexity. They are therefore prime candidates for a matching algorithm for a stereo sensor for real-time mining applications. 
A number of issues came to light during this investigation which may merit further work. These include devising a means to evaluate and compare disparity results of different matching algorithms, and finding a method of assigning a level of confidence to a match. Another issue of interest is the possibility of statistically combining the results of different matching algorithms, in order to improve robustness.
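
The trade-off between the area-based metrics can be seen on a toy example. The sketch below uses the standard definitions of the Sum of Absolute Differences (SAD), Zero Mean SAD (ZSAD) and Normalised Cross Correlation (NCC) on flattened windows; it illustrates those definitions and is not the report's implementation. The window values, gain and offset are hypothetical.

```python
import math


def sad(a, b):
    """Sum of Absolute Differences: cheapest, but sensitive to brightness changes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def zsad(a, b):
    """Zero Mean SAD: subtracting each window's mean cancels a constant offset."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum(abs((x - mean_a) - (y - mean_b)) for x, y in zip(a, b))


def ncc(a, b):
    """Normalised Cross Correlation: normalising by window energy cancels a gain."""
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return numerator / denominator


window = [10, 20, 30, 40]                # hypothetical flattened image window
offset = [value + 5 for value in window]  # same window with a brightness offset
gained = [value * 2 for value in window]  # same window with a gain of 2

print(sad(window, offset), zsad(window, offset))  # SAD is penalised, ZSAD is not
print(ncc(window, gained))                        # NCC is unaffected by the gain
```

The extra mean and energy computations are the additional cost the abstract attributes to the more robust metrics.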


Among the human factors that influence safe driving, visual skills of the driver can be considered fundamental. This study mainly focuses on investigating the effect of visual functions of drivers in India on their road crash involvement. Experiments were conducted to assess vision functions of Indian licensed drivers belonging to various organizations, age groups and driving experience. The test results were further related to the crash involvement histories of drivers through statistical tools. A generalized linear model was developed to ascertain the influence of these traits on propensity of crash involvement. Among the sampled drivers, colour vision, vertical field of vision, depth perception, contrast sensitivity, acuity and phoria were found to influence their crash involvement rates. In India, there are no efficient standards and testing methods to assess the visual capabilities of drivers during their licensing process and this study highlights the need for the same.


The ability to quickly detect and respond to visual stimuli in the environment is critical to many human activities. While such perceptual and visual-motor skills are important in a myriad of contexts, considerable variability exists between individuals in these abilities. To better understand the sources of this variability, we assessed perceptual and visual-motor skills in a large sample of 230 healthy individuals via the Nike SPARQ Sensory Station, and compared variability in their behavioral performance to demographic, state, sleep and consumption characteristics. Dimension reduction and regression analyses indicated three underlying factors: Visual-Motor Control, Visual Sensitivity, and Eye Quickness, which accounted for roughly half of the overall population variance in performance on this battery. Inter-individual variability in Visual-Motor Control was correlated with gender and circadian patterns such that performance on this factor was better for males and for those who had been awake for a longer period of time before assessment. The current findings indicate that abilities involving coordinated hand movements in response to stimuli are subject to greater individual variability, while visual sensitivity and oculomotor control are largely stable across individuals.


The representation of a perceptual scene by a computer is usually limited to numbers representing dimensions and colours. The theory of affordances attempted to provide a new way of representing an environment, with respect to a particular agent. The view was introduced as part of an entire field of psychology labeled as 'ecological,' which has since branched into computer science through the field of robotics, and formal methods. This thesis will describe the concept of affordances, review several existing formalizations, and take a brief look at applications to robotics. The formalizations put forth in the last 20 years have no agreed upon structure, only that both the agent and the environment must be taken in relation to one another. Situation theory has also been evolving since its inception in 1983 by Barwise & Perry. The theory provided a formal way to represent any arbitrary piece of information in terms of relations. This thesis will take a toy version of situation theory published in CSLI lecture notes no. 22, and add to the given ontologies. This thesis extends the given ontologies to include specialized affordance types, and individual object types. This allows for the definition of semantic objects called environments, which support a situation and a set of affordances, and niches which refer to a set of actions for an individual. Finally, a possible way for an environment to change into a new environment is suggested via the activation of an affordance.


This thesis describes the evaluation of a three-dimensional visualization technique developed at the Institut für periphere Mikroelektronik of the Universität Kassel. Three-dimensional display by means of a lenticular lens sheet (Linsenrasterscheibe) opens up a new dimension of interaction with the computer. In contrast to conventional three-dimensional displays, in which a 3D object is rendered on a 2D surface and therefore still cannot leave the screen plane, stereoscopic display allows objects to be visualized in three dimensions: they appear in front of or behind the display plane. Because the lenticular lens sheet has not yet been studied from the standpoint of perceptual psychology, and because few studies with quantitative results are available on the evaluation of 3D systems in general (Vollbracht, 1997), this is a central research interest. To evaluate this 3D system, the theoretical part of the thesis first defines the concept of evaluation. It then discusses the perceptual-psychological foundations of monocular and binocular space perception. Next, techniques for creating depth in images and on screens are explained, and the differences between technically generated and natural depth perception are examined in more detail. After various stereoscopic systems have been introduced, the autostereoscopic lenticular lens sheet is discussed in detail. The theoretical part concludes by presenting the theory behind the well-being questionnaire (Befindlichkeitsfragebogen) that was used. The empirical part of the thesis addresses two central questions. First, it investigates whether the higher information content can positively influence basic perceptual performance in certain areas.
Second, it investigates whether the greater visual naturalness and the novelty of the image presentation also affect the subjective well-being of the participants. These hypotheses are tested empirically in three experiments. The first two experiments focus on basic perceptual performance, while the third measures subjective well-being. Finally, the results of the studies are presented and discussed. In addition, concrete applications for the lenticular lens sheet are identified, and possible follow-up experimental approaches are outlined.


We present a computer vision system that combines omnidirectional vision with structured light in order to obtain depth information over a 360-degree field of view. The approach proposed in this article pairs an omnidirectional camera with a panoramic laser projector. The article shows how the sensor is modelled, and its accuracy is demonstrated by experimental results. The proposed sensor provides useful information for robot navigation, pipe inspection, 3D scene modelling, etc.


Catadioptric sensors combine mirrors and lenses in order to obtain a wide field of view. In this paper we propose a new sensor that has omnidirectional viewing ability and also provides depth information about its nearby surroundings. The sensor is based on a conventional camera coupled with a laser emitter and two hyperbolic mirrors. The mathematical formulation and precise specifications of the intrinsic and extrinsic parameters of the sensor are discussed. Our approach overcomes limitations of existing omnidirectional sensors and eventually leads to reduced production costs.


This paper focuses on the problem of realizing a plane-to-plane virtual link between a camera attached to the end-effector of a robot and a planar object. To make the system independent of the object's surface appearance, a structured light emitter is linked to the camera so that four laser pointers are projected onto the object. In a previous paper we showed that such a system performs well and has desirable characteristics such as partial decoupling near the desired state and robustness against misalignment of the emitter and the camera (J. Pages et al., 2004). However, no analytical results concerning the global asymptotic stability of the system were obtained, owing to the high complexity of the visual features used. In this work we present a better set of visual features, which improves on the properties of the features in (J. Pages et al., 2004) and for which it is possible to prove global asymptotic stability.


In this paper we address the problem of positioning a camera attached to the end-effector of a robotic manipulator so that it becomes parallel to a planar object. This problem has long been studied in visual servoing. Our approach is based on linking several laser pointers to the camera, in a configuration designed to produce a suitable set of visual features. Structured light is used not only to ease the image processing and to allow low-textured objects to be handled, but also to produce a control scheme with desirable properties such as decoupling, stability, good conditioning and a good camera trajectory.


Coded structured light is an optical technique based on active stereovision that obtains the shape of objects. One-shot techniques project a single light pattern with an LCD projector so that, by grabbing one image with a camera, a large number of correspondences can be obtained. A 3D reconstruction of the illuminated object can then be recovered by triangulation. The most widely used strategy for encoding one-shot patterns is based on De Bruijn sequences. In this work, a new way of designing patterns using this type of sequence is presented. The new coding strategy minimises the number of required colours and maximises both the resolution and the accuracy.
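
To show why De Bruijn sequences suit one-shot patterns, the sketch below generates a sequence over k symbols in which every window of n consecutive symbols occurs exactly once (cyclically), using the well-known recursive construction from Lyndon words. It illustrates the general idea only and is not the new coding strategy of the paper, which the abstract does not detail. The mapping of symbols to stripe colours and the helper names are hypothetical.

```python
def de_bruijn(k, n):
    """Return a De Bruijn sequence B(k, n) as a list of symbols 0..k-1."""
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def cyclic_windows(sequence, n):
    """All length-n windows of the sequence, wrapping around at the end."""
    length = len(sequence)
    return [tuple(sequence[(i + j) % length] for j in range(n)) for i in range(length)]


COLOURS = ["red", "green", "blue"]  # hypothetical stripe colours, one per symbol
stripes = de_bruijn(len(COLOURS), 3)
windows = cyclic_windows(stripes, 3)
print(len(stripes), len(set(windows)))
print([COLOURS[s] for s in stripes[:6]])
```

Since each window of three neighbouring stripes is unique, a camera that observes any three adjacent stripes can identify their position in the projected pattern, which is what yields the correspondences needed for triangulation.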