821 resultados para Monocular Vision.
Resumo:
The development and refinement of techniques that make simultaneous localization and mapping (SLAM) for an autonomous mobile robot and the building of local 3-D maps from a sequence of images, is widely studied in scientific circles. This work presents a monocular visual SLAM technique based on extended Kalman filter, which uses features found in a sequence of images using the SURF descriptor (Speeded Up Robust Features) and determines which features can be used as marks by a technique based on delayed initialization from 3-D straight lines. For this, only the coordinates of the features found in the image and the intrinsic and extrinsic camera parameters are avaliable. Its possible to determine the position of the marks only on the availability of information of depth. Tests have shown that during the route, the mobile robot detects the presence of characteristics in the images and through a proposed technique for delayed initialization of marks, adds new marks to the state vector of the extended Kalman filter (EKF), after estimating the depth of features. With the estimated position of the marks, it was possible to estimate the updated position of the robot at each step, obtaining good results that demonstrate the effectiveness of monocular visual SLAM system proposed in this paper
Resumo:
In this paper we present a hybrid method to track human motions in real-time. With simplified marker sets and monocular video input, the strength of both marker-based and marker-free motion capturing are utilized: A cumbersome marker calibration is avoided while the robustness of the marker-free tracking is enhanced by referencing the tracked marker positions. An improved inverse kinematics solver is employed for real-time pose estimation. A computer-visionbased approach is applied to refine the pose estimation and reduce the ambiguity of the inverse kinematics solutions. We use this hybrid method to capture typical table tennis upper body movements in a real-time virtual reality application.
Resumo:
Ocular dominance (OD) plasticity is a robust paradigm for examining the functional consequences of synaptic plasticity. Previous experimental and theoretical results have shown that OD plasticity can be accounted for by known synaptic plasticity mechanisms, using the assumption that deprivation by lid suture eliminates spatial structure in the deprived channel. Here we show that in the mouse, recovery from monocular lid suture can be obtained by subsequent binocular lid suture but not by dark rearing. This poses a significant challenge to previous theoretical results. We therefore performed simulations with a natural input environment appropriate for mouse visual cortex. In contrast to previous work, we assume that lid suture causes degradation but not elimination of spatial structure, whereas dark rearing produces elimination of spatial structure. We present experimental evidence that supports this assumption, measuring responses through sutured lids in the mouse. The change in assumptions about the input environment is sufficient to account for new experimental observations, while still accounting for previous experimental results.
Resumo:
Autonomous aerial refueling is a key enabling technology for both manned and unmanned aircraft where extended flight duration or range are required. The results presented within this paper offer one potential vision-based sensing solution, together with a unique test environment. A hierarchical visual tracking algorithm based on direct methods is proposed and developed for the purposes of tracking a drogue during the capture stage of autonomous aerial refueling, and of estimating its 3D position. Intended to be applied in real time to a video stream from a single monocular camera mounted on the receiver aircraft, the algorithm is shown to be highly robust, and capable of tracking large, rapid drogue motions within the frame of reference. The proposed strategy has been tested using a complex robotic testbed and with actual flight hardware consisting of a full size probe and drogue. Results show that the vision tracking algorithm can detect and track the drogue at real-time frame rates of more than thirty frames per second, obtaining a robust position estimation even with strong motions and multiple occlusions of the drogue.
Resumo:
In this paper we propose an innovative method for the automatic detection and tracking of road traffic signs using an onboard stereo camera. It involves a combination of monocular and stereo analysis strategies to increase the reliability of the detections such that it can boost the performance of any traffic sign recognition scheme. Firstly, an adaptive color and appearance based detection is applied at single camera level to generate a set of traffic sign hypotheses. In turn, stereo information allows for sparse 3D reconstruction of potential traffic signs through a SURF-based matching strategy. Namely, the plane that best fits the cloud of 3D points traced back from feature matches is estimated using a RANSAC based approach to improve robustness to outliers. Temporal consistency of the 3D information is ensured through a Kalman-based tracking stage. This also allows for the generation of a predicted 3D traffic sign model, which is in turn used to enhance the previously mentioned color-based detector through a feedback loop, thus improving detection accuracy. The proposed solution has been tested with real sequences under several illumination conditions and in both urban areas and highways, achieving very high detection rates in challenging environments, including rapid motion and significant perspective distortion
Resumo:
El principal objetivo de este trabajo es proporcionar una solución en tiempo real basada en visión estéreo o monocular precisa y robusta para que un vehículo aéreo no tripulado (UAV) sea autónomo en varios tipos de aplicaciones UAV, especialmente en entornos abarrotados sin señal GPS. Este trabajo principalmente consiste en tres temas de investigación de UAV basados en técnicas de visión por computador: (I) visual tracking, proporciona soluciones efectivas para localizar visualmente objetos de interés estáticos o en movimiento durante el tiempo que dura el vuelo del UAV mediante una aproximación adaptativa online y una estrategia de múltiple resolución, de este modo superamos los problemas generados por las diferentes situaciones desafiantes, tales como cambios significativos de aspecto, iluminación del entorno variante, fondo del tracking embarullado, oclusión parcial o total de objetos, variaciones rápidas de posición y vibraciones mecánicas a bordo. La solución ha sido utilizada en aterrizajes autónomos, inspección de plataformas mar adentro o tracking de aviones en pleno vuelo para su detección y evasión; (II) odometría visual: proporciona una solución eficiente al UAV para estimar la posición con 6 grados de libertad (6D) usando únicamente la entrada de una cámara estéreo a bordo del UAV. Un método Semi-Global Blocking Matching (SGBM) eficiente basado en una estrategia grueso-a-fino ha sido implementada para una rápida y profunda estimación del plano. Además, la solución toma provecho eficazmente de la información 2D y 3D para estimar la posición 6D, resolviendo de esta manera la limitación de un punto de referencia fijo en la cámara estéreo. Una robusta aproximación volumétrica de mapping basada en el framework Octomap ha sido utilizada para reconstruir entornos cerrados y al aire libre bastante abarrotados en 3D con memoria y errores correlacionados espacialmente o temporalmente; (III) visual control, ofrece soluciones de control prácticas para la navegación de un UAV usando Fuzzy Logic Controller (FLC) con la estimación visual. Y el framework de Cross-Entropy Optimization (CEO) ha sido usado para optimizar el factor de escala y la función de pertenencia en FLC. Todas las soluciones basadas en visión en este trabajo han sido probadas en test reales. Y los conjuntos de datos de imágenes reales grabados en estos test o disponibles para la comunidad pública han sido utilizados para evaluar el rendimiento de estas soluciones basadas en visión con ground truth. Además, las soluciones de visión presentadas han sido comparadas con algoritmos de visión del estado del arte. Los test reales y los resultados de evaluación muestran que las soluciones basadas en visión proporcionadas han obtenido rendimientos en tiempo real precisos y robustos, o han alcanzado un mejor rendimiento que aquellos algoritmos del estado del arte. La estimación basada en visión ha ganado un rol muy importante en controlar un UAV típico para alcanzar autonomía en aplicaciones UAV. ABSTRACT The main objective of this dissertation is providing real-time accurate robust monocular or stereo vision-based solution for Unmanned Aerial Vehicle (UAV) to achieve the autonomy in various types of UAV applications, especially in GPS-denied dynamic cluttered environments. This dissertation mainly consists of three UAV research topics based on computer vision technique: (I) visual tracking, it supplys effective solutions to visually locate interesting static or moving object over time during UAV flight with on-line adaptivity approach and multiple-resolution strategy, thereby overcoming the problems generated by the different challenging situations, such as significant appearance change, variant surrounding illumination, cluttered tracking background, partial or full object occlusion, rapid pose variation and onboard mechanical vibration. The solutions have been utilized in autonomous landing, offshore floating platform inspection and midair aircraft tracking for sense-and-avoid; (II) visual odometry: it provides the efficient solution for UAV to estimate the 6 Degree-of-freedom (6D) pose using only the input of stereo camera onboard UAV. An efficient Semi-Global Blocking Matching (SGBM) method based on a coarse-to-fine strategy has been implemented for fast depth map estimation. In addition, the solution effectively takes advantage of both 2D and 3D information to estimate the 6D pose, thereby solving the limitation of a fixed small baseline in the stereo camera. A robust volumetric occupancy mapping approach based on the Octomap framework has been utilized to reconstruct indoor and outdoor large-scale cluttered environments in 3D with less temporally or spatially correlated measurement errors and memory; (III) visual control, it offers practical control solutions to navigate UAV using Fuzzy Logic Controller (FLC) with the visual estimation. And the Cross-Entropy Optimization (CEO) framework has been used to optimize the scaling factor and the membership function in FLC. All the vision-based solutions in this dissertation have been tested in real tests. And the real image datasets recorded from these tests or available from public community have been utilized to evaluate the performance of these vision-based solutions with ground truth. Additionally, the presented vision solutions have compared with the state-of-art visual algorithms. Real tests and evaluation results show that the provided vision-based solutions have obtained real-time accurate robust performances, or gained better performance than those state-of-art visual algorithms. The vision-based estimation has played a critically important role for controlling a typical UAV to achieve autonomy in the UAV application.
Resumo:
The majority of neurons in the primary visual cortex of primates can be activated by stimulation of either eye; moreover, the monocular receptive fields of such neurons are located in about the same region of visual space. These well-known facts imply that binocular convergence in visual cortex can explain our cyclopean view of the world. To test the adequacy of this assumption, we examined how human subjects integrate binocular events in time. Light flashes presented synchronously to both eyes were compared to flashes presented alternately (asynchronously) to one eye and then the other. Subjects perceived very-low-frequency (2 Hz) asynchronous trains as equivalent to synchronous trains flashed at twice the frequency (the prediction based on binocular convergence). However, at higher frequencies of presentation (4-32 Hz), subjects perceived asynchronous and synchronous trains to be increasingly similar. Indeed, at the flicker-fusion frequency (approximately 50 Hz), the apparent difference between the two conditions was only 2%. We suggest that the explanation of these anomalous findings is that we parse visual input into sequential episodes.
Resumo:
Grove, Gillam, and Ono [Grove, P. M., Gillam, B. J., & Ono, H. (2002). Content and context. of monocular regions determine perceived depth in random dot, unpaired background and phantom stereograms. Vision Research, 42, 1859-1870] reported that perceived depth in monocular gap stereograms [Gillam, B. J., Blackburn, S., & Nakayama, K. (1999). Stereopsis based on monocular gaps: Metrical encoding of depth and slant without matching contours. Vision Research, 39, 493-502] was attenuated when the color/texture in the monocular gap did not match the background. It appears that continuation of the gap with the background constitutes an important component of the stimulus conditions that allow a monocular gap in an otherwise binocular surface to be responded to as a depth step. In this report we tested this view using the conventional monocular gap stimulus of two identical grey rectangles separated by a gap in one eye but abutting to form a solid grey rectangle in the other. We compared depth seen at the gap for this stimulus with stimuli that were identical except for two additional small black squares placed at the ends of the gap. If the squares were placed stereoscopically behind the rectangle/gap configuration (appearing on the background) they interfered with the perceived depth at the gap. However when they were placed in front of the configuration this attenuation disappeared. The gap and the background were able under these conditions to complete amodally. (c) 2006 Elsevier Ltd. All rights reserved.
Resumo:
A fundamental problem for any visual system with binocular overlap is the combination of information from the two eyes. Electrophysiology shows that binocular integration of luminance contrast occurs early in visual cortex, but a specific systems architecture has not been established for human vision. Here, we address this by performing binocular summation and monocular, binocular, and dichoptic masking experiments for horizontal 1 cycle per degree test and masking gratings. These data reject three previously published proposals, each of which predict too little binocular summation and insufficient dichoptic facilitation. However, a simple development of one of the rejected models (the twin summation model) and a completely new model (the two-stage model) provide very good fits to the data. Two features common to both models are gently accelerating (almost linear) contrast transduction prior to binocular summation and suppressive ocular interactions that contribute to contrast gain control. With all model parameters fixed, both models correctly predict (1) systematic variation in psychometric slopes, (2) dichoptic contrast matching, and (3) high levels of binocular summation for various levels of binocular pedestal contrast. A review of evidence from elsewhere leads us to favor the two-stage model. © 2006 ARVO.
Resumo:
The human visual system combines contrast information from the two eyes to produce a single cyclopean representation of the external world. This task requires both summation of congruent images and inhibition of incongruent images across the eyes. These processes were explored psychophysically using narrowband sinusoidal grating stimuli. Initial experiments focussed on binocular interactions within a single detecting mechanism, using contrast discrimination and contrast matching tasks. Consistent with previous findings, dichoptic presentation produced greater masking than monocular or binocular presentation. Four computational models were compared, two of which performed well on all data sets. Suppression between mechanisms was then investigated, using orthogonal and oblique stimuli. Two distinct suppressive pathways were identified, corresponding to monocular and dichoptic presentation. Both pathways impact prior to binocular summation of signals, and differ in their strengths, tuning, and response to adaptation, consistent with recent single-cell findings in cat. Strikingly, the magnitude of dichoptic masking was found to be spatiotemporally scale invariant, whereas monocular masking was dependent on stimulus speed. Interocular suppression was further explored using a novel manipulation, whereby stimuli were presented in dichoptic antiphase. Consistent with the predictions of a computational model, this produced weaker masking than in-phase presentation. This allowed the bandwidths of suppression to be measured without the complicating factor of additive combination of mask and test. Finally, contrast vision in strabismic amblyopia was investigated. Although amblyopes are generally believed to have impaired binocular vision, binocular summation was shown to be intact when stimuli were normalized for interocular sensitivity differences. An alternative account of amblyopia was developed, in which signals in the affected eye are subject to attenuation and additive noise prior to binocular combination.
Resumo:
Over the last ten years our understanding of early spatial vision has improved enormously. The long-standing model of probability summation amongst multiple independent mechanisms with static output nonlinearities responsible for masking is obsolete. It has been replaced by a much more complex network of additive, suppressive, and facilitatory interactions and nonlinearities across eyes, area, spatial frequency, and orientation that extend well beyond the classical recep-tive field (CRF). A review of a substantial body of psychophysical work performed by ourselves (20 papers), and others, leads us to the following tentative account of the processing path for signal contrast. The first suppression stage is monocular, isotropic, non-adaptable, accelerates with RMS contrast, most potent for low spatial and high temporal frequencies, and extends slightly beyond the CRF. Second and third stages of suppression are difficult to disentangle but are possibly pre- and post-binocular summation, and involve components that are scale invariant, isotropic, anisotropic, chromatic, achromatic, adaptable, interocular, substantially larger than the CRF, and saturated by contrast. The monocular excitatory pathways begin with half-wave rectification, followed by a preliminary stage of half-binocular summation, a square-law transducer, full binocular summation, pooling over phase, cross-mechanism facilitatory interactions, additive noise, linear summation over area, and a slightly uncertain decision-maker. The purpose of each of these interactions is far from clear, but the system benefits from area and binocular summation of weak contrast signals as well as area and ocularity invariances above threshold (a herd of zebras doesn't change its contrast when it increases in number or when you close one eye). One of many remaining challenges is to determine the stage or stages of spatial tuning in the excitatory pathway.
Resumo:
In experiments reported elsewhere at this conference, we have revealed two striking results concerning binocular interactions in a masking paradigm. First, at low mask contrasts, a dichoptic masking grating produces a small facilitatory effect on the detection of a similar test grating. Second, the psychometric slope for dichoptic masking starts high (Weibull ß~4) at detection threshold, becomes low (ß~1.2) in the facilitatory region, and then unusually steep at high mask contrasts (ß~5.5). Neither of these results is consistent with Legge's (1984 Vision Research 24 385 - 394) model of binocular summation, but they are predicted by a two-stage gain control model in which interocular suppression precedes binocular summation. Here, we pose a further challenge for this model by using a 'twin-mask' paradigm (cf Foley, 1994 Journal of the Optical Society of America A 11 1710 - 1719). In 2AFC experiments, observers detected a patch of grating (1 cycle deg-1, 200 ms) presented to one eye in the presence of a pedestal in the same eye and a spatially identical mask in the other eye. The pedestal and mask contrasts varied independently, producing a two-dimensional masking space in which the orthogonal axes (10X10 contrasts) represent conventional dichoptic and monocular masking. The resulting surface (100 thresholds) confirmed and extended the observations above, and fixed the six parameters in the model, which fitted the data well. With no adjustment of parameters, the model described performance in a further experiment where mask and test were presented to both eyes. Moreover, in both model and data, binocular summation was greater than a factor of v2 at detection threshold. We conclude that this two-stage nonlinear model, with interocular suppression, gives a good account of early binocular processes in the perception of contrast. [Supported by EPSRC Grant Reference: GR/S74515/01]
Resumo:
The classic hypothesis of Livingstone and Hubel (1984, 1987) proposed two types of color pathways in primate visual cortex based on recordings from single cells: a segregated, modularpathway that signals color but provides little information about shape or form and a second pathway that signals color differences and so defines forms without the need to specify their colors. A major problem has been to reconcile this neurophysiological hypothesis with the behavioral data. A wealth of psychophysical studies has demonstrated that color vision has orientation-tuned responses and little impairment on form related tasks, but these have not revealed any direct evidence for nonoriented mechanisms. Here we use a psychophysical method of subthreshold summation across orthogonal orientations for isoluminant red-green gratings in monocular and dichoptic viewing conditions to differentiate between nonoriented and orientation-tuned responses to color contrast. We reveal nonoriented color responses at low spatial frequencies (0.25-0.375 c/deg) under monocular conditions changing to orientation-tuned responses at higher spatial frequencies (1.5 c/deg) and under binocular conditions. We suggest that two distinct pathways coexist in color vision at the behavioral level, revealed at different spatial scales: one is isotropic, monocular, and best equipped for the representation of surface color, and the other is orientation-tuned, binocular, and selective for shape and form. This advances our understanding of the organization of the neural pathways involved in human color vision and provides a strong link between neurophysiological and behavioral data. © 2013 ARVO.
Resumo:
Golfers, coaches and researchers alike, have all keyed in on golf putting as an important aspect of overall golf performance. Of the three principle putting tasks (green reading, alignment and the putting action phase), the putting action phase has attracted the most attention from coaches, players and researchers alike. This phase includes the alignment of the club with the ball, the swing, and ball contact. A significant amount of research in this area has focused on measuring golfer’s vision strategies with eye tracking equipment. Unfortunately this research suffers from a number of shortcomings, which limit its usefulness. The purpose of this thesis was to address some of these shortcomings. The primary objective of this thesis was to re-evaluate golfer’s putting vision strategies using binocular eye tracking equipment and to define a new, optimal putting vision strategy which was associated with both higher skill and success. In order to facilitate this research, bespoke computer software was developed and validated, and new gaze behaviour criteria were defined. Additionally, the effects of training (habitual) and competition conditions on the putting vision strategy were examined, as was the effect of ocular dominance. Finally, methods for improving golfer’s binocular vision strategies are discussed, and a clinical plan for the optometric management of the golfer’s vision is presented. The clinical management plan includes the correction of fundamental aspects of golfers’ vision, including monocular refractive errors and binocular vision defects, as well as enhancement of their putting vision strategy, with the overall aim of improving performance on the golf course. This research has been undertaken in order to gain a better understanding of the human visual system and how it relates to the sport performance of golfers specifically. Ultimately, the analysis techniques and methods developed are applicable to the assessment of visual performance in all sports.