911 resultados para Machine Vision and Image Processing
Resumo:
Projective homography sits at the heart of many problems in image registration. In addition to many methods for estimating the homography parameters (R.I. Hartley and A. Zisserman, 2000), analytical expressions to assess the accuracy of the transformation parameters have been proposed (A. Criminisi et al., 1999). We show that these expressions provide less accurate bounds than those based on the earlier results of Weng et al. (1989). The discrepancy becomes more critical in applications involving the integration of frame-to-frame homographies and their uncertainties, as in the reconstruction of terrain mosaics and the camera trajectory from flyover imagery. We demonstrate these issues through selected examples
Resumo:
Photo-mosaicing techniques have become popular for seafloor mapping in various marine science applications. However, the common methods cannot accurately map regions with high relief and topographical variations. Ortho-mosaicing borrowed from photogrammetry is an alternative technique that enables taking into account the 3-D shape of the terrain. A serious bottleneck is the volume of elevation information that needs to be estimated from the video data, fused, and processed for the generation of a composite ortho-photo that covers a relatively large seafloor area. We present a framework that combines the advantages of dense depth-map and 3-D feature estimation techniques based on visual motion cues. The main goal is to identify and reconstruct certain key terrain feature points that adequately represent the surface with minimal complexity in the form of piecewise planar patches. The proposed implementation utilizes local depth maps for feature selection, while tracking over several views enables 3-D reconstruction by bundle adjustment. Experimental results with synthetic and real data validate the effectiveness of the proposed approach
Resumo:
We present a computer vision system that associates omnidirectional vision with structured light with the aim of obtaining depth information for a 360 degrees field of view. The approach proposed in this article combines an omnidirectional camera with a panoramic laser projector. The article shows how the sensor is modelled and its accuracy is proved by means of experimental results. The proposed sensor provides useful information for robot navigation applications, pipe inspection, 3D scene modelling etc
Resumo:
In a search for new sensor systems and new methods for underwater vehicle positioning based on visual observation, this paper presents a computer vision system based on coded light projection. 3D information is taken from an underwater scene. This information is used to test obstacle avoidance behaviour. In addition, the main ideas for achieving stabilisation of the vehicle in front of an object are presented
Resumo:
The absolute necessity of obtaining 3D information of structured and unknown environments in autonomous navigation reduce considerably the set of sensors that can be used. The necessity to know, at each time, the position of the mobile robot with respect to the scene is indispensable. Furthermore, this information must be obtained in the least computing time. Stereo vision is an attractive and widely used method, but, it is rather limited to make fast 3D surface maps, due to the correspondence problem. The spatial and temporal correspondence among images can be alleviated using a method based on structured light. This relationship can be directly found codifying the projected light; then each imaged region of the projected pattern carries the needed information to solve the correspondence problem. We present the most significant techniques, used in recent years, concerning the coded structured light method
Resumo:
An emerging consensus in cognitive science views the biological brain as a hierarchically-organized predictive processing system. This is a system in which higher-order regions are continuously attempting to predict the activity of lower-order regions at a variety of (increasingly abstract) spatial and temporal scales. The brain is thus revealed as a hierarchical prediction machine that is constantly engaged in the effort to predict the flow of information originating from the sensory surfaces. Such a view seems to afford a great deal of explanatory leverage when it comes to a broad swathe of seemingly disparate psychological phenomena (e.g., learning, memory, perception, action, emotion, planning, reason, imagination, and conscious experience). In the most positive case, the predictive processing story seems to provide our first glimpse at what a unified (computationally-tractable and neurobiological plausible) account of human psychology might look like. This obviously marks out one reason why such models should be the focus of current empirical and theoretical attention. Another reason, however, is rooted in the potential of such models to advance the current state-of-the-art in machine intelligence and machine learning. Interestingly, the vision of the brain as a hierarchical prediction machine is one that establishes contact with work that goes under the heading of 'deep learning'. Deep learning systems thus often attempt to make use of predictive processing schemes and (increasingly abstract) generative models as a means of supporting the analysis of large data sets. But are such computational systems sufficient (by themselves) to provide a route to general human-level analytic capabilities? I will argue that they are not and that closer attention to a broader range of forces and factors (many of which are not confined to the neural realm) may be required to understand what it is that gives human cognition its distinctive (and largely unique) flavour. The vision that emerges is one of 'homomimetic deep learning systems', systems that situate a hierarchically-organized predictive processing core within a larger nexus of developmental, behavioural, symbolic, technological and social influences. Relative to that vision, I suggest that we should see the Web as a form of 'cognitive ecology', one that is as much involved with the transformation of machine intelligence as it is with the progressive reshaping of our own cognitive capabilities.
Resumo:
One of the main challenges for developers of new human-computer interfaces is to provide a more natural way of interacting with computer systems, avoiding excessive use of hand and finger movements. In this way, also a valuable alternative communication pathway is provided to people suffering from motor disabilities. This paper describes the construction of a low cost eye tracker using a fixed head setup. Therefore a webcam, laptop and an infrared lighting source were used together with a simple frame to fix the head of the user. Furthermore, detailed information on the various image processing techniques used for filtering the centre of the pupil and different methods to calculate the point of gaze are discussed. An overall accuracy of 1.5 degrees was obtained while keeping the hardware cost of the device below 100 euros.
Resumo:
The motion of a car is described using a stochastic model in which the driving processes are the steering angle and the tangential acceleration. The model incorporates exactly the kinematic constraint that the wheels do not slip sideways. Two filters based on this model have been implemented, namely the standard EKF, and a new filter (the CUF) in which the expectation and the covariance of the system state are propagated accurately. Experiments show that i) the CUF is better than the EKF at predicting future positions of the car; and ii) the filter outputs can be used to control the measurement process, leading to improved ability to recover from errors in predictive tracking.
Resumo:
Model based vision allows use of prior knowledge of the shape and appearance of specific objects to be used in the interpretation of a visual scene; it provides a powerful and natural way to enforce the view consistency constraint. A model based vision system has been developed within ESPRIT VIEWS: P2152 which is able to classify and track moving objects (cars and other vehicles) in complex, cluttered traffic scenes. The fundamental basis of the method has been previously reported. This paper presents recent developments which have extended the scope of the system to include (i) multiple cameras, (ii) variable camera geometry, and (iii) articulated objects. All three enhancements have easily been accommodated within the original model-based approach
Resumo:
This paper reports the development of a highly parameterised 3-D model able to adopt the shapes of a wide variety of different classes of vehicles (cars, vans, buses, etc), and its subsequent specialisation to a generic car class which accounts for most commonly encountered types of car (includng saloon, hatchback and estate cars). An interactive tool has been developed to obtain sample data for vehicles from video images. A PCA description of the manually sampled data provides a deformable model in which a single instance is described as a 6 parameter vector. Both the pose and the structure of a car can be recovered by fitting the PCA model to an image. The recovered description is sufficiently accurate to discriminate between vehicle sub-classes.
Resumo:
This paper reports the current state of work to simplify our previous model-based methods for visual tracking of vehicles for use in a real-time system intended to provide continuous monitoring and classification of traffic from a fixed camera on a busy multi-lane motorway. The main constraints of the system design were: (i) all low level processing to be carried out by low-cost auxiliary hardware, (ii) all 3-D reasoning to be carried out automatically off-line, at set-up time. The system developed uses three main stages: (i) pose and model hypothesis using 1-D templates, (ii) hypothesis tracking, and (iii) hypothesis verification, using 2-D templates. Stages (i) & (iii) have radically different computing performance and computational costs, and need to be carefully balanced for efficiency. Together, they provide an effective way to locate, track and classify vehicles.
Resumo:
The classical computer vision methods can only weakly emulate some of the multi-level parallelisms in signal processing and information sharing that takes place in different parts of the primates’ visual system thus enabling it to accomplish many diverse functions of visual perception. One of the main functions of the primates’ vision is to detect and recognise objects in natural scenes despite all the linear and non-linear variations of the objects and their environment. The superior performance of the primates’ visual system compared to what machine vision systems have been able to achieve to date, motivates scientists and researchers to further explore this area in pursuit of more efficient vision systems inspired by natural models. In this paper building blocks for a hierarchical efficient object recognition model are proposed. Incorporating the attention-based processing would lead to a system that will process the visual data in a non-linear way focusing only on the regions of interest and hence reducing the time to achieve real-time performance. Further, it is suggested to modify the visual cortex model for recognizing objects by adding non-linearities in the ventral path consistent with earlier discoveries as reported by researchers in the neuro-physiology of vision.
Resumo:
This paper presents a review of the design and development of the Yorick series of active stereo camera platforms and their integration into real-time closed loop active vision systems, whose applications span surveillance, navigation of autonomously guided vehicles (AGVs), and inspection tasks for teleoperation, including immersive visual telepresence. The mechatronic approach adopted for the design of the first system, including head/eye platform, local controller, vision engine, gaze controller and system integration, proved to be very successful. The design team comprised researchers with experience in parallel computing, robot control, mechanical design and machine vision. The success of the project has generated sufficient interest to sanction a number of revisions of the original head design, including the design of a lightweight compact head for use on a robot arm, and the further development of a robot head to look specifically at increasing visual resolution for visual telepresence. The controller and vision processing engines have also been upgraded, to include the control of robot heads on mobile platforms and control of vergence through tracking of an operator's eye movement. This paper details the hardware development of the different active vision/telepresence systems.
Resumo:
The authors demonstrate four real-time reactive responses to movement in everyday scenes using an active head/eye platform. They first describe the design and realization of a high-bandwidth four-degree-of-freedom head/eye platform and visual feedback loop for the exploration of motion processing within active vision. The vision system divides processing into two scales and two broad functions. At a coarse, quasi-peripheral scale, detection and segmentation of new motion occurs across the whole image, and at fine scale, tracking of already detected motion takes place within a foveal region. Several simple coarse scale motion sensors which run concurrently at 25 Hz with latencies around 100 ms are detailed. The use of these sensors are discussed to drive the following real-time responses: (1) head/eye saccades to moving regions of interest; (2) a panic response to looming motion; (3) an opto-kinetic response to continuous motion across the image and (4) smooth pursuit of a moving target using motion alone.
Resumo:
The design of translation invariant and locally defined binary image operators over large windows is made difficult by decreased statistical precision and increased training time. We present a complete framework for the application of stacked design, a recently proposed technique to create two-stage operators that circumvents that difficulty. We propose a novel algorithm, based on Information Theory, to find groups of pixels that should be used together to predict the Output Value. We employ this algorithm to automate the process of creating a set of first-level operators that are later combined in a global operator. We also propose a principled way to guide this combination, by using feature selection and model comparison. Experimental results Show that the proposed framework leads to better results than single stage design. (C) 2009 Elsevier B.V. All rights reserved.