A multi-resolution technique for matching a stereo pair of images based on the translation-invariant discrete multi-wavelet transform is presented. The technique uses the well-known coarse-to-fine strategy: matching points are calculated at the coarsest level and then refined up to the finest level. Vector coefficients of the wavelet transform modulus are used as matching features, where the modulus maxima define the shift-invariant high-level features (multiscale edges) and the phase points along the normal of the feature surface. The technique addresses the estimation of optimal corresponding points and the corresponding 2D disparity maps. Illumination variation that can exist between the perspective views of the same scene is controlled using scale normalization at each decomposition level, dividing the detail-space coefficients by the approximation-space coefficients, and then using normalized correlation. The problem of ambiguity is addressed explicitly, and that of occlusion implicitly, by a geometric topological refinement procedure and symbolic tagging.
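
The illumination handling described above can be illustrated with a minimal sketch. It assumes the detail and approximation coefficients of one decomposition level are already available as NumPy arrays; the function names, the use of the absolute value in the denominator, and the small `eps` guard are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def normalize_details(detail, approx, eps=1e-8):
    """Divide detail coefficients by the approximation coefficients of the
    same level, so that a global gain change between views cancels out.
    The abs() and eps guard are assumptions made for numerical safety."""
    return detail / (np.abs(approx) + eps)

def normalized_correlation(a, b, eps=1e-12):
    """Zero-mean normalized correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + eps
    return float((a * b).sum() / denom)

# Usage: a patch and a brighter, higher-contrast copy still correlate strongly.
rng = np.random.default_rng(0)
patch = rng.random((7, 7))
score = normalized_correlation(patch, 3.0 * patch + 5.0)
```

The correlation score is insensitive to a gain and offset applied to one patch, which is why it is a common choice when the two views differ in brightness.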


The problem of visual simultaneous localization and mapping (SLAM) is examined in this paper using ideas and algorithms from robust control and estimation theory. Using a stereo-vision based sensor, a nonlinear measurement model is derived which leads to nonlinear measurements of the landmark coordinates along with optical flow based measurements of the relative robot-landmark velocity. Using a novel analytical measurement transformation, the nonlinear SLAM problem is converted into the linear domain and solved using a robust linear filter. The linear filter is guaranteed stable and the SLAM state estimation error is bounded within an ellipsoidal set. No similar results are available for the commonly employed extended Kalman filter, which is known to exhibit divergence and inconsistency in practice.


Vision-based tracking of an object using perspective projection inherently results in non-linear measurement equations in the Cartesian coordinates. The underlying object kinematics can be modelled by a linear system. In this paper we introduce a measurement conversion technique that analytically transforms the non-linear measurement equations obtained from a stereo-vision system into a system of linear measurement equations. We then design a robust linear filter around the converted measurement system. The state estimation error of the proposed filter is bounded and we provide a rigorous theoretical analysis of this result. The performance of the robust filter developed in this paper is demonstrated via computer simulation and via practical experimentation using a robotic manipulator as a target. The proposed filter is shown to outperform the extended Kalman filter (EKF).
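
To make the idea of a measurement conversion concrete, the sketch below uses standard triangulation for a rectified stereo pair. This is only one familiar way to obtain a measurement that is linear in the Cartesian position; the abstract does not spell out the paper's transformation, which may differ. The focal length `f`, baseline `b`, and both function names are assumptions made for illustration.

```python
import numpy as np

def project_stereo(p, f, b):
    """Nonlinear pinhole projection of p = (X, Y, Z) into a rectified
    stereo pair with focal length f and baseline b (assumed model)."""
    X, Y, Z = p
    # Returns (u_left, u_right, v); each term is nonlinear in (X, Y, Z).
    return np.array([f * X / Z, f * (X - b) / Z, f * Y / Z])

def convert_measurement(m, f, b):
    """Turn pixel measurements into a Cartesian pseudo-measurement z.
    In the noise-free case z = H x with H the identity on position."""
    u_l, u_r, v = m
    d = u_l - u_r  # disparity, equal to f * b / Z
    if d <= 0:
        raise ValueError("non-positive disparity: point is not in front of the rig")
    return np.array([b * u_l / d, b * v / d, f * b / d])

# Usage: the converted measurement recovers the 3-D point exactly without noise.
p = np.array([0.3, -0.2, 2.0])
z = convert_measurement(project_stereo(p, f=500.0, b=0.1), f=500.0, b=0.1)
```

Once the measurement is expressed this way, a linear filter can be applied to a linear kinematic model. The price is that the converted noise is no longer simple additive pixel noise, which is presumably one reason a robust filter is used.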


The problem of visual simultaneous localization and mapping (SLAM) is examined in this paper using recently developed ideas and algorithms from modern robust control and estimation theory. A nonlinear model for a stereo-vision-based sensor is derived that leads to nonlinear measurements of the landmark coordinates along with optical flow-based measurements of the relative robot-landmark velocity. Using a novel analytical measurement transformation, the nonlinear SLAM problem is converted into the linear domain and solved using a robust linear filter. The linear filter is guaranteed stable and the SLAM state estimation error is bounded within an ellipsoidal set. A mathematically rigorous stability proof is given that holds true even when the landmarks move in accordance with an unknown control input. No similar results are available for the commonly employed extended Kalman filter, which is known to exhibit divergence and inconsistency in practice. A number of illustrative examples are given using both simulated and real vision data that further validate the proposed method.


Vision-based tracking of an object using perspective projection inherently involves nonlinear measurement models, although the underlying dynamic system that encompasses the object and the vision sensors can be linear. Based on a stereo vision setting, which is necessary for the approach, we introduce an appropriate measurement conversion technique that subsequently facilitates the use of a linear filter. The linear filter, together with the aforementioned measurement conversion approach, forms a robust linear filter based on set-valued state estimation, a particularly rich area in the robust control literature. We provide a rigorous theoretical analysis to ensure bounded state estimation errors, formulated in terms of an ellipsoidal set in which the actual state is guaranteed to be included with arbitrarily high probability. Using computer simulations as well as a practical implementation consisting of a robotic manipulator, we demonstrate that our linear robust filter significantly outperforms the traditionally used extended Kalman filter under this stereo vision scenario. © 2008 IEEE.


In this paper, the concept of the Matching Parallelepiped (MP) is presented. It is shown that the volume of the MP can be used as an additional measure of `distance' between a pair of candidate points in a matching algorithm based on Relaxation Labeling (RL). The volume of the MP is related to the epipolar geometry, and using this measure works as an epipolar constraint in an RL process. This reduces the effort in the matching algorithm, since it is not necessary to explicitly determine the equations of the epipolar lines or to compute the distance of a candidate point to each epipolar line. Because the Relative Orientation (RO) parameters are unknown at the beginning of the process, an initial matching based on gradient, intensities and correlation is obtained. Based on this set of labeled points, the RO is determined and the epipolar constraint is included in the algorithm. The results show that the proposed approach is suitable for determining feature-point matches with simultaneous estimation of the camera orientation parameters, even in cases where the optical axes are not parallel.


[EN] In this work, we present a new model for dense disparity estimation and 3-D geometry reconstruction using a color image stereo pair. First, we present a brief introduction to the 3-D geometry of a camera system. Next, we propose a new model for disparity estimation based on an energy functional. We look for the local minima of the energy using the associated Euler-Lagrange partial differential equations. This model is a generalization to color images of the model developed in, with some changes in the strategy to avoid irrelevant local minima. We present some numerical experiments of 3-D reconstruction, using this method on some real stereo pairs.


[EN] We present an energy-based approach to estimate a dense disparity map from a set of two weakly calibrated stereoscopic images while preserving its discontinuities resulting from image boundaries. We first derive a simplified expression for the disparity that allows us to estimate it from a stereo pair of images using an energy minimization approach. We assume that the epipolar geometry is known, and we include this information in the energy model. Discontinuities are preserved by means of a regularization term based on the Nagel-Enkelmann operator. We investigate the associated Euler-Lagrange equation of the energy functional, and we approach the solution of the underlying partial differential equation (PDE) using a gradient descent method. The resulting parabolic problem has a unique solution. In order to reduce the risk of being trapped in irrelevant local minima during the iterations, we use a focusing strategy based on a linear scale-space. Experimental results on both synthetic and real images are presented to illustrate the capabilities of this PDE and scale-space based method.


When stereo images are captured under less than ideal conditions, there may be inconsistencies between the two images in brightness, contrast, blurring, etc. When stereo matching is performed between the images, these variations can greatly reduce the quality of the resulting depth map. In this paper we propose a method for correcting sharpness variations in stereo image pairs which is performed as a pre-processing step to stereo matching. Our method is based on scaling the 2D discrete cosine transform (DCT) coefficients of both images so that the two images have the same amount of energy in each of a set of frequency bands. Experiments show that applying the proposed correction method can greatly improve the disparity map quality when one image in a stereo pair is more blurred than the other.
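
The band-wise energy matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the radial band layout, the choice of the geometric mean as the common target energy, leaving the DC coefficient untouched, and all function names are assumptions introduced here. The orthonormal DCT-II is built directly with NumPy to keep the example self-contained.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ C.T is the identity."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def dct2(img):
    h, w = img.shape
    return dct_matrix(h) @ img @ dct_matrix(w).T

def idct2(coef):
    h, w = coef.shape
    return dct_matrix(h).T @ coef @ dct_matrix(w)

def band_index(shape, n_bands):
    """Assign each coefficient to a radial frequency band (assumed layout)."""
    h, w = shape
    ky = np.arange(h)[:, None] / h
    kx = np.arange(w)[None, :] / w
    r = np.sqrt(ky ** 2 + kx ** 2) / np.sqrt(2.0)
    bands = np.minimum((r * n_bands).astype(int), n_bands - 1)
    bands[0, 0] = -1  # assumption: keep the DC term (mean brightness) as is
    return bands

def band_energies(coef, bands, n_bands):
    return np.array([np.sum(coef[bands == b] ** 2) for b in range(n_bands)])

def match_band_energy(img1, img2, n_bands=8):
    """Scale both images' DCT coefficients so each band has equal energy."""
    c1, c2 = dct2(img1.astype(float)), dct2(img2.astype(float))
    bands = band_index(c1.shape, n_bands)
    e1 = band_energies(c1, bands, n_bands)
    e2 = band_energies(c2, bands, n_bands)
    for b in range(n_bands):
        if e1[b] > 0 and e2[b] > 0:
            target = np.sqrt(e1[b] * e2[b])  # assumed common target energy
            mask = bands == b
            c1[mask] *= np.sqrt(target / e1[b])
            c2[mask] *= np.sqrt(target / e2[b])
    return idct2(c1), idct2(c2)
```

A typical use would be to call `match_band_energy(left, right)` on two grayscale images of identical shape before running a stereo matcher. Other target choices, such as matching both images to the sharper one, fit the same structure.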


This thesis covers a broad part of the field of computational photography, including video stabilization and image warping techniques, introductions to light field photography and the conversion of monocular images and videos into stereoscopic 3D content. We present a user assisted technique for stereoscopic 3D conversion from 2D images. Our approach exploits the geometric structure of perspective images including vanishing points. We allow a user to indicate lines, planes, and vanishing points in the input image, and directly employ these as guides of an image warp that produces a stereo image pair. Our method is most suitable for scenes with large scale structures such as buildings and is able to skip the step of constructing a depth map. Further, we propose a method to acquire 3D light fields using a hand-held camera, and describe several computational photography applications facilitated by our approach. As the input we take an image sequence from a camera translating along an approximately linear path with limited camera rotations. Users can acquire such data easily in a few seconds by moving a hand-held camera. We convert the input into a regularly sampled 3D light field by resampling and aligning them in the spatio-temporal domain. We also present a novel technique for high-quality disparity estimation from light fields. Finally, we show applications including digital refocusing and synthetic aperture blur, foreground removal, selective colorization, and others.


An image processing observational technique for the stereoscopic reconstruction of the wave form of oceanic sea states is developed. The technique incorporates the enforcement of any given statistical wave law modeling the quasi Gaussianity of oceanic waves observed in nature. The problem is posed in a variational optimization framework, where the desired wave form is obtained as the minimizer of a cost functional that combines image observations, smoothness priors and a weak statistical constraint. The minimizer is obtained combining gradient descent and multigrid methods on the necessary optimality equations of the cost functional. Robust photometric error criteria and a spatial intensity compensation model are also developed to improve the performance of the presented image matching strategy. The weak statistical constraint is thoroughly evaluated in combination with other elements presented to reconstruct and enforce constraints on experimental stereo data, demonstrating the improvement in the estimation of the observed ocean surface.


We present a remote sensing observational method for the measurement of the spatio-temporal dynamics of ocean waves. Variational techniques are used to recover a coherent space-time reconstruction of oceanic sea states given stereo video imagery. The stereoscopic reconstruction problem is expressed in a variational optimization framework, in which we design an energy functional whose minimizer is the desired temporal sequence of wave heights. The functional combines photometric observations as well as spatial and temporal regularizers. A nested iterative scheme is devised to numerically solve, via 3-D multigrid methods, the system of partial differential equations resulting from the optimality condition of the energy functional. The output of our method is the coherent, simultaneous estimation of the wave surface height and radiance at multiple snapshots. We demonstrate our algorithm on real data collected off-shore. Statistical and spectral analyses are performed, and the results are compared with those of an existing sequential method.


The evolution of the television market is led by 3DTV technology, and this tendency can accelerate during the next years according to expert forecasts. However, 3DTV delivery by broadcast networks is not currently developed enough, and acts as a bottleneck for the complete deployment of the technology. Thus, increasing interest is dedicated to stereo 3DTV formats compatible with current HDTV video equipment and infrastructure, as they may greatly encourage 3D acceptance. In this paper, different subsampling schemes for HDTV compatible transmission of both progressive and interlaced stereo 3DTV are studied and compared. The frequency characteristics and preserved frequency content of each scheme are analyzed, and a simple interpolation filter is specially designed. Finally, the advantages and disadvantages of the different schemes and filters are evaluated through quality testing on several progressive and interlaced video sequences.


A novel algorithm for performing registration of dynamic contrast-enhanced (DCE) MRI data of the breast is presented. It is based on an algorithm known as iterated dynamic programming originally devised to solve the stereo matching problem. Using artificially distorted DCE-MRI breast images it is shown that the proposed algorithm is able to correct for movement and distortions over a larger range than is likely to occur during routine clinical examination. In addition, using a clinical DCE-MRI data set with an expertly labeled suspicious region, it is shown that the proposed algorithm significantly reduces the variability of the enhancement curves at the pixel level yielding more pronounced uptake and washout phases.


This paper addresses the problem of obtaining complete, detailed reconstructions of textureless shiny objects. We present an algorithm which uses silhouettes of the object, as well as images obtained under changing illumination conditions. In contrast with previous photometric stereo techniques, ours is not limited to a single viewpoint but produces accurate reconstructions in full 3D. A number of images of the object are obtained from multiple viewpoints, under varying lighting conditions. Starting from the silhouettes, the algorithm recovers camera motion and constructs the object's visual hull. This is then used to recover the illumination and initialize a multiview photometric stereo scheme to obtain a closed surface reconstruction. There are two main contributions in this paper: First, we describe a robust technique to estimate light directions and intensities and, second, we introduce a novel formulation of photometric stereo which combines multiple viewpoints and, hence, allows closed surface reconstructions. The algorithm has been implemented as a practical model acquisition system. Here, a quantitative evaluation of the algorithm on synthetic data is presented together with complete reconstructions of challenging real objects. Finally, we show experimentally how, even in the case of highly textured objects, this technique can greatly improve on correspondence-based multiview stereo results.
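
For readers unfamiliar with photometric stereo, the sketch below shows the classical single-viewpoint Lambertian formulation that multiview schemes build on. It is a textbook baseline and not the algorithm of this paper: it assumes known light directions, no shadows or specular highlights, and a fixed camera. The function name and array layout are illustrative assumptions.

```python
import numpy as np

def photometric_stereo(intensities, lights):
    """Recover per-pixel normals and albedo under a Lambertian model.

    intensities: (k, p) array, k images of p pixels each.
    lights:      (k, 3) array of light direction vectors (scaled by intensity).
    Model: I = lights @ (albedo * normal), solved by least squares.
    """
    g, *_ = np.linalg.lstsq(lights, intensities, rcond=None)  # shape (3, p)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-12)  # unit normals, one per column
    return normals, albedo

# Usage: synthesize two pixels under four lights and recover them.
lights = np.array([[0.0, 0.0, 1.0],
                   [0.5, 0.0, 1.0],
                   [0.0, 0.5, 1.0],
                   [-0.5, -0.5, 1.0]])
true_n = np.array([[0.0, 0.6], [0.0, 0.0], [1.0, 0.8]])  # columns are normals
true_albedo = np.array([0.7, 0.4])
images = lights @ (true_n * true_albedo)
normals, albedo = photometric_stereo(images, lights)
```

At least three non-coplanar light directions are needed for the least-squares system to be well posed. The abstract's contribution lies in estimating the lights themselves and in combining multiple viewpoints, neither of which this baseline attempts.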