978 results for IMAGE SEQUENCES
Abstract:
This paper presents a new technique that automatically distills a concise representation from arbitrary image sequences for efficient visual communication. We view the visual communication process as two stages: the transmission of the video data and the human eye's comprehension of the visual signal. Guided by psychological theories of how people perceive images, this work therefore focuses on simultaneously improving the compressibility and the comprehensibility of images. An edge-extraction algorithm preserves the object boundaries to which the human visual system is most sensitive, and a nonlinear diffusion algorithm then attenuates insignificant detail. To keep the resulting animation temporally coherent, the technique is formulated over the entire spatio-temporal domain, yet real-time processing speed is maintained because the method parallelizes readily on the GPU. To demonstrate the practicality of the new technique, we also built a complete visual communication system with the proposed algorithm as its processing core and ran all experiments on that system. Statistics show that the method not only significantly reduces transmission bandwidth but also improves the comprehensibility of the image sequences.
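The abstract above names two processing stages: edge extraction to preserve perceptually salient boundaries, and nonlinear diffusion to attenuate insignificant detail. The paper's exact diffusion scheme is not given, so the following is only a minimal sketch of one standard choice, Perona-Malik-style anisotropic diffusion, whose conductance term halts smoothing across strong gradients; all parameter values are illustrative assumptions.

```python
import numpy as np

def perona_malik(img, n_iter=20, kappa=0.1, dt=0.2):
    """Edge-preserving nonlinear diffusion (Perona-Malik style):
    smooths low-contrast detail while halting diffusion across
    strong gradients, i.e. object boundaries. Expects img in [0, 1]."""
    u = img.astype(np.float64)
    for _ in range(n_iter):
        # forward differences to the four neighbours
        # (np.roll wraps at the borders; fine for a sketch)
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u,  1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u,  1, axis=1) - u
        # conductance g(|grad|) -> 0 at edges, -> 1 in flat regions
        g = lambda d: np.exp(-(d / kappa) ** 2)
        u += dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u
```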
Abstract:
A new ocean wave and sea surface current monitoring system with horizontally- (HH) and vertically- (VV) polarized X-band radar was developed. Two experiments with the radar system were carried out at two sites: one for calibration at Zhangzi Island in the Yellow Sea, and one for validation in the Yellow Sea and the South China Sea. Ocean wave parameters and sea surface current velocities were retrieved from the dual-polarized radar image sequences using an inverse method. The results obtained from the dual-polarized radar data sets acquired at Zhangzi Island are compared with those from an ocean directional buoy. The results show that ocean wave parameters and sea surface current velocities retrieved from the radar image sets are in good agreement with those observed by the buoy. In particular, the vertically-polarized radar proved better than the horizontally-polarized radar at retrieving ocean wave parameters, especially at detecting significant wave heights below 1.0 m.
Abstract:
Addressing image restoration and 3D reconstruction in microscopic vision, this paper starts from the defocus mechanism of microscopic optical imaging and a point-spread-function-based description of image blurring, uses a focus-measure operator to analyze the defocus distribution across a sequence of microscopic images, and proposes a method for constructing a more accurate defocus model. The model adopts a mixed-parameter polynomial structure and, compared with the traditional Gaussian model, follows the real defocus process more closely, offering a new technical route to more accurate optical microscopic image restoration and 3D reconstruction.
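The abstract contrasts the traditional Gaussian defocus model with the proposed mixed-parameter polynomial model. As a reference point, here is a minimal sketch of the Gaussian baseline plus a common focus-measure operator (variance of the Laplacian); the linear sigma-depth relation and all parameters are assumptions, not the paper's model.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def defocus_gaussian(img, depth_offset, k=1.5):
    """Classical Gaussian defocus model: blur radius grows with
    distance from the focal plane (here sigma = k * |dz|, a common
    linear assumption). The paper replaces this with a
    mixed-parameter polynomial model; this is only the baseline."""
    sigma = k * abs(depth_offset)
    return gaussian_filter(img.astype(np.float64), sigma)

def focus_measure(img):
    """Variance-of-Laplacian focus measure: higher values indicate
    a sharper (better focused) slice of the image sequence."""
    return laplace(img.astype(np.float64)).var()
```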
Abstract:
Traditional fire detection generally relies on smoke, heat, and photoelectric detectors. This paper proposes an embedded fire detection method based on visual image features: with TI's TMS320DM642 digital multimedia processor as its core, an intelligent front-end fire detection and automatic alarm system is designed and implemented. The DM642 captures video images and, combined with corresponding intelligent image processing and pattern recognition algorithms, monitors forest fire risk in real time. Experimental results show that the system reduces the false alarm rate further than traditional systems and offers fast response and wide monitoring coverage.
Abstract:
In the first part of this paper we show that a new technique exploiting 1D correlation of 2D or even 1D patches between successive frames may be sufficient to compute a satisfactory estimate of the optical flow field. The algorithm is well-suited to VLSI implementations. The sparse measurements provided by the technique can be used to compute qualitative properties of the flow for a number of different visual tasks. In particular, the second part of the paper shows how to combine our 1D correlation technique with a scheme for detecting expansion or rotation ([5]) in a simple algorithm which also suggests interesting biological implications. The algorithm provides a rough estimate of time-to-crash. It was tested on real image sequences. We show its performance and compare the results to previous approaches.
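As a rough illustration of the 1D correlation idea described above, the sketch below estimates the horizontal displacement of a small 1D patch between two successive frames by exhaustive correlation over candidate shifts. Patch and search-range sizes are assumptions, and the published algorithm's details may differ.

```python
import numpy as np

def flow_1d(prev_row, next_row, x, half=4, max_shift=6):
    """Estimate horizontal displacement of a 1D patch between two
    successive frames by exhaustive 1D correlation. Assumes x is at
    least (half + max_shift) pixels away from the row borders."""
    patch = prev_row[x - half : x + half + 1].astype(np.float64)
    best, best_score = 0, -np.inf
    for d in range(-max_shift, max_shift + 1):
        cand = next_row[x + d - half : x + d + half + 1].astype(np.float64)
        if cand.shape != patch.shape:
            continue
        # zero-mean correlation score between patch and candidate
        score = np.dot(patch - patch.mean(), cand - cand.mean())
        if score > best_score:
            best, best_score = d, score
    return best  # displacement in pixels/frame along the scanline
```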
Abstract:
We provide a theory of the three-dimensional interpretation of a class of line-drawings called p-images, which are interpreted by the human vision system as parallelepipeds ("boxes"). Despite their simplicity, p-images raise a number of interesting vision questions:
* Why are p-images seen as three-dimensional objects? Why not just as flat images?
* What are the dimensions and pose of the perceived objects?
* Why are some p-images interpreted as rectangular boxes, while others are seen as skewed, even though there is no obvious distinction between the images?
* When p-images are rotated in three dimensions, why are the image-sequences perceived as distorting objects, even though structure-from-motion would predict that rigid objects would be seen?
* Why are some three-dimensional parallelepipeds seen as radically different when viewed from different viewpoints?
We show that these and related questions can be answered with the help of a single mathematical result and an associated perceptual principle. An interesting special case arises when there are right angles in the p-image. This case represents a singularity in the equations and is mystifying from the vision point of view. It would seem that (at least in this case) the vision system does not follow the ordinary rules of geometry but operates in accordance with other (and as yet unknown) principles.
Abstract:
We describe a new method for motion estimation and 3D reconstruction from stereo image sequences obtained by a stereo rig moving through a rigid world. We show that, given two stereo pairs, one can compute the motion of the stereo rig directly from the image derivatives (spatial and temporal); correspondences are not required. One can then use the images from both pairs combined to compute a dense depth map. The motion estimates between stereo pairs enable us to combine depth maps from all the pairs in the sequence into an extended scene reconstruction, and we show results from a real image sequence. The motion computation is a linear least-squares computation using all the pixels in the image. Areas with little or no contrast are implicitly weighted less, so one does not have to apply an explicit confidence measure.
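To make the "linear least squares using all the pixels" structure concrete, here is a minimal sketch for the simplest possible instance: a global 2D image translation estimated directly from spatial and temporal derivatives under brightness constancy. The paper estimates full stereo-rig motion, so this is an illustrative reduction, not the authors' method; note how low-gradient (low-contrast) pixels contribute little to the normal equations, matching the implicit weighting described above.

```python
import numpy as np

def global_translation(I0, I1):
    """Direct least-squares motion from image derivatives for a
    global 2D translation. Brightness constancy gives, per pixel,
    Ix*u + Iy*v + It ~= 0; stacking all pixels yields an
    overdetermined linear system solved in one shot."""
    I0 = I0.astype(np.float64)
    I1 = I1.astype(np.float64)
    Iy, Ix = np.gradient(I0)          # spatial derivatives (rows, cols)
    It = I1 - I0                      # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v                       # pixels/frame
```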
Abstract:
Object detection and recognition are important problems in computer vision. The challenges of these problems come from the presence of noise, background clutter, large within-class variations of the object class, and limited training data. The computational complexity of the recognition process is also a concern in practice. In this thesis, we propose one approach to handle the problem of detecting an object class that exhibits large within-class variations, and a second approach to speed up the classification process. In the first approach, we show that foreground-background classification (detection) and within-class classification of the foreground class (pose estimation) can be jointly solved using a multiplicative form of two kernel functions. One kernel measures similarity for foreground-background classification. The other kernel accounts for latent factors that control within-class variation and implicitly enables feature sharing among foreground training samples. For applications where explicit parameterization of the within-class states is unavailable, a nonparametric formulation of the kernel can be constructed with a proper foreground distance/similarity measure. Detector training is accomplished via standard Support Vector Machine learning. The resulting detectors are tuned to specific variations in the foreground class. They also serve to evaluate hypotheses of the foreground state. When image masks for foreground objects are provided in training, the detectors can also produce object segmentation. Methods for generating a representative sample set of detectors are proposed that can enable efficient detection and tracking. In addition, because individual detectors verify hypotheses of foreground state, they can also be incorporated in a tracking-by-detection framework to recover foreground state in image sequences. To run the detectors efficiently at the online stage, an input-sensitive speedup strategy is proposed to select the most relevant detectors quickly. The proposed approach is tested on data sets of human hands, vehicles, and human faces. On all data sets, the proposed approach achieves improved detection accuracy over the best competing approaches.
In the second part of the thesis, we formulate a filter-and-refine scheme to speed up recognition. The binary outputs of the weak classifiers in a boosted detector are used to identify a small number of candidate foreground-state hypotheses quickly via Hamming distance or weighted Hamming distance. The approach is evaluated in three applications: face recognition on the Face Recognition Grand Challenge version 2 data set, hand shape detection and parameter estimation on a hand data set, and vehicle detection and view-angle estimation on a multi-pose vehicle data set. On all data sets, our approach is at least five times faster than simply evaluating all foreground-state hypotheses, with virtually no loss in classification accuracy.
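The filter step of the filter-and-refine scheme described above can be sketched as follows: the binary outputs of the boosted detector's weak classifiers form a bit signature, and candidate foreground-state hypotheses are ranked by Hamming distance to stored signatures, so only a few survive to the expensive refine step. Array shapes and names are assumptions for illustration.

```python
import numpy as np

def filter_candidates(query_bits, hypothesis_bits, k=10):
    """Filter step of a filter-and-refine scheme.
    query_bits:      (n_bits,) array in {0, 1}, weak-classifier
                     outputs on the query window.
    hypothesis_bits: (n_hyp, n_bits) array, stored signatures of
                     the foreground-state hypotheses.
    Returns the indices of the k nearest hypotheses in Hamming
    distance; only these are passed to the refine step."""
    dists = np.count_nonzero(hypothesis_bits != query_bits, axis=1)
    return np.argsort(dists)[:k]
```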
Abstract:
Log-polar image architectures, motivated by the structure of the human visual field, have long been investigated in computer vision for use in estimating motion parameters from an optical flow vector field. Practical problems with this approach have been: (i) dependence on an assumed alignment of the visual and motion axes; (ii) sensitivity to occlusion from moving and stationary objects in the central visual field, where much of the numerical sensitivity is concentrated; and (iii) inaccuracy of the log-polar architecture (which is an approximation to the central 20°) for wide-field biological vision. In the present paper, we show that an algorithm based on a generalization of the log-polar architecture, termed the log-dipolar sensor, provides a large improvement in performance relative to the usual log-polar sampling. Specifically, our algorithm: (i) is tolerant of large misalignment of the optical and motion axes; (ii) is insensitive to significant occlusion by objects of unknown motion; and (iii) represents a more correct analogy to the wide-field structure of human vision. Using the Helmholtz-Hodge decomposition to estimate the optical flow vector field on a log-dipolar sensor, we demonstrate these advantages using synthetic optical flow maps as well as natural image sequences.
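For readers unfamiliar with log-polar sampling, the sketch below resamples an image onto a log-polar grid (ring radii growing exponentially from the centre), which is the architecture the log-dipolar sensor generalizes. Grid sizes and the nearest-neighbour lookup are illustrative assumptions.

```python
import numpy as np

def log_polar_sample(img, n_rings=64, n_wedges=128, r_min=2.0):
    """Sample a grayscale image on a log-polar grid centred on the
    image centre: ring radii grow exponentially from r_min to the
    largest inscribed radius, mimicking foveal magnification."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    # exponentially spaced radii and uniformly spaced angles
    rho = r_min * (r_max / r_min) ** (np.arange(n_rings) / (n_rings - 1))
    theta = 2 * np.pi * np.arange(n_wedges) / n_wedges
    R, T = np.meshgrid(rho, theta, indexing="ij")
    ys = np.clip(np.round(cy + R * np.sin(T)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + R * np.cos(T)).astype(int), 0, w - 1)
    return img[ys, xs]               # (n_rings, n_wedges) samples
```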
Abstract:
In this paper we present a new method for simultaneously determining the three-dimensional (3-D) shape and motion of a non-rigid object from uncalibrated two-dimensional (2-D) images without assuming the distribution characteristics. A non-rigid motion can be treated as a combination of a rigid rotation and a non-rigid deformation. To seek accurate recovery of deformable structures, we estimate the probability distribution function of the corresponding features through random sampling, incorporating an established probabilistic model. The fit between the observation and the projection of the estimated 3-D structure is evaluated using a Markov chain Monte Carlo based expectation-maximisation algorithm. Applications of the proposed method to both synthetic and real image sequences are demonstrated with promising results.
Abstract:
Aims. We use high spatial and temporal resolution observations from the Swedish Solar Telescope to study the chromospheric velocities of a C-class flare originating from active region NOAA 10969.
Methods. A time-distance analysis is employed to estimate directional velocity components in Hα and Ca ii K image sequences. Also, imaging spectroscopy has allowed us to determine flare-induced line-of-sight velocities. A wavelet analysis is used to analyse the periodic nature of associated flare bursts.
Results. Time-distance analysis reveals velocities as high as 64 km s⁻¹ along the flare ribbon and 15 km s⁻¹ perpendicular to it. The velocities are very similar in both the Hα and Ca ii K time series. Line-of-sight Hα velocities are red-shifted, with values up to 17 km s⁻¹. The high spatial and temporal resolution of the observations has allowed us to detect velocities significantly higher than those found in earlier studies. Flare bursts with a periodicity of ≈60 s are also detected. These bursts are similar to the quasi-periodic oscillations observed in hard X-ray and radio data.
Conclusions. Some of the highest velocities detected in the solar atmosphere are presented. Line-of-sight velocity maps show considerable mixing of both the magnitude and direction of velocities along the flare path. A change in the direction of the velocities at the flare kernel has also been detected, which may be a signature of chromospheric evaporation.
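A minimal sketch of the time-distance analysis used above: intensities along a fixed slit are stacked over the image sequence, and the slope of a feature's track in the resulting time-distance map gives its apparent velocity. The slit definition, the feature localisation by per-frame maximum, and the unit conversion are all assumptions for illustration.

```python
import numpy as np

def time_distance_velocity(frames, slit_rows, slit_cols, dt, dx):
    """Estimate an apparent velocity from a time-distance map.
    frames:    sequence of 2D intensity images.
    slit_rows/slit_cols: index arrays defining the slit pixels.
    dt: seconds per frame; dx: physical length per slit pixel."""
    # stack one slit sample per frame -> (n_frames, n_slit_pixels)
    td = np.array([f[slit_rows, slit_cols] for f in frames])
    # track the brightest feature along the slit at each time step
    pos = td.argmax(axis=1).astype(np.float64)
    t = np.arange(len(frames), dtype=np.float64)
    slope = np.polyfit(t, pos, 1)[0]   # pixels per frame
    return slope * dx / dt             # physical velocity
```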
Abstract:
With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system, called Mary 101. Two types of experiments were performed: a) distinguishing visually between real and synthetic image-sequences of the same utterances ("Turing tests"), and b) gauging visual speech recognition by comparing lip-reading performance on the real and synthetic image-sequences of the same utterances ("Intelligibility tests"). Subjects who were presented randomly with either real or synthetic image-sequences could not tell the synthetic from the real sequences above chance level. The same subjects, when asked to lip-read the utterances from the same image-sequences, recognized speech from real image-sequences significantly better than from synthetic ones. However, performance for both real and synthetic sequences was at the levels suggested in the literature on lip-reading. We conclude from the two experiments that the animation of Mary 101 is adequate for providing the percept of a talking head. However, additional effort is required to improve the animation for lip-reading purposes such as rehabilitation and language learning. In addition, these two tasks can be considered explicit and implicit perceptual discrimination tasks. In the explicit task (a), each stimulus is classified directly as a synthetic or real image-sequence by detecting a possible difference between the synthetic and the real image-sequences. The implicit perceptual discrimination task (b) consists of a comparison between visual recognition of speech from real and synthetic image-sequences. Our results suggest that implicit perceptual discrimination is a more sensitive method for discriminating between synthetic and real image-sequences than explicit perceptual discrimination.
Abstract:
A major obstacle to processing images of the ocean floor comes from the absorption and scattering effects of light in the aquatic environment. Due to the absorption of natural light, underwater vehicles often require artificial light sources attached to them to provide adequate illumination. Unfortunately, these flashlights tend to illuminate the scene in a nonuniform fashion and, as the vehicle moves, induce shadows in the scene. For this reason, the first step towards applying standard computer vision techniques to underwater imaging is dealing with these lighting problems. This paper analyses and compares existing methodologies for dealing with low-contrast, nonuniform illumination in underwater image sequences. The reviewed techniques include: (i) study of the illumination-reflectance model, (ii) local histogram equalization, (iii) homomorphic filtering, and (iv) subtraction of the illumination field. Several experiments on real data have been conducted to compare the different approaches.
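Of the four reviewed techniques, homomorphic filtering lends itself to a compact sketch: taking the logarithm separates the (low-frequency) illumination field from the (high-frequency) reflectance, and a high-frequency-emphasis filter in the Fourier domain then suppresses the nonuniform lighting. The filter shape and parameter values below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def homomorphic_filter(img, cutoff=0.1, gain_hi=1.5, gain_lo=0.5):
    """Homomorphic filtering for nonuniform illumination.
    log(image) makes illumination additive; a Gaussian
    high-frequency-emphasis filter then attenuates the
    low-frequency lighting field and boosts reflectance detail."""
    log_img = np.log1p(img.astype(np.float64))
    F = np.fft.fftshift(np.fft.fft2(log_img))
    h, w = img.shape
    y, x = np.ogrid[:h, :w]
    # normalized squared distance from the spectrum centre
    d2 = ((y - h / 2) ** 2 + (x - w / 2) ** 2) / (h * w)
    # gain_lo at DC (illumination), rising to gain_hi at high freq
    H = (gain_hi - gain_lo) * (1 - np.exp(-d2 / cutoff**2)) + gain_lo
    out = np.fft.ifft2(np.fft.ifftshift(H * F)).real
    return np.expm1(out)
```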
Abstract:
In this paper we report the degree of reliability of image sequences taken by off-the-shelf TV cameras for modeling camera rotation and reconstructing 3D structure using computer vision techniques. This is done in spite of the fact that computer vision systems usually use imaging devices that are specifically designed for human vision. Our scenario consists of a static scene and a mobile camera moving through the scene. The scene is any long axial building dominated by features along the three principal orientations and with at least one wall containing prominent repetitive planar features such as doors, windows, bricks, etc. The camera is an ordinary commercial camcorder moving along the axial axis of the scene and is allowed to rotate freely within a range of +/- 10 degrees in all directions. This makes it possible for the camera to be held by a walking, unprofessional cameraman with a normal gait, or to be mounted on a mobile robot. The system has been tested successfully on sequences of images of a variety of structured, but fairly cluttered, scenes taken by different walking cameramen. The potential application areas of the system include medicine, robotics, and photogrammetry.
Abstract:
This paper presents an enhanced hypothesis verification strategy for 3D object recognition. A new learning methodology is presented which integrates the traditionally dichotomous object-centred and appearance-based representations in computer vision, giving improved hypothesis verification under iconic matching. The "appearance" of a 3D object is learnt using an eigenspace representation obtained as it is tracked through a scene. The feature representation implicitly models both the background and the objects observed, enabling the segmentation of the objects from the background. The method is shown to enhance model-based tracking, particularly in the presence of clutter and occlusion, and to provide a basis for identification. The unified approach is discussed in the context of the traffic surveillance domain. The approach is demonstrated on real-world image sequences and compared to previous (edge-based) iconic evaluation techniques.
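As a sketch of the eigenspace appearance representation mentioned above, the code below builds a PCA basis from vectorised object patches gathered while tracking, and scores a candidate region by its reconstruction error against that basis. The number of components and the scoring rule are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def build_eigenspace(patches, n_components=8):
    """Learn an eigenspace appearance model: PCA over vectorised
    object patches (all the same shape) collected while tracking."""
    X = np.stack([p.ravel().astype(np.float64) for p in patches])
    mean = X.mean(axis=0)
    # principal appearance modes via SVD of the centred data
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def reconstruction_error(patch, mean, basis):
    """Distance from a candidate patch to the learnt eigenspace;
    low error means the patch matches the learnt appearance."""
    v = patch.ravel().astype(np.float64) - mean
    return np.linalg.norm(v - basis.T @ (basis @ v))
```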