841 resultados para visual object detection
Resumo:
The brain extracts useful features from a maelstrom of sensory information, and a fundamental goal of theoretical neuroscience is to work out how it does so. One proposed feature extraction strategy is motivated by the observation that the meaning of sensory data, such as the identity of a moving visual object, is often more persistent than the activation of any single sensory receptor. This notion is embodied in the slow feature analysis (SFA) algorithm, which uses “slowness” as an heuristic by which to extract semantic information from multi-dimensional time-series. Here, we develop a probabilistic interpretation of this algorithm showing that inference and learning in the limiting case of a suitable probabilistic model yield exactly the results of SFA. Similar equivalences have proved useful in interpreting and extending comparable algorithms such as independent component analysis. For SFA, we use the equivalent probabilistic model as a conceptual spring-board, with which to motivate several novel extensions to the algorithm.
Resumo:
Several studies have shown that sensory contextual cues can reduce the interference observed during learning of opposing force fields. However, because each study examined a small set of cues, often in a unique paradigm, the relative efficacy of different sensory contextual cues is unclear. In the present study we quantify how seven contextual cues, some investigated previously and some novel, affect the formation and recall of motor memories. Subjects made movements in a velocity-dependent curl field, with direction varying randomly from trial to trial but always associated with a unique contextual cue. Linking field direction to the cursor or background color, or to peripheral visual motion cues, did not reduce interference. In contrast, the orientation of a visual object attached to the hand cursor significantly reduced interference, albeit by a small amount. When the fields were associated with movement in different locations in the workspace, a substantial reduction in interference was observed. We tested whether this reduction in interference was due to the different locations of the visual feedback (targets and cursor) or the movements (proprioceptive). When the fields were associated only with changes in visual display location (movements always made centrally) or only with changes in the movement location (visual feedback always displayed centrally), a substantial reduction in interference was observed. These results show that although some visual cues can lead to the formation and recall of distinct representations in motor memory, changes in spatial visual and proprioceptive states of the movement are far more effective than changes in simple visual contextual cues.
Resumo:
This paper presents the first performance evaluation of interest points on scalar volumetric data. Such data encodes 3D shape, a fundamental property of objects. The use of another such property, texture (i.e. 2D surface colouration), or appearance, for object detection, recognition and registration has been well studied; 3D shape less so. However, the increasing prevalence of 3D shape acquisition techniques and the diminishing returns to be had from appearance alone have seen a surge in 3D shape-based methods. In this work, we investigate the performance of several state of the art interest points detectors in volumetric data, in terms of repeatability, number and nature of interest points. Such methods form the first step in many shape-based applications. Our detailed comparison, with both quantitative and qualitative measures on synthetic and real 3D data, both point-based and volumetric, aids readers in selecting a method suitable for their application. © 2012 Springer Science+Business Media, LLC.
Resumo:
针对目前常见的星形模型等易受部件缺失影响的问题,提出一种空间位置约束模型.该模型以目标中心作为参考点,衡量所有局部特征之间的位置关系,避免了特殊结点缺失带来的不利影响.基于模板中目标的表面特征和形状因素构造空间位置约束模型,利用待检测目标的表面特征信息形成相关假设,通过假设检验定量地衡量目标出现的位置及可能性,并提出了一种搜索目标中心位置的加速算法.实验结果表明,该空间位置约束模型不受部件缺失的影响,性能稳定.
Resumo:
We propose the exploding-reflector method to simulate a monostatic survey with a single simulation. The exploding reflector, used in seismic modeling, is adapted for ground-penetrating radar (GPR) modeling by using the analogy between acoustic and electromagnetic waves. The method can be used with ray tracing to obtain the location of the interfaces and estimate the properties of the medium on the basis of the traveltimes and reflection amplitudes. In particular, these can provide a better estimation of the conductivity and geometrical details. The modeling methodology is complemented with the use of the plane-wave method. The technique is illustrated with GPR data from an excavated tomb of the nineteenth century.
Resumo:
文章讲述了交通监控系统中应用视频图像流来跟踪运动目标并对目标进行分类的具体过程和原则.基于目标检测提出了双差分的目标检测算法,目标分类应用到了连续时间限制和最大可能性估计的原则,目标跟踪则结合检测到的运动目标图像和当前模板进行相关匹配.实验结果表明,该过程能够很好地探测和分类目标,去除背景信息的干扰,并能够在运动目标部分被遮挡、外观改变和运动停止等情况下连续地跟踪目标.
Resumo:
Visual object recognition requires the matching of an image with a set of models stored in memory. In this paper we propose an approach to recognition in which a 3-D object is represented by the linear combination of 2-D images of the object. If M = {M1,...Mk} is the set of pictures representing a given object, and P is the 2-D image of an object to be recognized, then P is considered an instance of M if P = Eki=aiMi for some constants ai. We show that this approach handles correctly rigid 3-D transformations of objects with sharp as well as smooth boundaries, and can also handle non-rigid transformations. The paper is divided into two parts. In the first part we show that the variety of views depicting the same object under different transformations can often be expressed as the linear combinations of a small number of views. In the second part we suggest how this linear combinatino property may be used in the recognition process.
Resumo:
Different approaches to visual object recognition can be divided into two general classes: model-based vs. non model-based schemes. In this paper we establish some limitation on the class of non model-based recognition schemes. We show that every function that is invariant to viewing position of all objects is the trivial (constant) function. It follows that every consistent recognition scheme for recognizing all 3-D objects must in general be model based. The result is extended to recognition schemes that are imperfect (allowed to make mistakes) or restricted to certain classes of objects.
Resumo:
Based on our previous work in deformable shape model-based object detection, a new method is proposed that uses index trees for organizing shape features to support content-based retrieval applications. In the proposed strategy, different shape feature sets can be used in index trees constructed for object detection and shape similarity comparison respectively. There is a direct correspondence between the two shape feature sets. As a result, application-specific features can be obtained efficiently for shape-based retrieval after object detection. A novel approach is proposed that allows retrieval of images based on the population distribution of deformed shapes in each image. Experiments testing these new approaches have been conducted using an image database that contains blood cell micrographs. The precision vs. recall performance measure shows that our method is superior to previous methods.
Resumo:
Object detection can be challenging when the object class exhibits large variations. One commonly-used strategy is to first partition the space of possible object variations and then train separate classifiers for each portion. However, with continuous spaces the partitions tend to be arbitrary since there are no natural boundaries (for example, consider the continuous range of human body poses). In this paper, a new formulation is proposed, where the detectors themselves are associated with continuous parameters, and reside in a parameterized function space. There are two advantages of this strategy. First, a-priori partitioning of the parameter space is not needed; the detectors themselves are in a parameterized space. Second, the underlying parameters for object variations can be learned from training data in an unsupervised manner. In profile face detection experiments, at a fixed false alarm number of 90, our method attains a detection rate of 75% vs. 70% for the method of Viola-Jones. In hand shape detection, at a false positive rate of 0.1%, our method achieves a detection rate of 99.5% vs. 98% for partition based methods. In pedestrian detection, our method reduces the miss detection rate by a factor of three at a false positive rate of 1%, compared with the method of Dalal-Triggs.
Resumo:
Object detection and recognition are important problems in computer vision. The challenges of these problems come from the presence of noise, background clutter, large within class variations of the object class and limited training data. In addition, the computational complexity in the recognition process is also a concern in practice. In this thesis, we propose one approach to handle the problem of detecting an object class that exhibits large within-class variations, and a second approach to speed up the classification processes. In the first approach, we show that foreground-background classification (detection) and within-class classification of the foreground class (pose estimation) can be jointly solved with using a multiplicative form of two kernel functions. One kernel measures similarity for foreground-background classification. The other kernel accounts for latent factors that control within-class variation and implicitly enables feature sharing among foreground training samples. For applications where explicit parameterization of the within-class states is unavailable, a nonparametric formulation of the kernel can be constructed with a proper foreground distance/similarity measure. Detector training is accomplished via standard Support Vector Machine learning. The resulting detectors are tuned to specific variations in the foreground class. They also serve to evaluate hypotheses of the foreground state. When the image masks for foreground objects are provided in training, the detectors can also produce object segmentation. Methods for generating a representative sample set of detectors are proposed that can enable efficient detection and tracking. In addition, because individual detectors verify hypotheses of foreground state, they can also be incorporated in a tracking-by-detection frame work to recover foreground state in image sequences. To run the detectors efficiently at the online stage, an input-sensitive speedup strategy is proposed to select the most relevant detectors quickly. The proposed approach is tested on data sets of human hands, vehicles and human faces. On all data sets, the proposed approach achieves improved detection accuracy over the best competing approaches. In the second part of the thesis, we formulate a filter-and-refine scheme to speed up recognition processes. The binary outputs of the weak classifiers in a boosted detector are used to identify a small number of candidate foreground state hypotheses quickly via Hamming distance or weighted Hamming distance. The approach is evaluated in three applications: face recognition on the face recognition grand challenge version 2 data set, hand shape detection and parameter estimation on a hand data set, and vehicle detection and estimation of the view angle on a multi-pose vehicle data set. On all data sets, our approach is at least five times faster than simply evaluating all foreground state hypotheses with virtually no loss in classification accuracy.
Resumo:
Anterior inferotemporal cortex (ITa) plays a key role in visual object recognition. Recognition is tolerant to object position, size, and view changes, yet recent neurophysiological data show ITa cells with high object selectivity often have low position tolerance, and vice versa. A neural model learns to simulate both this tradeoff and ITa responses to image morphs using large-scale and small-scale IT cells whose population properties may support invariant recognition.
Resumo:
Noise is one of the main factors degrading the quality of original multichannel remote sensing data and its presence influences classification efficiency, object detection, etc. Thus, pre-filtering is often used to remove noise and improve the solving of final tasks of multichannel remote sensing. Recent studies indicate that a classical model of additive noise is not adequate enough for images formed by modern multichannel sensors operating in visible and infrared bands. However, this fact is often ignored by researchers designing noise removal methods and algorithms. Because of this, we focus on the classification of multichannel remote sensing images in the case of signal-dependent noise present in component images. Three approaches to filtering of multichannel images for the considered noise model are analysed, all based on discrete cosine transform in blocks. The study is carried out not only in terms of conventional efficiency metrics used in filtering (MSE) but also in terms of multichannel data classification accuracy (probability of correct classification, confusion matrix). The proposed classification system combines the pre-processing stage where a DCT-based filter processes the blocks of the multichannel remote sensing image and the classification stage. Two modern classifiers are employed, radial basis function neural network and support vector machines. Simulations are carried out for three-channel image of Landsat TM sensor. Different cases of learning are considered: using noise-free samples of the test multichannel image, the noisy multichannel image and the pre-filtered one. It is shown that the use of the pre-filtered image for training produces better classification in comparison to the case of learning for the noisy image. It is demonstrated that the best results for both groups of quantitative criteria are provided if a proposed 3D discrete cosine transform filter equipped by variance stabilizing transform is applied. The classification results obtained for data pre-filtered in different ways are in agreement for both considered classifiers. Comparison of classifier performance is carried out as well. The radial basis neural network classifier is less sensitive to noise in original images, but after pre-filtering the performance of both classifiers is approximately the same.
Resumo:
A new high performance, programmable image processing chip targeted at video and HDTV applications is described. This was initially developed for image small object recognition but has much broader functional application including 1D and 2D FIR filtering as well as neural network computation. The core of the circuit is made up of an array of twenty one multiplication-accumulation cells based on systolic architecture. Devices can be cascaded to increase the order of the filter both vertically and horizontally. The chip has been fabricated in a 0.6 µ, low power CMOS technology and operates on 10 bit input data at over 54 Megasamples per second. The introduction gives some background to the chip design and highlights that there are few other comparable devices. Section 2 gives a brief introduction to small object detection. The chip architecture and the chip design will be described in detail in the later sections.
Resumo:
The Field Programmable Gate Array (FPGA) implementation of the commonly used Histogram of Oriented Gradients (HOG) algorithm is explored. The HOG algorithm is employed to extract features for object detection. A key focus has been to explore the use of a new FPGA-based processor which has been targeted at image processing. The paper gives details of the mapping and scheduling factors that influence the performance and the stages that were undertaken to allow the algorithm to be deployed on FPGA hardware, whilst taking into account the specific IPPro architecture features. We show that multi-core IPPro performance can exceed that of against state-of-the-art FPGA designs by up to 3.2 times with reduced design and implementation effort and increased flexibility all on a low cost, Zynq programmable system.