71 results for 280208 Computer Vision


Relevance: 80.00%

Abstract:

Image reduction is a crucial task in image processing, underpinning many practical applications. This work proposes novel image reduction operators based on non-monotonic averaging aggregation functions. The technique of penalty function minimisation is used to derive a novel mode-like estimator capable of identifying the most appropriate pixel value to represent a subset of the original image. The performance of this aggregation function and of several traditional robust estimators of location is objectively assessed by applying image reduction within a facial recognition task. The FERET evaluation protocol confirms that these non-monotonic functions sustain task performance compared with recognition using non-reduced images, and significantly improve performance on query images corrupted by noise. These results extend the state of the art in image reduction based on aggregation functions and provide a basis for efficiency and accuracy improvements in practical computer vision applications.
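For intuition, a minimal sketch of the penalty-minimisation idea follows, assuming a truncated-quadratic penalty; the penalty choice, the threshold `c`, and the `reduce_image` helper are illustrative assumptions rather than the paper's exact operators.

```python
import numpy as np

def mode_like_estimate(block, c=10.0):
    """Penalty-minimising representative value for a pixel block.
    Far-away pixels contribute a capped cost, so the minimiser behaves
    like a mode rather than a mean and resists outliers (noise)."""
    values = block.ravel().astype(float)
    candidates = np.unique(values)
    costs = [np.minimum((values - v) ** 2, c).sum() for v in candidates]
    return candidates[int(np.argmin(costs))]

def reduce_image(img, k=2):
    """Shrink img by a factor k, replacing each k-by-k block with the
    penalty-based estimate (plain averaging would use block.mean())."""
    h, w = (img.shape[0] // k) * k, (img.shape[1] // k) * k
    out = np.empty((h // k, w // k))
    for i in range(0, h, k):
        for j in range(0, w, k):
            out[i // k, j // k] = mode_like_estimate(img[i:i + k, j:j + k])
    return out
```

Restricting candidates to the observed pixel values keeps the search discrete and cheap, which is what makes the estimator mode-like.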

Relevance: 80.00%

Abstract:

This paper presents a comparison between the Microsoft Kinect depth sensor and the Asus Xtion for computer vision applications. Depth sensors, known as RGBD cameras, project an infrared pattern and calculate depth from the reflected light using an infrared-sensitive camera. In this research, we compare the depth-sensing capabilities of the two sensors under various conditions. The purpose is to give the reader background on whether to use the Microsoft Kinect or the Asus Xtion sensor to solve a specific computer vision problem. The properties of the two depth sensors were investigated through a series of experiments evaluating their accuracy under various conditions, revealing the advantages and disadvantages of both the Microsoft Kinect and the Asus Xtion.
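As context for how such a comparison can be scripted, here is a hedged sketch of a per-pixel depth-noise measurement on a static scene through OpenCV's OpenNI2 backend; it assumes OpenCV built with OpenNI2 support, and the frame count and noise metric are illustrative choices, not the paper's protocol.

```python
import cv2
import numpy as np

# Open the depth stream; swap cv2.CAP_OPENNI2 for cv2.CAP_OPENNI2_ASUS
# when an Asus Xtion is attached instead of a Kinect.
cap = cv2.VideoCapture(cv2.CAP_OPENNI2)           # Microsoft Kinect
if not cap.isOpened():
    raise RuntimeError("no OpenNI2 depth device found")

frames = []
for _ in range(100):                              # frame count is arbitrary
    if not cap.grab():
        break
    ok, depth = cap.retrieve(None, cv2.CAP_OPENNI_DEPTH_MAP)
    if ok:
        frames.append(depth.astype(np.float32))   # uint16 depth, millimetres

stack = np.stack(frames)
valid = (stack > 0).all(axis=0)                   # zero marks missing depth
print("mean per-pixel std (mm):", stack.std(axis=0)[valid].mean())
```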

Relevance: 80.00%

Abstract:

Texture classification is one of the most important tasks in the computer vision field and has been extensively investigated over the last several decades. Previous texture classification methods mainly relied on template-matching-based classifiers such as the Support Vector Machine and k-Nearest-Neighbour. Given enough training images, state-of-the-art texture classification methods can achieve very high accuracies on some benchmark databases. However, when the number of training images is limited, which is common in real-world applications because of the high cost of obtaining labelled data, the accuracy of those methods deteriorates due to overfitting. In this paper we develop a novel framework that can correctly classify textural images from only a small number of training images. Exploiting the repetition and sparsity properties of textures, we propose a sparse-representation-based multi-manifold analysis framework for texture classification from few training images. A set of new training samples is generated from each training image via a scale and spatial pyramid, and the training samples belonging to each class are then modelled by a manifold based on sparse representation. We learn a dictionary of sparse representations and a projection matrix for each class, and classify test images based on the projected reconstruction errors. The framework provides a more compact model than template-matching-based texture classification methods and mitigates overfitting. Experimental results show that the proposed method achieves reasonably high generalization capability with as few as 3 training images, and significantly outperforms state-of-the-art texture classification approaches on three benchmark datasets.
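A minimal sketch of the reconstruction-error classification step, assuming scikit-learn, is shown below; the per-class projection matrix and the scale/spatial pyramid augmentation the paper also uses are omitted here.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

class SparseDictClassifier:
    """Per-class sparse dictionaries; a test image is assigned to the
    class whose dictionary reconstructs it with the lowest error."""

    def __init__(self, n_atoms=32, alpha=1.0):
        self.n_atoms, self.alpha = n_atoms, alpha
        self.models = {}

    def fit(self, X, y):
        # X: (n_samples, n_features) vectorised texture images.
        for c in np.unique(y):
            dl = DictionaryLearning(n_components=self.n_atoms,
                                    transform_algorithm="lasso_lars",
                                    transform_alpha=self.alpha)
            self.models[c] = dl.fit(X[y == c])
        return self

    def predict(self, X):
        classes = sorted(self.models)
        errs = []
        for c in classes:
            dl = self.models[c]
            recon = dl.transform(X) @ dl.components_  # codes -> reconstruction
            errs.append(((X - recon) ** 2).sum(axis=1))
        return np.asarray(classes)[np.argmin(errs, axis=0)]
```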

Relevance: 80.00%

Abstract:

This paper presents a comparison of different clustering algorithms applied to a point cloud constructed from the depth maps captured by an RGBD camera such as the Microsoft Kinect. The depth sensor returns images in which each pixel represents the distance to its corresponding scene point rather than RGB data. This is the real novelty of the RGBD camera in computer vision compared to common video-based and stereo-based products: depth sensors capture depth data without using markers, 2D-to-3D transitions, or feature-point detection. The 3D points from the captured depth map are then clustered, using several clustering techniques, to determine the different limbs of the human body. Our experiments show that clustering performs well at identifying the different human-body limbs.
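A hedged sketch of the pipeline follows: back-project the depth map to a 3-D cloud with the pinhole model, then cluster. The intrinsics are common Kinect defaults rather than the paper's calibration, and k-means stands in for whichever clustering techniques were compared.

```python
import numpy as np
from sklearn.cluster import KMeans

def depth_to_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Back-project a depth map (uint16, millimetres) to 3-D points."""
    v, u = np.nonzero(depth)          # skip pixels with no depth reading
    z = depth[v, u] / 1000.0          # mm -> m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.column_stack([x, y, z])

# Usage on a hypothetical foreground-only (person-segmented) depth map,
# with k = 6 as an illustrative guess at the number of limb clusters:
#   labels = KMeans(n_clusters=6, n_init=10).fit_predict(depth_to_points(depth_map))
```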

Relevance: 80.00%

Abstract:

Linear subspace representations of appearance variation are pervasive in computer vision. This paper addresses the problem of robustly matching such subspaces (computing the similarity between them) when they are used to describe the scope of variations within sets of images of different (possibly greatly so) scales. A naïve solution of projecting the low-scale subspace into the high-scale image space is described first and subsequently shown to be inadequate, especially at large scale discrepancies. A successful approach is proposed instead. It consists of (i) an interpolated projection of the low-scale subspace into the high-scale space, followed by (ii) a rotation of this initial estimate within the bounds of the imposed "downsampling constraint". The optimal rotation is found in closed form as the one that best aligns the high-scale reconstruction of the low-scale subspace with the reference it is compared to. The method is evaluated on the problem of matching sets of (i) face appearances under varying illumination and (ii) object appearances under varying viewpoint, using two large data sets. Compared to naïve matching, the proposed algorithm is shown to greatly increase the separation between within-class and between-class similarities, as well as to produce far more meaningful modes of common appearance on which the match score is based.
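The sketch below covers step (i), the interpolated projection, and measures similarity by principal angles (assuming SciPy); the constrained rotation of step (ii) relies on the paper's closed form and is omitted.

```python
import numpy as np
from scipy.ndimage import zoom
from scipy.linalg import subspace_angles

def lift_basis(B_low, shape_low, shape_high):
    """Interpolate each low-scale basis image up to the high scale and
    re-orthonormalise; assumes shape_high is a multiple of shape_low."""
    factor = (shape_high[0] / shape_low[0], shape_high[1] / shape_low[1])
    cols = [zoom(b.reshape(shape_low), factor).ravel() for b in B_low.T]
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q

def subspace_similarity(B_low, B_high, shape_low, shape_high):
    """Mean cosine of principal angles between the lifted low-scale
    subspace and the high-scale reference (1.0 = identical spans)."""
    L = lift_basis(B_low, shape_low, shape_high)
    return float(np.mean(np.cos(subspace_angles(L, B_high))))
```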

Relevance: 80.00%

Abstract:

Automatic face recognition is an area with immense practical potential, encompassing a wide range of commercial and law enforcement applications. Hence it is unsurprising that it continues to be one of the most active research areas of computer vision. Even after more than three decades of intense research, the state of the art in face recognition continues to improve, benefitting from advances in a range of different fields such as image processing, pattern recognition, computer graphics, and physiology. Systems based on visible-spectrum images, the most researched face recognition modality, have reached a significant level of maturity with some practical success. However, they continue to face challenges in the presence of illumination, pose and expression changes, as well as facial disguises, all of which can significantly decrease recognition accuracy. Amongst the various approaches proposed to overcome these limitations, the use of infrared (IR) imaging has emerged as a particularly promising research direction. This paper presents a comprehensive and timely review of the literature on this subject. Our key contributions are (i) a summary of the inherent properties of infrared imaging which make this modality promising in the context of face recognition; (ii) a systematic review of the most influential approaches, with a focus on emerging common trends as well as key differences between alternative methodologies; (iii) a description of the main databases of infrared facial images available to the researcher; and lastly (iv) a discussion of the most promising avenues for future research.

Relevance: 80.00%

Abstract:

Scale features are useful for a great number of applications in computer vision. However, parametric methods struggle to tolerate the diversity of features found in natural scenes. Empirical studies show that object frequencies and segment sizes follow power-law distributions, which are well modelled by Pitman-Yor (PY) processes. Based on mid-level segments, we propose a hierarchical sequence of images whose scale information is stored in a hierarchical structure through the hierarchical Pitman-Yor (HPY) model, which is expected to tolerate the uncertainty of natural images. We also evaluate our representation on a segmentation application.
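For intuition about why PY processes fit these statistics, the sketch below samples the Pitman-Yor Chinese restaurant process; for discount d > 0 the cluster ("table") sizes follow a power law, mirroring the segment-size behaviour the paper builds on.

```python
import numpy as np

def pitman_yor_crp(n, d=0.5, theta=1.0, rng=None):
    """Sample cluster sizes from a Pitman-Yor Chinese restaurant process
    with discount d and concentration theta."""
    rng = rng or np.random.default_rng()
    counts = []                                   # customers per table
    for i in range(n):
        probs = np.array([c - d for c in counts]
                         + [theta + d * len(counts)]) / (theta + i)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)                      # open a new table
        else:
            counts[k] += 1
    return counts

# sorted(pitman_yor_crp(10000), reverse=True) decays roughly as a power law.
```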

Relevance: 80.00%

Abstract:

Object segmentation is widely recognized as one of the most challenging problems in computer vision. One major weakness of existing methods is that most are vulnerable to cluttered backgrounds. Moreover, human intervention is often required to specify foreground/background priors, which restricts the use of object segmentation in real-world scenarios. To address these problems, we propose a novel approach that learns complementary saliency priors for foreground object segmentation in complex scenes. Unlike existing saliency-based segmentation approaches, we learn two complementary saliency maps that reveal the most reliable foreground and background regions. Given such priors, foreground object segmentation is formulated as a binary pixel labelling problem that can be solved efficiently using graph cuts. The confident saliency priors can thus be utilized to extract the most salient objects and reduce the distraction of the cluttered background. Extensive experiments show that our approach remarkably outperforms 16 state-of-the-art methods on three public image benchmarks.
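A simplified stand-in for the labelling step, assuming the PyMaxflow library and two precomputed saliency maps in [0, 1]; the paper's learned complementary priors and exact energy are not reproduced here.

```python
import numpy as np
import maxflow  # PyMaxflow: pip install PyMaxflow

def saliency_graph_cut(fg_sal, bg_sal, pairwise=2.0, eps=1e-6):
    """Binary pixel labelling by graph cuts: negative log-priors as
    unary terms, a constant 4-connected smoothness term."""
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(fg_sal.shape)
    g.add_grid_edges(nodes, pairwise)             # smoothness
    g.add_grid_tedges(nodes,
                      -np.log(bg_sal + eps),      # capacity to source
                      -np.log(fg_sal + eps))      # capacity to sink
    g.maxflow()
    return g.get_grid_segments(nodes)             # boolean label mask
```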

Relevance: 80.00%

Abstract:

Identifying the parameters of a model such that it best fits an observed set of data points is fundamental to the majority of problems in computer vision. This task is particularly demanding when portions of the data have been corrupted by gross outliers: measurements that are not explained by the assumed distributions. In this paper we present a novel method that uses the Least Quantile of Squares (LQS) estimator, a well-known but computationally demanding high-breakdown estimator with several appealing theoretical properties. The proposed method is a meta-algorithm, based on the well-established principles of proximal splitting, that allows the use of LQS estimators while retaining computational efficiency. Implementing the method is straightforward, as the majority of the resulting sub-problems can be solved using existing standard bundle-adjustment packages. Preliminary experiments on synthetic and real image data demonstrate the impressive practical performance of our method compared to existing robust estimators used in computer vision.
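To make the LQS criterion concrete, here is a classic random-sampling approximation for line fitting (in the style of LMedS, not the paper's proximal-splitting meta-algorithm): the chosen model minimises the q-quantile of squared residuals, so up to a (1 - q) fraction of gross outliers is tolerated.

```python
import numpy as np

def lqs_line_fit(x, y, q=0.5, trials=500, rng=None):
    """Least Quantile of Squares fit of y = a*x + b by random sampling."""
    rng = rng or np.random.default_rng()
    best, best_cost = None, np.inf
    for _ in range(trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue                               # degenerate sample
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        cost = np.quantile((y - (a * x + b)) ** 2, q)
        if cost < best_cost:
            best, best_cost = (a, b), cost
    return best
```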

Relevance: 80.00%

Abstract:

Dynamically changing background (dynamic background) still presents a great challenge to many motion-based video surveillance systems. In the context of event detection, it is a major source of false alarms. There is a strong need from the security industry either to detect and suppress these false alarms, or to dampen the effects of background changes, so as to increase the sensitivity to meaningful events of interest. In this paper, we restrict our focus to one of the most common causes of dynamic background changes: swaying tree branches and their shadows under windy conditions. Considering the ultimate goal in a video analytics pipeline, we formulate a new dynamic background detection problem as a signal processing alternative to the previously described but unreliable computer-vision-based approaches. Within this new framework, we directly reduce the number of false alarms by testing whether the detected events are due to characteristic background motions. In addition, we introduce a new data set suitable for the evaluation of dynamic background detection. It consists of real-world events detected by a commercial surveillance system from two static surveillance cameras. The research question we address is whether dynamic background can be detected reliably and efficiently using simple motion features, in the presence of similar but meaningful events such as loitering. Inspired by tree aerodynamics theory, we propose a novel method named local variation persistence (LVP), which captures the key characteristics of swaying motions. The method is posed as a convex optimization problem whose variable is the local variation. We derive a computationally efficient algorithm for solving the optimization problem, the solution of which is then used to form a powerful detection statistic. On our newly collected data set, we demonstrate that the proposed LVP achieves excellent detection results and outperforms the best alternative adapted from existing art in the dynamic background literature.
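As a loosely related toy illustration only (the paper's LVP solves a convex program whose variable is the local variation, which is not reproduced here), persistent swaying can be glimpsed as high autocorrelation of frame-difference energy at a short temporal lag:

```python
import numpy as np

def sway_persistence(frames, lag=8):
    """frames: (T, H, W) grayscale video. Returns the lag-`lag`
    autocorrelation of per-frame motion energy; oscillatory backgrounds
    score high, one-off events such as loitering score low."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    energy = diffs.reshape(len(diffs), -1).mean(axis=1)
    e = energy - energy.mean()
    denom = float((e * e).sum())
    return 0.0 if denom == 0 else float((e[:-lag] * e[lag:]).sum() / denom)
```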

Relevance: 80.00%

Abstract:

Many vision problems deal with high-dimensional data, such as motion segmentation and face clustering. However, these high-dimensional data usually lie in a low-dimensional structure. Sparse representation is a powerful principle for solving a number of clustering problems with high-dimensional data. The principle is motivated by an idealised model of data points grounded in linear algebra theory; real data in computer vision, however, are unlikely to follow the ideal model perfectly. In this paper, we exploit a mixed-norm regularization for sparse subspace clustering. The regularization term is a convex combination of the l1 norm, which promotes sparsity at the individual level, and the block norm l2/1, which promotes group sparsity. Combining these powerful regularization terms provides a more accurate model and subsequently a better affinity matrix for sparse subspace clustering, which in turn improves performance on motion segmentation and face clustering problems. The formulation also caters for different types of data corruption. We derive a provably convergent and computationally efficient algorithm based on the alternating direction method of multipliers (ADMM) framework to solve the formulation. We demonstrate that this formulation outperforms other state-of-the-art methods on both motion segmentation and face clustering.
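For orientation, the sketch below shows the standard sparse-subspace-clustering pipeline that the mixed-norm term plugs into, assuming scikit-learn; a plain Lasso stands in for the paper's ADMM solver of the combined l1 + l2/1 objective.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def ssc_affinity(X, alpha=0.01):
    """Self-expression step: code each point as a sparse combination of
    the others. X has shape (n_features, n_points)."""
    n = X.shape[1]
    C = np.zeros((n, n))
    for i in range(n):
        idx = np.arange(n) != i
        model = Lasso(alpha=alpha, max_iter=5000)
        model.fit(X[:, idx], X[:, i])
        C[idx, i] = model.coef_
    return np.abs(C) + np.abs(C).T                # symmetric affinity

# Spectral clustering on the affinity recovers the subspaces, e.g.
#   labels = SpectralClustering(n_clusters=k,
#                               affinity="precomputed").fit_predict(W)
```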