989 results for Computer Vision Android


Relevance: 80.00%

Abstract:

Sparse representation has been introduced to address many recognition problems in computer vision. In this paper, we propose a new framework for object categorization based on sparse representation of local features. Unlike most previous sparse-coding-based methods for object classification, which use sparse coding only to extract high-level features, the proposed method incorporates sparse representation and classification into a unified framework and therefore needs no separate classifier. Experimental results show that the proposed method achieves accuracy better than or comparable to that of the well-known bag-of-features representation with various classifiers.
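The classification-by-reconstruction idea this framework describes can be sketched as follows. This minimal version substitutes class-wise least squares for the sparse (l1) coding the paper actually uses, and all dictionaries and test data are synthetic toys:

```python
import numpy as np

def residual_classify(test, class_dicts):
    """Assign `test` to the class whose dictionary reconstructs it with
    the smallest residual. Class-wise least squares stands in here for
    the l1-based sparse coding of the actual framework."""
    best_label, best_res = None, np.inf
    for label, D in class_dicts.items():
        # coefficients minimising ||D @ x - test||_2
        x, *_ = np.linalg.lstsq(D, test, rcond=None)
        res = np.linalg.norm(D @ x - test)
        if res < best_res:
            best_label, best_res = label, res
    return best_label

# toy dictionaries: class 0 features live near the x axis, class 1 near y
D0 = np.array([[1.0, 0.9], [0.0, 0.0], [0.0, 0.1]])
D1 = np.array([[0.0, 0.0], [1.0, 0.9], [0.0, 0.1]])
print(residual_classify(np.array([0.95, 0.05, 0.0]), {0: D0, 1: D1}))  # -> 0
```

Because the residual comparison itself is the decision rule, no separate classifier is trained, which is the point the abstract makes.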

Relevance: 80.00%

Abstract:

Finding the skeleton of a 3D mesh is an essential task for many applications such as mesh animation, tracking, and 3D registration. In recent years, new technologies in computer vision such as the Microsoft Kinect have shown that a mesh skeleton can be useful, for example in human-machine interaction. To calculate the 3D mesh skeleton, mesh properties such as topology and the relations between its components are utilized. In this paper, we propose a novel algorithm that can efficiently calculate a vertex antipodal point, the diametrically opposite point belonging to the same mesh. The set of centers of the connecting lines between each vertex and its antipodal point represents the desired 3D mesh skeleton. Post-processing is applied for smoothing and for fitting the centers into optimized skeleton parts. The algorithm was tested on different classes of 3D objects and produced efficient results comparable with the literature. It produces high-quality, detail-preserving skeletons, making it suitable for applications where the mapping between mesh and skeleton must be preserved as closely as possible.
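The midpoint construction the abstract describes can be illustrated on a synthetic cylinder, where the antipodal point of a surface vertex is known analytically; a real mesh would need the paper's antipodal search, which is not reproduced here:

```python
import numpy as np

# Sketch of the skeleton-from-antipodal-points idea: each vertex is paired
# with its antipodal point, and the midpoints of the connecting segments
# form the raw skeleton. The "mesh" is a sampled cylinder, where the
# antipodal of (x, y, z) is simply (-x, -y, z).
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
z = np.linspace(0.0, 5.0, 20)
T, Z = np.meshgrid(theta, z)
verts = np.stack([np.cos(T).ravel(), np.sin(T).ravel(), Z.ravel()], axis=1)

antipodal = verts * np.array([-1.0, -1.0, 1.0])   # opposite side of the surface
skeleton = 0.5 * (verts + antipodal)              # midpoints -> cylinder axis

# every midpoint lies exactly on the axis x = y = 0
print(np.abs(skeleton[:, :2]).max())   # -> 0.0
```

For this idealized shape the midpoints collapse exactly onto the medial axis; the smoothing and fitting post-processing the abstract mentions handles the noisy case.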

Relevance: 80.00%

Abstract:

With increasing urbanization and vehicle ownership, traffic problems such as congestion, environmental impact and safety concerns have multiplied. In order to address these problems, we propose a video-driven traffic modelling system in this paper. The system can simulate real-world traffic activities on a computer, based on traffic data recorded in videos. Video processing is employed to estimate metrics such as traffic volumes. These metrics are used to update the traffic system model, which is then simulated using the Paramics™ traffic simulation platform. Video-driven traffic modelling has widespread potential application in traffic systems, due to the convenience and reduced cost of model development and maintenance. Experiments are conducted in this paper to demonstrate the effectiveness of the proposed system.

Relevance: 80.00%

Abstract:

We propose Video Driven Traffic Modelling (VDTM) for accurate simulation of real-world traffic behaviours with detailed information and low-cost model development and maintenance. Computer vision techniques are employed to estimate traffic parameters. These parameters are used to build and update a traffic system model, which is simulated using the Paramics traffic simulation platform. Based on the simulation, the effects of traffic interventions can be evaluated, enabling better decision making by traffic management authorities. In this paper, traffic parameters such as vehicle types, trip start times and corresponding origin-destinations are extracted from a video. A road network is manually defined according to the traffic composition in the video, and individual vehicles with the extracted properties are modelled and simulated within the defined road network using Paramics. VDTM has widespread potential application in supporting traffic decision making. To demonstrate its effectiveness, we apply it to optimizing a traffic signal control system, which adaptively adjusts the green times of signals at an intersection to reduce traffic congestion.
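As a toy stand-in for the video-processing step that estimates traffic parameters, a background-subtraction pixel count can serve as a crude volume proxy. The frames below are synthetic and the threshold is an arbitrary placeholder, not a value from the paper:

```python
import numpy as np

def count_motion_pixels(frame, background, thresh=25):
    """Crude traffic-volume proxy: count pixels whose absolute difference
    from a background model exceeds `thresh`. A stand-in for the
    unspecified video-processing pipeline of the VDTM system."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    return int((diff > thresh).sum())

# synthetic 'road': empty background, then a frame with a bright 'vehicle'
background = np.zeros((60, 80), dtype=np.uint8)
frame = background.copy()
frame[20:30, 10:25] = 200          # 10 x 15 vehicle-sized blob

print(count_motion_pixels(frame, background))   # -> 150
```

Counts like this, aggregated over time, are the kind of metric that would feed the traffic model update step.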

Relevance: 80.00%

Abstract:

Automatic face recognition (AFR) is an area with immense practical potential which includes a wide range of commercial and law enforcement applications, and it continues to be one of the most active research areas of computer vision. Even after over three decades of intense research, the state-of-the-art in AFR continues to improve, benefiting from advances in a range of different fields including image processing, pattern recognition, computer graphics and physiology. However, systems based on visible spectrum images continue to face challenges in the presence of illumination, pose and expression changes, as well as facial disguises, all of which can significantly decrease their accuracy. Amongst various approaches which have been proposed in an attempt to overcome these limitations, the use of infrared (IR) imaging has emerged as a particularly promising research direction. This paper presents a comprehensive and timely review of the literature on this subject.

Relevance: 80.00%

Abstract:

Linear subspace representations of appearance variation are pervasive in computer vision. In this paper we address the problem of robustly matching them (computing the similarity between them) when they correspond to sets of images of different (possibly greatly so) scales. We show that the naïve solution of projecting the low-scale subspace into the high-scale image space is inadequate, especially at large scale discrepancies. A successful approach is proposed instead. It consists of (i) an interpolated projection of the low-scale subspace into the high-scale space, followed by (ii) a rotation of this initial estimate within the bounds of the imposed “downsampling constraint”. The optimal rotation is found in closed form as the rotation which best aligns the high-scale reconstruction of the low-scale subspace with the reference it is compared to. The proposed method is evaluated on the problem of matching sets of face appearances under varying illumination. In comparison to naïve matching, our algorithm is shown to greatly increase the separation of between-class and within-class similarities, as well as to produce far more meaningful modes of common appearance on which the match score is based.

Relevance: 80.00%

Abstract:

Image reduction is a crucial task in image processing, underpinning many practical applications. This work proposes novel image reduction operators based on non-monotonic averaging aggregation functions. The technique of penalty function minimisation is used to derive a novel mode-like estimator capable of identifying the most appropriate pixel value for representing a subset of the original image. The performance of this aggregation function and of several traditional robust estimators of location is objectively assessed by applying image reduction within a facial recognition task. The FERET evaluation protocol is applied to confirm that these non-monotonic functions are able to sustain task performance compared to recognition using non-reduced images, as well as significantly improve performance on query images corrupted by noise. These results extend the state of the art in image reduction based on aggregation functions and provide a basis for efficiency and accuracy improvements in practical computer vision applications.
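The penalty-minimisation view of image reduction can be illustrated with a block-wise reducer: the median minimises the sum of absolute (L1) penalties over a block, the mean the squared (L2) penalties. This sketch shows only those two classical baselines, not the paper's mode-like estimator:

```python
import numpy as np

def reduce_image(img, k, estimator=np.median):
    """Reduce each k x k block of `img` to one value. np.median minimises
    the sum of absolute (L1) penalties over the block; np.mean minimises
    the squared (L2) penalties."""
    h, w = img.shape
    blocks = img[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k)
    return estimator(blocks, axis=(1, 3))

img = np.full((8, 8), 100.0)
img[0, 0] = img[2, 3] = img[5, 6] = 255.0   # three impulse-corrupted pixels

med = reduce_image(img, 4, np.median)
avg = reduce_image(img, 4, np.mean)
print(med.min(), med.max())   # -> 100.0 100.0: the median ignores the impulses
print(avg.max() > 100.0)      # -> True: the mean is pulled up by them
```

The robustness gap shown here is exactly why the choice of penalty function matters for reduced images corrupted by noise.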

Relevance: 80.00%

Abstract:

This paper presents a comparison between the Microsoft Kinect depth sensor and the Asus Xtion for computer vision applications. Depth sensors, known as RGBD cameras, project an infrared pattern and calculate depth from the reflected light using an infrared-sensitive camera. In this research, we compare the depth-sensing capabilities of the two sensors under various conditions. The purpose is to give the reader background on whether to use the Microsoft Kinect or the Asus Xtion to solve a specific computer vision problem. The properties of the two depth sensors were investigated in a series of experiments evaluating their accuracy under various conditions, revealing the advantages and disadvantages of each sensor.
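One common way to quantify depth-sensor accuracy against a planar target, in the spirit of the experiments described, is the RMSE of residuals after a least-squares plane fit. The "scan" below is synthetic, with a small deterministic ripple standing in for sensor noise:

```python
import numpy as np

def plane_rmse(points):
    """Fit z = a*x + b*y + c by least squares and return the RMSE of the
    residuals: a simple flatness metric for judging a depth sensor
    against a planar target such as a wall."""
    A = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    residuals = points[:, 2] - A @ coeffs
    return float(np.sqrt(np.mean(residuals ** 2)))

# synthetic 'wall' scan with a small ripple standing in for sensor noise
x, y = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
z = 2.0 + 0.1 * x + 0.05 * y + 0.001 * np.sin(40 * x)
pts = np.stack([x.ravel(), y.ravel(), z.ravel()], axis=1)

print(plane_rmse(pts))   # small: only the ripple deviates from the plane
```

Running such a metric at several distances and angles is one way to produce the kind of accuracy comparison the paper reports.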

Relevance: 80.00%

Abstract:

Texture classification is one of the most important tasks in the computer vision field and has been extensively investigated over the last several decades. Previous texture classification methods mainly used template-matching-based classifiers such as the Support Vector Machine and k-Nearest-Neighbour. Given enough training images, state-of-the-art texture classification methods can achieve very high classification accuracies on some benchmark databases. However, when the number of training images is limited, as usually happens in real-world applications because of the high cost of obtaining labelled data, the classification accuracies of those state-of-the-art methods deteriorate due to overfitting. In this paper we aim to develop a novel framework that can correctly classify texture images with only a small number of training images. Taking into account the repetition and sparsity properties of textures, we propose a sparse-representation-based multi-manifold analysis framework for texture classification from few training images. A set of new training samples is generated from each training image by a scale and spatial pyramid, and the training samples belonging to each class are then modelled by a manifold based on sparse representation. We learn a sparse-representation dictionary and a projection matrix for each class and classify test images based on the projected reconstruction errors. The framework provides a more compact model than template-matching-based texture classification methods and mitigates the overfitting effect. Experimental results show that the proposed method achieves reasonably high generalization capability even with as few as 3 training images, and significantly outperforms state-of-the-art texture classification approaches on three benchmark datasets. © 2014 Elsevier B.V. All rights reserved.
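The sample-generation step can be sketched as follows: a single training image is expanded into many patches through a scale and spatial pyramid. The patch size, stride and scale factors here are arbitrary placeholders, not the paper's settings:

```python
import numpy as np

def pyramid_samples(img, scales=(1, 2), patch=8, stride=8):
    """Expand one texture image into many training samples: downsample by
    each scale factor (naive striding), then tile the result into
    patch x patch crops. All parameters are illustrative placeholders."""
    samples = []
    for s in scales:
        small = img[::s, ::s]                      # naive downsampling
        h, w = small.shape
        for i in range(0, h - patch + 1, stride):
            for j in range(0, w - patch + 1, stride):
                samples.append(small[i:i + patch, j:j + patch].ravel())
    return np.array(samples)

texture = np.arange(32 * 32, dtype=float).reshape(32, 32)
S = pyramid_samples(texture)
print(S.shape)   # -> (20, 64): 16 patches at scale 1 plus 4 at scale 2
```

Each row of `S` is one synthetic training sample; in the paper these would go on to fit the per-class sparse-representation manifolds.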

Relevance: 80.00%

Abstract:

This paper presents a comparison of different clustering algorithms applied to a point cloud constructed from depth maps captured by an RGBD camera such as the Microsoft Kinect. The depth sensor returns images in which each pixel represents the distance to its corresponding scene point rather than RGB data. This is the real novelty of the RGBD camera in computer vision compared with common video-based and stereo-based products. Depth sensors capture depth data without markers, 2D-to-3D transitions or feature-point detection. The 3D points of the captured depth map are then clustered to determine the different limbs of the human body. This clustering is carried out with several different techniques, and our experiments show good performance in using clustering to determine the different human-body limbs.
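A minimal k-means implementation, one plausible candidate among the clustering techniques being compared, applied to two synthetic "limb" blobs in place of real depth points:

```python
import numpy as np

def kmeans(points, k, iters=10):
    """Minimal k-means for partitioning a 3D point cloud into k clusters,
    one candidate among the clustering techniques the paper compares."""
    # deterministic initialisation: k points spread across the array
    centres = points[np.linspace(0, len(points) - 1, k).astype(int)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None] - centres[None], axis=2)
        labels = d.argmin(axis=1)                 # nearest-centre assignment
        centres = np.array([points[labels == i].mean(axis=0) for i in range(k)])
    return labels, centres

# two well-separated synthetic 'limb' blobs
rng = np.random.default_rng(1)
cloud = np.vstack([rng.normal([0, 0, 0], 0.1, (50, 3)),
                   rng.normal([5, 5, 5], 0.1, (50, 3))])
labels, centres = kmeans(cloud, k=2)
print(sorted(set(labels.tolist())))   # -> [0, 1]: each blob forms one cluster
```

Real limb segmentation is harder because limbs touch and articulate, which is why the paper compares several clustering techniques rather than assuming one.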

Relevance: 80.00%

Abstract:

Linear subspace representations of appearance variation are pervasive in computer vision. This paper addresses the problem of robustly matching such subspaces (computing the similarity between them) when they are used to describe the scope of variations within sets of images of different (possibly greatly so) scales. A naïve solution of projecting the low-scale subspace into the high-scale image space is described first and subsequently shown to be inadequate, especially at large scale discrepancies. A successful approach is proposed instead. It consists of (i) an interpolated projection of the low-scale subspace into the high-scale space, followed by (ii) a rotation of this initial estimate within the bounds of the imposed "downsampling constraint". The optimal rotation is found in closed form as the rotation which best aligns the high-scale reconstruction of the low-scale subspace with the reference it is compared to. The method is evaluated on the problem of matching sets of (i) face appearances under varying illumination and (ii) object appearances under varying viewpoint, using two large data sets. In comparison to naïve matching, the proposed algorithm is shown to greatly increase the separation of between-class and within-class similarities, as well as to produce far more meaningful modes of common appearance on which the match score is based. © 2014 Elsevier Ltd.
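The baseline subspace comparison underlying this line of work can be sketched via canonical correlations (cosines of the principal angles between subspaces); the paper's interpolated projection and constrained rotation are not reproduced here:

```python
import numpy as np

def subspace_similarity(A, B):
    """Similarity of the subspaces spanned by the columns of A and B:
    the largest canonical correlation (cosine of the smallest principal
    angle). Only the baseline comparison step; the scale-aware projection
    and rotation of the paper are not reproduced."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return float(np.linalg.svd(Qa.T @ Qb, compute_uv=False).max())

xy_plane = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
xz_plane = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
z_axis = np.array([[0.0], [0.0], [1.0]])

print(subspace_similarity(xy_plane, xz_plane))  # they share the x axis -> 1.0
print(subspace_similarity(xy_plane, z_axis))    # orthogonal subspaces -> 0.0
```

Match scores between appearance subspaces are typically built from such correlations; the paper's contribution is making them meaningful when the two image sets have very different scales.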

Relevance: 80.00%

Abstract:

Automatic face recognition is an area with immense practical potential which includes a wide range of commercial and law enforcement applications. Hence it is unsurprising that it continues to be one of the most active research areas of computer vision. Even after over three decades of intense research, the state-of-the-art in face recognition continues to improve, benefitting from advances in a range of different research fields such as image processing, pattern recognition, computer graphics, and physiology. Systems based on visible spectrum images, the most researched face recognition modality, have reached a significant level of maturity with some practical success. However, they continue to face challenges in the presence of illumination, pose and expression changes, as well as facial disguises, all of which can significantly decrease recognition accuracy. Amongst various approaches which have been proposed in an attempt to overcome these limitations, the use of infrared (IR) imaging has emerged as a particularly promising research direction. This paper presents a comprehensive and timely review of the literature on this subject. Our key contributions are (i) a summary of the inherent properties of infrared imaging which makes this modality promising in the context of face recognition; (ii) a systematic review of the most influential approaches, with a focus on emerging common trends as well as key differences between alternative methodologies; (iii) a description of the main databases of infrared facial images available to the researcher; and lastly (iv) a discussion of the most promising avenues for future research. © 2014 Elsevier Ltd.

Relevance: 80.00%

Abstract:

Scale features are useful for a great number of applications in computer vision. However, parametric methods struggle to accommodate the diversity of features found in natural scenes. Empirical studies show that object frequencies and segment sizes follow power-law distributions, which are well generated by Pitman-Yor (PY) processes. Based on mid-level segments, we propose a hierarchical sequence of images to obtain scale information, stored in a hierarchical structure, through the hierarchical Pitman-Yor (HPY) model, which is expected to tolerate the uncertainty of natural images. We also evaluate our representation on a segmentation task.

Relevance: 80.00%

Abstract:

Object segmentation is widely recognized as one of the most challenging problems in computer vision. One major problem of existing methods is that most of them are vulnerable to cluttered backgrounds. Moreover, human intervention is often required to specify foreground/background priors, which restricts the usage of object segmentation in real-world scenarios. To address these problems, we propose a novel approach to learn complementary saliency priors for foreground object segmentation in complex scenes. Unlike existing saliency-based segmentation approaches, we propose to learn two complementary saliency maps that reveal the most reliable foreground and background regions. Given such priors, foreground object segmentation is formulated as a binary pixel labelling problem that can be efficiently solved using graph cuts. As such, the confident saliency priors can be utilized to extract the most salient objects and reduce the distraction of cluttered backgrounds. Extensive experiments show that our approach remarkably outperforms 16 state-of-the-art methods on three public image benchmarks.
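A heavily simplified sketch of turning two complementary saliency maps into a foreground mask; the paper's graph-cut refinement of these unary priors is omitted, and both maps are hand-made toys:

```python
import numpy as np

def label_from_saliency(fg_sal, bg_sal):
    """Rough foreground mask from two complementary saliency maps: label
    a pixel foreground where foreground saliency dominates. The paper
    refines such unary priors with graph cuts; that pairwise smoothing
    step is omitted in this sketch."""
    return (fg_sal > bg_sal).astype(np.uint8)

fg = np.zeros((6, 6)); fg[2:4, 2:4] = 0.9        # object region is salient
bg = np.full((6, 6), 0.5); bg[2:4, 2:4] = 0.1    # background map is complementary

mask = label_from_saliency(fg, bg)
print(int(mask.sum()))   # -> 4 foreground pixels
```

Using two maps rather than one lets confident background evidence veto spurious foreground responses, which is the intuition behind the complementary priors.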

Relevance: 80.00%

Abstract:

Identifying the parameters of a model such that it best fits an observed set of data points is fundamental to the majority of problems in computer vision. This task is particularly demanding when portions of the data have been corrupted by gross outliers: measurements that are not explained by the assumed distributions. In this paper we present a novel method that uses the Least Quantile of Squares (LQS) estimator, a well-known but computationally demanding high-breakdown estimator with several appealing theoretical properties. The proposed method is a meta-algorithm, based on the well-established principles of proximal splitting, that allows for the use of LQS estimators while still retaining computational efficiency. Implementing the method is straightforward, as the majority of the resulting sub-problems can be solved using existing standard bundle-adjustment packages. Preliminary experiments on synthetic and real image data demonstrate the impressive practical performance of our method compared to existing robust estimators used in computer vision.
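The LQS objective itself, as opposed to the paper's proximal-splitting solver, can be illustrated with a brute-force random-sampling line fit; the trial count and quantile below are arbitrary:

```python
import numpy as np

def lqs_line(x, y, q=0.5, trials=200, seed=0):
    """Least Quantile of Squares line fit by brute-force random sampling:
    repeatedly fit a line to 2 random points and keep the fit minimising
    the q-th quantile of squared residuals. The paper replaces this
    search with a proximal-splitting meta-algorithm; only the LQS
    objective is illustrated here."""
    rng = np.random.default_rng(seed)
    best = (np.inf, 0.0, 0.0)
    for _ in range(trials):
        i, j = rng.choice(len(x), 2, replace=False)
        if x[i] == x[j]:
            continue                     # degenerate vertical sample
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        cost = np.quantile((y - (a * x + b)) ** 2, q)
        if cost < best[0]:
            best = (cost, a, b)
    return best[1], best[2]

# inliers on y = 2x + 1 plus 20% gross outliers
x = np.arange(20, dtype=float)
y = 2 * x + 1
y[::5] += 100.0

a, b = lqs_line(x, y)
print(a, b)   # recovers the true slope and intercept despite the outliers
```

Because the median (q = 0.5) of squared residuals ignores up to half the points, gross outliers do not bias the chosen fit, which is the high-breakdown property the abstract refers to.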