26 results for Automatic Image Annotation


Relevance: 30.00%

Publisher:

Abstract:

We present results on an extension to our approach for automatic sports video annotation. Sports video is augmented with accelerometer data from wrist bands worn by umpires in the game. We solve the problem of automatic segmentation and robust gesture classification using a hierarchical hidden Markov model in conjunction with a filler model. The hierarchical model allows us to consider gestures at different levels of abstraction, and the filler model allows us to handle extraneous umpire movements. Results are presented for labeling video of a cricket game.
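
The abstract does not give an implementation, but the core idea of competing gesture and filler models can be sketched as follows. This is a deliberately flattened simplification (no hierarchy), assuming hmmlearn as the HMM library and three-axis accelerometer windows; all names and parameters are illustrative, not the authors' method.

```python
# Sketch: one GaussianHMM per umpire gesture plus a broad "filler" model for
# extraneous movement; a segment of accelerometer data is labelled by the model
# with the highest log-likelihood. Assumes hmmlearn is installed.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_gesture_models(segments_by_label, n_states=4):
    """segments_by_label: dict mapping gesture name (incl. 'filler') ->
    list of (T_i, 3) accelerometer arrays."""
    models = {}
    for label, segments in segments_by_label.items():
        X = np.vstack(segments)
        lengths = [len(s) for s in segments]
        m = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify_segment(models, segment):
    """Return the gesture (or 'filler') whose HMM best explains the segment."""
    scores = {label: m.score(segment) for label, m in models.items()}
    return max(scores, key=scores.get)
```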

Relevance: 30.00%

Publisher:

Abstract:

This paper describes an application of camera motion estimation to indexing cricket games. Each shot is labeled with its type: glance left, glance right, left drive, right drive, left cut, right pull, or straight drive. The method has the advantages of being fast and avoiding complex image segmentation. The classification of the cricket shots is done using an incremental learning algorithm. We tested the method on over 600 shots, and the results show that the system achieves a classification accuracy of 74%.
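
The paper does not name its incremental learner, so the sketch below uses scikit-learn's SGDClassifier with partial_fit as a stand-in; the per-shot camera-motion descriptors are also an assumption.

```python
# Sketch: incremental classification of cricket shots from camera-motion features.
import numpy as np
from sklearn.linear_model import SGDClassifier

SHOT_LABELS = ["glance left", "glance right", "left drive", "right drive",
               "left cut", "right pull", "straight drive"]

clf = SGDClassifier(loss="log_loss")

def update(motion_features, label):
    """Incrementally update the classifier with one labelled shot
    (motion_features: e.g. pan/tilt/zoom statistics for the shot)."""
    X = np.asarray(motion_features, dtype=float).reshape(1, -1)
    clf.partial_fit(X, [label], classes=SHOT_LABELS)

def predict(motion_features):
    X = np.asarray(motion_features, dtype=float).reshape(1, -1)
    return clf.predict(X)[0]
```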

Relevance: 30.00%

Publisher:

Abstract:

This paper details research that will explore the analysis of human behaviour via video surveillance. Digital computer images will be obtained from video footage of a real world scene, and positions of people in the scene will be identified and tracked through each frame in the sequence.

The noted positions will build into a pattern of motion that can be examined and classified. It is proposed that specific events, such as panic or fight situations, will have unique, and therefore identifying, characteristics that will enable automatic detection of such events.

It is envisaged that active cameras will be used when a situation of interest occurs, to enable more information to be extracted from the scene (e.g., panning to follow the action or zooming to enhance detail).
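
The proposal does not specify a detection method; one simple way to obtain the per-frame person positions it describes is background subtraction plus blob centroids, sketched here with OpenCV as an assumed toolkit.

```python
# Sketch: per-frame centroids of moving blobs, as raw input for the proposed
# motion-pattern analysis. Detection method is an assumption, not the authors'.
import cv2

def track_positions(video_path, min_area=500):
    """Yield (frame_index, [(cx, cy), ...]) centroids of moving blobs."""
    cap = cv2.VideoCapture(video_path)
    bg = cv2.createBackgroundSubtractorMOG2()
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = bg.apply(frame)
        mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        centroids = []
        for c in contours:
            if cv2.contourArea(c) < min_area:
                continue
            m = cv2.moments(c)
            centroids.append((int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])))
        yield idx, centroids
        idx += 1
    cap.release()
```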

Relevance: 30.00%

Publisher:

Abstract:

The self-quotient image is a biologically inspired representation which has been proposed as an illumination invariant feature for automatic face recognition. Owing to the lack of strong domain specific assumptions underlying this representation, it can be readily extracted from raw images irrespective of the persons's pose, facial expression etc. What makes the self-quotient image additionally attractive is that it can be computed quickly and in a closed form using simple low-level image operations. However, it is generally accepted that the self-quotient is insufficiently robust to large illumination changes which is why it is mainly used in applications in which low precision is an acceptable compromise for high recall (e.g. retrieval systems). Yet, in this paper we demonstrate that the performance of this representation in challenging illuminations has been greatly underestimated. We show that its error rate can be reduced by over an order of magnitude, without any changes to the representation itself. Rather, we focus on the manner in which the dissimilarity between two self-quotient images is computed. By modelling the dominant sources of noise affecting the representation, we propose and evaluate a series of different dissimilarity measures, the best of which reduces the initial error rate of 63.0% down to only 5.7% on the notoriously challenging YaleB data set.
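
A minimal sketch of the representation itself (the input divided pixel-wise by a smoothed version of itself), as commonly defined; the Gaussian kernel and epsilon are assumptions, and the paper's actual contribution, the noise-aware dissimilarity measures, is not reproduced here.

```python
# Sketch: self-quotient image of a greyscale face.
import numpy as np
from scipy.ndimage import gaussian_filter

def self_quotient_image(image, sigma=3.0, eps=1e-6):
    """image: 2D float array (greyscale face). Returns the self-quotient image."""
    image = image.astype(np.float64)
    smoothed = gaussian_filter(image, sigma=sigma)  # low-pass estimate of illumination
    return image / (smoothed + eps)
```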

Relevance: 30.00%

Publisher:

Abstract:

In this chapter we described a novel framework for automatic face recognition in the presence of varying illumination, primarily applicable to matching face sets or sequences. The framework is based on simple image processing filters that compete with unprocessed greyscale input to yield a single matching score between individuals. By performing all numerically demanding computation offline, our method (i) retains the matching efficiency of simple image filters while (ii) achieving greatly increased robustness, since all online processing is performed in closed form. Evaluated on a large, real-world data corpus, the proposed framework was shown to be successful in video-based recognition across a wide range of illumination, pose and face motion pattern changes.

Relevance: 30.00%

Publisher:

Abstract:

The objective of this work is to recognize all the frontal faces of a character in the closed world of a movie or situation comedy, given a small number of query faces. This is challenging because faces in a feature-length film are relatively uncontrolled with a wide variability of scale, pose, illumination, and expressions, and also may be partially occluded. We develop a recognition method based on a cascade of processing steps that normalize for the effects of the changing imaging environment. In particular, there are three areas of novelty: (i) we suppress the background surrounding the face, enabling the maximum area of the face to be retained for recognition rather than a subset; (ii) we include a pose refinement step to optimize the registration between the test image and face exemplar; and (iii) we use robust distance to a subspace to allow for partial occlusion and expression change. The method is applied and evaluated on several feature-length films. It is demonstrated that high recall rates (over 92%) can be achieved whilst maintaining good precision (over 93%).
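
A generic sketch of a "robust distance to a subspace" of the kind referred to in point (iii): the per-pixel reconstruction residual against a PCA subspace is passed through a capped penalty so that partial occlusion cannot dominate the distance. The specific robust function and subspace construction here are assumptions, not the authors' exact formulation.

```python
# Sketch: capped (robust) reconstruction error of a face against a PCA subspace.
import numpy as np

def robust_subspace_distance(x, mean, basis, cap=0.1):
    """
    x     : flattened face image, shape (d,)
    mean  : subspace mean, shape (d,)
    basis : orthonormal PCA basis, shape (d, k)
    cap   : per-pixel cap on the squared residual (robustness to occlusion)
    """
    centred = x - mean
    coeffs = basis.T @ centred              # project onto the subspace
    residual = centred - basis @ coeffs     # per-pixel reconstruction error
    penalties = np.minimum(residual ** 2, cap)
    return float(penalties.sum())
```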

Relevance: 30.00%

Publisher:

Abstract:

In many automatic face recognition applications, a set of a person's face images is available rather than a single image. In this paper, we describe a novel method for face recognition using image sets. We propose a flexible, semi-parametric model for learning probability densities confined to highly non-linear but intrinsically low-dimensional manifolds. The model leads to a statistical formulation of the recognition problem in terms of minimizing the divergence between densities estimated on these manifolds. The proposed method is evaluated on a large data set, acquired in realistic imaging conditions with severe illumination variation. Our algorithm is shown to match the best-performing and outperform the other state-of-the-art algorithms in the literature, achieving a 94% recognition rate on average.
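
A heavily simplified stand-in for the divergence-based matching idea: each face set is summarized by a single Gaussian in some embedding and two sets are compared by symmetric KL divergence. The paper's model captures non-linear manifolds, so this is only illustrative; the closed-form Gaussian KL formula itself is standard.

```python
# Sketch: symmetric KL divergence between Gaussian summaries of two face sets.
import numpy as np

def gaussian_kl(m0, S0, m1, S1):
    """KL divergence KL(N(m0, S0) || N(m1, S1))."""
    k = m0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def set_dissimilarity(set_a, set_b, ridge=1e-3):
    """set_a, set_b: arrays of shape (n_i, d) holding embedded face vectors."""
    m_a, m_b = set_a.mean(axis=0), set_b.mean(axis=0)
    S_a = np.cov(set_a, rowvar=False) + ridge * np.eye(set_a.shape[1])
    S_b = np.cov(set_b, rowvar=False) + ridge * np.eye(set_b.shape[1])
    return gaussian_kl(m_a, S_a, m_b, S_b) + gaussian_kl(m_b, S_b, m_a, S_a)
```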

Relevance: 30.00%

Publisher:

Abstract:

Illumination invariance remains the most researched, yet most challenging, aspect of automatic face recognition. In this paper we propose a novel, general recognition framework for efficient matching of individual face images, sets or sequences. The framework is based on simple image processing filters that compete with unprocessed greyscale input to yield a single matching score between individuals. It is shown how the discrepancy between the illumination conditions of the novel input and those of the training data set can be estimated and used to weight the contributions of the two competing representations. We describe an extensive empirical evaluation of the proposed method on 171 individuals and over 1300 video sequences with extreme illumination, pose and head motion variation. On this challenging data set our algorithm consistently demonstrated a dramatic performance improvement over traditional filtering approaches. We demonstrate a reduction of 50-75% in recognition error rates, with the best-performing method-filter combination correctly recognizing 96% of the individuals.
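
The fusion idea can be sketched as a weighted blend of the two matching scores, with the weight driven by the estimated illumination discrepancy. How the discrepancy is estimated and mapped to a weight is an assumption here (a simple logistic mapping); the paper derives its own weighting scheme.

```python
# Sketch: blending filtered and unfiltered matching scores by estimated
# illumination discrepancy.
import numpy as np

def fused_score(score_plain, score_filtered, illumination_discrepancy,
                midpoint=0.5, steepness=10.0):
    """
    score_plain / score_filtered : similarity scores from the two representations
    illumination_discrepancy     : estimated mismatch in [0, 1] between the novel
                                   input and the training illumination conditions
    """
    # The larger the discrepancy, the more weight on the illumination-filtered score.
    w = 1.0 / (1.0 + np.exp(-steepness * (illumination_discrepancy - midpoint)))
    return (1.0 - w) * score_plain + w * score_filtered
```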

Relevance: 30.00%

Publisher:

Abstract:

In this paper, we address the problems of fully automatic localization and segmentation of 3D vertebral bodies from CT/MR images. We propose a learning-based, unified random forest regression and classification framework to tackle these two problems. More specifically, in the first stage, the localization of 3D vertebral bodies is solved with random forest regression, where we aggregate the votes from a set of randomly sampled image patches to get a probability map of the center of a target vertebral body in a given image. The resultant probability map is then further regularized by a Hidden Markov Model (HMM) to eliminate potential ambiguity caused by the neighboring vertebral bodies. The output from the first stage allows us to define a region of interest (ROI) for the segmentation step, where we use random forest classification to estimate the likelihood of a voxel in the ROI being foreground or background. The estimated likelihood is combined with the prior probability, which is learned from a set of training data, to get the posterior probability of the voxel. The segmentation of the target vertebral body is then done by binary thresholding of the estimated probability. We evaluated the present approach on two openly available datasets: 1) 3D T2-weighted spine MR images from 23 patients and 2) 3D spine CT images from 10 patients. Taking manual segmentation as the ground truth (each MR image contains at least 7 vertebral bodies from T11 to L5 and each CT image contains 5 vertebral bodies from L1 to L5), we evaluated the present approach with leave-one-out experiments. Specifically, for the T2-weighted MR images we achieved a mean localization error of 1.6 mm, and for segmentation a mean Dice metric of 88.7% and a mean surface distance of 1.5 mm. For the CT images we achieved a mean localization error of 1.9 mm, and for segmentation a mean Dice metric of 91.0% and a mean surface distance of 0.9 mm.
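
The segmentation stage described above can be sketched with scikit-learn standing in for the authors' random forest implementation: a classifier gives per-voxel foreground likelihoods inside the ROI, these are combined with a learned prior, and the posterior is thresholded. Feature extraction and the HMM-regularized localization stage are omitted, and the feature/prior representations here are assumptions.

```python
# Sketch: likelihood-prior fusion and thresholding for ROI voxel segmentation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_segmenter(voxel_features, voxel_labels, n_trees=100):
    """voxel_features: (n_voxels, n_features); voxel_labels: 0/1 foreground."""
    clf = RandomForestClassifier(n_estimators=n_trees)
    clf.fit(voxel_features, voxel_labels)
    return clf

def segment_roi(clf, roi_features, prior_fg, threshold=0.5):
    """
    roi_features : (n_voxels, n_features) features for the voxels in the ROI
    prior_fg     : (n_voxels,) learned prior probability of foreground
    """
    likelihood_fg = clf.predict_proba(roi_features)[:, 1]
    # Unnormalised posterior ~ likelihood * prior, renormalised against background.
    post_fg = likelihood_fg * prior_fg
    post_bg = (1.0 - likelihood_fg) * (1.0 - prior_fg)
    posterior = post_fg / (post_fg + post_bg + 1e-12)
    return posterior >= threshold
```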

Relevance: 30.00%

Publisher:

Abstract:

In this paper, a novel approach is proposed to automatically generate both a watercolor painting and a pencil sketch drawing (a binary contour image) from a realistic photo by using DBSCAN color clustering based on the HSV color space. While the color clusters produced by the proposed method help to create the watercolor painting, the noise pixels are useful for generating the pencil sketch drawing. Moreover, noise pixels are reassigned to color clusters by a novel algorithm to refine the contours in the watercolor painting. The main goal of this paper is to inspire non-professional artists to produce traditional-style paintings easily by adjusting only a few parameters. Another contribution of this paper is an easy method for producing the binary contour image, which is a by-product of mining the image data with DBSCAN clustering. The binary image is therefore useful in resource-limited systems for reducing data while retaining enough image information.
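
The basic DBSCAN-on-HSV idea can be sketched as follows: clustered pixels are repainted with their cluster's mean color (a flat, watercolor-like look) and DBSCAN noise pixels become the binary contour image. The parameter values are assumptions, the noise-reassignment refinement step is not reproduced, and clustering every pixel is only practical for small images.

```python
# Sketch: watercolor-style flattening and contour extraction via DBSCAN on HSV pixels.
import numpy as np
import cv2
from sklearn.cluster import DBSCAN

def watercolor_and_contour(bgr_image, eps=4.0, min_samples=20):
    h, w = bgr_image.shape[:2]
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV).reshape(-1, 3).astype(np.float64)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(hsv)

    # Repaint each cluster with its mean HSV color.
    painted = hsv.copy()
    for lab in set(labels) - {-1}:
        painted[labels == lab] = hsv[labels == lab].mean(axis=0)

    watercolor = cv2.cvtColor(painted.reshape(h, w, 3).astype(np.uint8),
                              cv2.COLOR_HSV2BGR)
    contour = (labels == -1).reshape(h, w).astype(np.uint8) * 255  # noise pixels
    return watercolor, contour
```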

Relevance: 30.00%

Publisher:

Abstract:

This paper reports a robustness comparison of clustering-based multi-label classification methods versus their non-clustering counterparts for multi-concept image and video annotation. In the experimental setting of this paper, we adopted six popular multi-label classification algorithms, two different base classifiers for problem-transformation-based multi-label classification, and three different clustering algorithms for pre-clustering of the training data. We conducted an experimental evaluation on two multi-label benchmark datasets: the scene image data and the Mediamill video data. We also employed two multi-label classification evaluation metrics, namely micro F1-measure and Hamming loss, to present the predictive performance of the classifications. The results reveal that different base classifiers and clustering methods contribute differently to the performance of the multi-label classifications. Overall, the pre-clustering methods improve the effectiveness of multi-label classification in certain experimental settings. This provides vital information to users when deciding which multi-label classification method to choose for multi-concept image and video annotation.
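
One clustering-based variant of the kind the paper evaluates can be sketched as: pre-cluster the training data with k-means, fit a binary-relevance (problem-transformation) classifier per cluster, route each test sample to its nearest cluster's model, and score with the two metrics the paper names. The specific clusterers and base classifiers used in the paper may differ; this is an illustrative assumption.

```python
# Sketch: pre-clustered binary-relevance multi-label classification, evaluated
# with micro F1 and Hamming loss. Assumes numpy arrays and that each cluster
# contains both positive and negative examples of every label.
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import f1_score, hamming_loss

def clustered_multilabel(X_train, Y_train, X_test, Y_test, n_clusters=3):
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X_train)

    # One binary-relevance model per training cluster.
    models = {}
    for c in range(n_clusters):
        idx = km.labels_ == c
        models[c] = OneVsRestClassifier(LogisticRegression(max_iter=1000))
        models[c].fit(X_train[idx], Y_train[idx])

    # Route each test sample to the model of its nearest cluster.
    test_clusters = km.predict(X_test)
    Y_pred = [models[c].predict(x.reshape(1, -1))[0]
              for x, c in zip(X_test, test_clusters)]

    return {"micro_f1": f1_score(Y_test, Y_pred, average="micro"),
            "hamming_loss": hamming_loss(Y_test, Y_pred)}
```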