534 results for Stereo image processing


Relevance:

80.00%

Publisher:

Abstract:

A robust visual tracking system requires an object appearance model that is able to handle occlusion, pose, and illumination variations in the video stream. This can be difficult to accomplish when the model is trained using only a single image. In this paper, we first propose a tracking approach based on affine subspaces (constructed from several images) which are able to accommodate the abovementioned variations. We use affine subspaces not only to represent the object, but also the candidate areas that the object may occupy. We furthermore propose a novel approach to measure affine subspace-to-subspace distance via the use of non-Euclidean geometry of Grassmann manifolds. The tracking problem is then considered as an inference task in a Markov Chain Monte Carlo framework via particle filtering. Quantitative evaluation on challenging video sequences indicates that the proposed approach obtains considerably better performance than several recent state-of-the-art methods such as Tracking-Learning-Detection and MILtrack.
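The abstract does not give the formula for its affine subspace-to-subspace distance, so the following is only a minimal sketch of the simpler linear-subspace case: the standard geodesic distance on a Grassmann manifold, computed from principal angles. The function names and the random test data are illustrative assumptions, not the authors' method.

```python
import numpy as np

def orthonormal_basis(samples):
    """Return an orthonormal basis (as columns) for the span of the columns of `samples`."""
    q, _ = np.linalg.qr(samples)
    return q

def grassmann_geodesic_distance(basis_a, basis_b):
    """Geodesic distance between two linear subspaces given orthonormal bases (d x p)."""
    # The singular values of A^T B are the cosines of the principal angles.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    # Clip to guard against round-off pushing values slightly outside [-1, 1].
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.linalg.norm(angles))

# Hypothetical data: two 3-dimensional subspaces of a 10-dimensional space.
rng = np.random.default_rng(0)
a = orthonormal_basis(rng.standard_normal((10, 3)))
b = orthonormal_basis(rng.standard_normal((10, 3)))
print(grassmann_geodesic_distance(a, a))  # numerically close to zero
print(grassmann_geodesic_distance(a, b))
```

An affine subspace also carries an offset (for example, the mean of the images), which this sketch ignores; how the paper combines the offset with the subspace term is not stated in the abstract.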

Relevance:

80.00%

Publisher:

Abstract:

We present a novel approach to video summarisation that makes use of a Bag-of-visual-Textures (BoT) approach. Two systems are proposed, one based solely on the BoT approach and another which exploits both colour information and BoT features. On 50 short-term videos from the Open Video Project we show that our BoT and fusion systems both achieve state-of-the-art performance, obtaining an average F-measure of 0.83 and 0.86 respectively, a relative improvement of 9% and 13% when compared to the previous state-of-the-art. When applied to a new underwater surveillance dataset containing 33 long-term videos, the proposed system reduces the amount of footage by a factor of 27, with only minor degradation in the information content. This order of magnitude reduction in video data represents significant savings in terms of time and potential labour cost when manually reviewing such footage.

Relevance:

80.00%

Publisher:

Abstract:

This paper describes a novel system for automatic classification of images obtained from Anti-Nuclear Antibody (ANA) pathology tests on Human Epithelial type 2 (HEp-2) cells using the Indirect Immunofluorescence (IIF) protocol. The IIF protocol on HEp-2 cells has been the hallmark method to identify the presence of ANAs, due to its high sensitivity and the large range of antigens that can be detected. However, it suffers from numerous shortcomings, such as being subjective as well as time and labour intensive. Computer Aided Diagnostic (CAD) systems, which automatically classify a HEp-2 cell image into one of its known patterns (e.g. speckled, homogeneous), have been developed to address these problems. Most of the existing CAD systems use handpicked features to represent a HEp-2 cell image, which may only work in limited scenarios. We propose a novel automatic cell image classification method termed Cell Pyramid Matching (CPM), which comprises regional histograms of visual words coupled with the Multiple Kernel Learning framework. We present a study of several variations of generating histograms and show the efficacy of the system on two publicly available datasets: the ICPR HEp-2 cell classification contest dataset and the SNPHEp-2 dataset.

Relevance:

80.00%

Publisher:

Abstract:

Recent advances in computer vision and machine learning suggest that a wide range of problems can be addressed more appropriately by considering non-Euclidean geometry. In this paper we explore sparse dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping, which enables us to devise a closed-form solution for updating a Grassmann dictionary, atom by atom. Furthermore, to handle non-linearity in data, we propose a kernelised version of the dictionary learning algorithm. Experiments on several classification tasks (face recognition, action recognition, dynamic texture classification) show that the proposed approach achieves considerable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as kernelised Affine Hull Method and graph-embedding Grassmann discriminant analysis.

Relevance:

80.00%

Publisher:

Abstract:

Person re-identification is particularly challenging due to significant appearance changes across separate camera views. In order to re-identify people, a representative human signature should effectively handle differences in illumination, pose and camera parameters. While general appearance-based methods are modelled in Euclidean spaces, it has been argued that some applications in image and video analysis are better modelled via non-Euclidean manifold geometry. To this end, recent approaches represent images as covariance matrices, and interpret such matrices as points on Riemannian manifolds. As direct classification on such manifolds can be difficult, in this paper we propose to represent each manifold point as a vector of similarities to class representers, via a recently introduced form of Bregman matrix divergence known as the Stein divergence. This is followed by using a discriminative mapping of similarity vectors for final classification. The use of similarity vectors is in contrast to the traditional approach of embedding manifolds into tangent spaces, which can suffer from representing the manifold structure inaccurately. Comparative evaluations on benchmark ETHZ and iLIDS datasets for the person re-identification task show that the proposed approach obtains better performance than recent techniques such as Histogram Plus Epitome, Partial Least Squares, and Symmetry-Driven Accumulation of Local Features.
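As a rough illustration of the similarity-vector idea, the sketch below computes the Stein divergence, S(A, B) = log det((A + B)/2) - 0.5 log det(AB), between symmetric positive definite matrices and turns the divergences to a set of class representers into a vector. The mapping exp(-sigma * S), the parameter `sigma`, and the example matrices are assumptions made here for illustration; the abstract does not specify how similarities are derived from the divergence.

```python
import numpy as np

def stein_divergence(a, b):
    """Stein (Jensen-Bregman LogDet) divergence between two SPD matrices."""
    # slogdet returns (sign, log|det|); the sign is +1 for SPD inputs.
    _, logdet_mid = np.linalg.slogdet((a + b) / 2.0)
    _, logdet_a = np.linalg.slogdet(a)
    _, logdet_b = np.linalg.slogdet(b)
    return float(logdet_mid - 0.5 * (logdet_a + logdet_b))

def similarity_vector(point, representers, sigma=1.0):
    """Similarities of one SPD matrix to each representer (assumed form: exp(-sigma * S))."""
    return np.array([np.exp(-sigma * stein_divergence(point, r)) for r in representers])

# Hypothetical 2 x 2 covariance descriptors standing in for class representers.
representers = [np.eye(2), 4.0 * np.eye(2)]
query = np.array([[1.2, 0.1], [0.1, 0.9]])
print(similarity_vector(query, representers))
```

The resulting vector lives in an ordinary Euclidean space, so any standard discriminative classifier can be applied to it, which is the motivation the abstract gives for avoiding tangent-space embeddings.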

Relevance:

80.00%

Publisher:

Abstract:

Facial expression recognition (FER) has developed dramatically in recent years, thanks to advances in related fields, especially machine learning, image processing and human recognition. Accordingly, the impact and potential usage of automatic FER have been growing in a wide range of applications, including human-computer interaction, robot control and driver state surveillance. However, to date, robust recognition of facial expressions from images and videos is still a challenging task due to the difficulty in accurately extracting the useful emotional features. These features are often represented in different forms, such as static, dynamic, point-based geometric or region-based appearance. Facial movement features, which include feature position and shape changes, are generally caused by the movements of facial elements and muscles during the course of emotional expression. The facial elements, especially key elements, will constantly change their positions when subjects are expressing emotions. As a consequence, the same feature in different images usually has different positions. In some cases, the shape of the feature may also be distorted due to subtle facial muscle movements. Therefore, for any feature representing a certain emotion, the geometric-based position and appearance-based shape normally change from one image to another in image databases, as well as in videos. Such movement features represent a rich pool of both static and dynamic characteristics of expressions, which play a critical role in FER. The vast majority of past work on FER does not take the dynamics of facial expressions into account. Some efforts have been made to capture and utilize facial movement features, and almost all of them are static based. These efforts try to adopt either geometric features of the tracked facial points, or appearance differences between holistic facial regions in consecutive frames, or texture and motion changes in local facial regions. Although they have achieved promising results, these approaches often require accurate location and tracking of facial points, which remains problematic.

Relevance:

80.00%

Publisher:

Abstract:

This paper presents a novel place recognition algorithm inspired by the recent discovery of overlapping and multi-scale spatial maps in the rodent brain. We mimic this hierarchical framework by training arrays of Support Vector Machines to recognize places at multiple spatial scales. Place match hypotheses are then cross-validated across all spatial scales, a process which combines the spatial specificity of the finest spatial map with the consensus provided by broader mapping scales. Experiments on three real-world datasets including a large robotics benchmark demonstrate that mapping over multiple scales uniformly improves place recognition performance over a single scale approach without sacrificing localization accuracy. We present analysis that illustrates how matching over multiple scales leads to better place recognition performance and discuss several promising areas for future investigation.

Relevance:

80.00%

Publisher:

Abstract:

Brain decoding of functional Magnetic Resonance Imaging data is a pattern analysis task that links brain activity patterns to the experimental conditions. Classifiers predict the neural states from the spatial and temporal pattern of brain activity extracted from multiple voxels in the functional images in a certain period of time. The prediction results offer insight into the nature of neural representations and cognitive mechanisms and the classification accuracy determines our confidence in understanding the relationship between brain activity and stimuli. In this paper, we compared the efficacy of three machine learning algorithms: neural network, support vector machines, and conditional random field to decode the visual stimuli or neural cognitive states from functional Magnetic Resonance data. Leave-one-out cross validation was performed to quantify the generalization accuracy of each algorithm on unseen data. The results indicated support vector machine and conditional random field have comparable performance and the potential of the latter is worthy of further investigation.
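The leave-one-out procedure mentioned above can be sketched in a few lines. The classifier used here is a nearest-centroid rule chosen only to keep the example dependency-free; it is a stand-in, not one of the three algorithms compared in the paper, and the "voxel pattern" data are made up.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class mean is closest to `sample` (stand-in classifier)."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], sample))

    return min(sums, key=squared_distance)

def leave_one_out_accuracy(xs, ys, predict):
    """Hold out each sample once, train on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Made-up two-feature patterns for two well-separated experimental conditions.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
ys = ["rest", "rest", "rest", "task", "task", "task"]
print(leave_one_out_accuracy(xs, ys, nearest_centroid_predict))
```

Because every sample is held out exactly once, the returned fraction estimates accuracy on unseen data without setting aside a separate test set, which matters when the number of scans is small.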

Relevance:

80.00%

Publisher:

Abstract:

Previous behavioral studies reported a robust effect of increased naming latencies when objects to be named were blocked within semantic category, compared to items blocked between category. This semantic context effect has been attributed to various mechanisms including inhibition or excitation of lexico-semantic representations and incremental learning of associations between semantic features and names, and is hypothesized to increase demands on verbal self-monitoring during speech production. Objects within categories also share many visual structural features, introducing a potential confound when interpreting the level at which the context effect might occur. Consistent with previous findings, we report a significant increase in response latencies when naming categorically related objects within blocks, an effect associated with increased perfusion fMRI signal bilaterally in the hippocampus and in the left middle to posterior superior temporal cortex. No perfusion changes were observed in the middle section of the left middle temporal cortex, a region associated with retrieval of lexical-semantic information in previous object naming studies. Although a manipulation of visual feature similarity did not influence naming latencies, we observed perfusion increases in the perirhinal cortex for naming objects with similar visual features that interacted with the semantic context in which objects were named. These results provide support for the view that the semantic context effect in object naming occurs due to an incremental learning mechanism, and involves increased demands on verbal self-monitoring.

Relevance:

80.00%

Publisher:

Abstract:

Previous studies have found that the lateral posterior fusiform gyri respond more robustly to pictures of animals than pictures of manmade objects and suggested that these regions encode the visual properties characteristic of animals. We suggest that such effects actually reflect processing demands arising when items with similar representations must be finely discriminated. In a positron emission tomography (PET) study of category verification with colored photographs of animals and vehicles, there was robust animal-specific activation in the lateral posterior fusiform gyri when stimuli were categorized at an intermediate level of specificity (e.g., dog or car). However, when the same photographs were categorized at a more specific level (e.g., Labrador or BMW), these regions responded equally strongly to animals and vehicles. We conclude that the lateral posterior fusiform does not encode domain-specific representations of animals or visual properties characteristic of animals. Instead, these regions are strongly activated whenever an item must be discriminated from many close visual or semantic competitors. Apparent category effects arise because, at an intermediate level of specificity, animals have more visual and semantic competitors than do artifacts.

Relevance:

80.00%

Publisher:

Abstract:

Studies of semantic impairment arising from brain disease suggest that the anterior temporal lobes are critical for semantic abilities in humans; yet activation of these regions is rarely reported in functional imaging studies of healthy controls performing semantic tasks. Here, we combined neuropsychological and PET functional imaging data to show that when healthy subjects identify concepts at a specific level, the regions activated correspond to the site of maximal atrophy in patients with relatively pure semantic impairment. The stimuli were color photographs of common animals or vehicles, and the task was category verification at specific (e.g., robin), intermediate (e.g., bird), or general (e.g., animal) levels. Specific, relative to general, categorization activated the antero-lateral temporal cortices bilaterally, despite matching of these experimental conditions for difficulty. Critically, in patients with atrophy in precisely these areas, the most pronounced deficit was in the retrieval of specific semantic information.

Relevance:

80.00%

Publisher:

Abstract:

The proliferation of news reports published on websites and the sharing of news among social media users necessitate effective techniques for analysing the image, text and video data related to news topics. This paper presents the first study to classify affective facial images on emerging news topics. The proposed system dynamically monitors and selects the current hot (of great interest) news topics with strong affective interestingness using textual keywords in news articles and social media discussions. Images from the selected hot topics are extracted and classified into three emotion categories (positive, neutral and negative) based on the facial expressions of subjects in the images. Performance evaluations on two facial image datasets collected from real-world resources demonstrate the applicability and effectiveness of the proposed system in affective classification of facial images in news reports. Facial expression shows high consistency with the affective textual content in news reports for positive emotion, while only low correlation has been observed for neutral and negative emotions. The system can be directly used for applications such as assisting editors in choosing photos with a proper affective semantic for a certain topic during news report preparation.

Relevance:

80.00%

Publisher:

Abstract:

INTRODUCTION Calculating segmental (vertebral level-by-level) torso masses in Adolescent Idiopathic Scoliosis (AIS) patients allows the gravitational loading on the scoliotic spine during relaxed standing to be estimated. METHODS Existing low dose CT scans were used to calculate vertebral level-by-level torso masses and joint moments occurring in the spine for a group of female AIS patients with right-sided thoracic curves. Image processing software, ImageJ (v1.45 NIH USA), was used to reconstruct the torso segments and subsequently measure the torso volume and mass corresponding to each vertebral level. Body segment masses for the head, neck and arms were taken from published anthropometric data. Intervertebral joint moments at each vertebral level were found by summing each of the torso segment masses above the required joint and multiplying it by the perpendicular distance to the centre of the disc. RESULTS AND DISCUSSION Twenty patients were included in this study with a mean age of 15.0±2.7 years and a mean Cobb angle of 52±5.9°. The mean total trunk mass, as a percentage of total body mass, was 27.8 (SD 0.5) %. Mean segmental torso mass increased inferiorly from 0.6kg at T1 to 1.5kg at L5. The coronal plane joint moments during relaxed standing were typically 5-7Nm at the apex of the curve (Figure 1), with the highest apex joint moment of 7Nm. CT scans were performed in the supine position and curve magnitudes are known to be 7-10° smaller than those measured in standing [1]. Therefore joint moments produced by gravity will be greater than those calculated here. CONCLUSIONS Coronal plane joint moments as high as 7Nm can occur during relaxed standing in scoliosis patients, which may help to explain the mechanics of AIS progression. The body mass distributions calculated in this study can be used to estimate joint moments derived using other imaging modalities such as MRI and subsequently determine if a relationship exists between joint moments and progressive vertebral deformity.
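The joint moment calculation described in METHODS can be sketched as follows. This is one reading of the abstract: each segment above a joint contributes its weight times the lateral distance from its centre of mass to the disc centre. The value of g, the function name, and all numbers in the example are assumptions for illustration and are not data from the study.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed; not stated in the abstract)

def coronal_joint_moments(segment_masses_kg, segment_offsets_m, disc_offsets_m):
    """Moment (Nm) at each joint from the weight of all segments above it.

    segment_masses_kg[i]: mass of segment i, ordered from top (index 0) downwards.
    segment_offsets_m[i]: lateral position of the centre of mass of segment i.
    disc_offsets_m[j]:    lateral position of the disc centre directly below segment j.
    """
    moments = []
    for j, disc_x in enumerate(disc_offsets_m):
        moment = 0.0
        # Sum the contributions of every segment lying above joint j.
        for mass, com_x in zip(segment_masses_kg[: j + 1], segment_offsets_m[: j + 1]):
            moment += mass * G * (com_x - disc_x)
        moments.append(moment)
    return moments

# Hypothetical three-segment example; the first entry lumps head, neck and arms.
masses = [8.0, 0.6, 0.7]
centre_of_mass_x = [0.000, 0.005, 0.010]
disc_x = [0.010, 0.020, 0.030]
print(coronal_joint_moments(masses, centre_of_mass_x, disc_x))
```

The sign of each result indicates the side towards which the weight bends the spine; only the magnitude is comparable with the 5-7Nm figures quoted above.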

Relevance:

80.00%

Publisher:

Abstract:

The paper examines the knowledge of pedestrian movements, both in real scenarios and, more recently, in the virtual simulation realm. To verify whether it is possible to learn from the study of virtual environments how people will behave in real environments, it is vital to understand what is already known about behavior in real environments. Besides the walking interaction among pedestrians, the interaction between pedestrians and the built environment in which they are walking is also highly relevant. Force-based models were compared with the other three major microscopic models of pedestrian simulation to demonstrate that a more realistic and capable heuristic approach is needed for the study of the dynamics of pedestrians.

Relevance:

80.00%

Publisher:

Abstract:

The ability to build high-fidelity 3D representations of the environment from sensor data is critical for autonomous robots. Multi-sensor data fusion allows for more complete and accurate representations. Furthermore, using distinct sensing modalities (i.e. sensors using a different physical process and/or operating at different electromagnetic frequencies) usually leads to more reliable perception, especially in challenging environments, as modalities may complement each other. However, they may react differently to certain materials or environmental conditions, leading to catastrophic fusion. In this paper, we propose a new method to reliably fuse data from multiple sensing modalities, including in situations where they detect different targets. We first compute distinct continuous surface representations for each sensing modality, with uncertainty, using Gaussian Process Implicit Surfaces (GPIS). Second, we perform a local consistency test between these representations, to separate consistent data (i.e. data corresponding to the detection of the same target by the sensors) from inconsistent data. The consistent data can then be fused together, using another GPIS process, and the rest of the data can be combined as appropriate. The approach is first validated using synthetic data. We then demonstrate its benefit using a mobile robot, equipped with a laser scanner and a radar, which operates in an outdoor environment in the presence of large clouds of airborne dust and smoke.
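The consistency-then-fusion idea can be illustrated with scalar Gaussian estimates at a single query point. The abstract does not describe the actual test or its threshold, so the rule below (difference within a multiple of the combined standard deviation) and the inverse-variance fusion, which stands in for the second GPIS step, are assumptions, as are all the numbers.

```python
import math

def is_consistent(mean_a, var_a, mean_b, var_b, threshold=2.0):
    """Assumed test: the two estimates agree within `threshold` combined standard deviations."""
    return abs(mean_a - mean_b) <= threshold * math.sqrt(var_a + var_b)

def fuse(mean_a, var_a, mean_b, var_b):
    """Inverse-variance weighted fusion of two Gaussian estimates (stand-in for GPIS fusion)."""
    weight_a, weight_b = 1.0 / var_a, 1.0 / var_b
    fused_var = 1.0 / (weight_a + weight_b)
    fused_mean = fused_var * (weight_a * mean_a + weight_b * mean_b)
    return fused_mean, fused_var

# Hypothetical range estimates (metres) with variances for a laser and a radar.
clear_air = (5.0, 0.01, 5.1, 0.04)  # both sensors see the same surface
dust = (2.0, 0.01, 5.1, 0.04)       # the laser returns from a dust cloud instead

for laser_mean, laser_var, radar_mean, radar_var in (clear_air, dust):
    if is_consistent(laser_mean, laser_var, radar_mean, radar_var):
        print("fused:", fuse(laser_mean, laser_var, radar_mean, radar_var))
    else:
        print("inconsistent: keep the estimates separate")
```

Keeping the inconsistent estimates separate, rather than averaging them, is what avoids the catastrophic fusion the abstract warns about.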