532 results for swd: Image Processing
Abstract:
Person re-identification is particularly challenging due to significant appearance changes across separate camera views. In order to re-identify people, a representative human signature should effectively handle differences in illumination, pose and camera parameters. While general appearance-based methods are modelled in Euclidean spaces, it has been argued that some applications in image and video analysis are better modelled via non-Euclidean manifold geometry. To this end, recent approaches represent images as covariance matrices, and interpret such matrices as points on Riemannian manifolds. As direct classification on such manifolds can be difficult, in this paper we propose to represent each manifold point as a vector of similarities to class representers, via a recently introduced form of Bregman matrix divergence known as the Stein divergence. This is followed by using a discriminative mapping of similarity vectors for final classification. The use of similarity vectors is in contrast to the traditional approach of embedding manifolds into tangent spaces, which can suffer from representing the manifold structure inaccurately. Comparative evaluations on benchmark ETHZ and iLIDS datasets for the person re-identification task show that the proposed approach obtains better performance than recent techniques such as Histogram Plus Epitome, Partial Least Squares, and Symmetry-Driven Accumulation of Local Features.
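The Stein divergence named above has a compact closed form for symmetric positive definite (SPD) matrices, S(X, Y) = log det((X+Y)/2) - (1/2) log det(XY), which makes the similarity-vector construction easy to illustrate. The sketch below, in Python with NumPy, is a minimal illustration of that construction; the choice of class representers and the discriminative mapping applied afterwards are left abstract, since the abstract does not specify them.

```python
import numpy as np

def stein_divergence(X, Y):
    """Stein (S) divergence between two SPD matrices:
    S(X, Y) = log det((X + Y) / 2) - 0.5 * log det(X @ Y)."""
    # slogdet is numerically safer than log(det(.)) for SPD matrices
    _, ld_mid = np.linalg.slogdet((X + Y) / 2.0)
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    return ld_mid - 0.5 * (ld_x + ld_y)

def similarity_vector(C, representers):
    """Represent a manifold point C (a covariance descriptor) as a
    vector of divergences to a list of class representers."""
    return np.array([stein_divergence(C, R) for R in representers])
```

In practice each covariance matrix would be a region covariance descriptor computed from pixel features, and the resulting similarity vectors would then be passed to a conventional discriminative classifier.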
Abstract:
Facial expression recognition (FER) has developed dramatically in recent years, thanks to the advancements in related fields, especially machine learning, image processing and human recognition. Accordingly, the impact and potential usage of automatic FER have been growing in a wide range of applications, including human-computer interaction, robot control and driver state surveillance. However, to date, robust recognition of facial expressions from images and videos is still a challenging task due to the difficulty in accurately extracting the useful emotional features. These features are often represented in different forms, such as static, dynamic, point-based geometric or region-based appearance. Facial movement features, which include feature position and shape changes, are generally caused by the movements of facial elements and muscles during the course of emotional expression. The facial elements, especially key elements, will constantly change their positions when subjects are expressing emotions. As a consequence, the same feature in different images usually has different positions. In some cases, the shape of the feature may also be distorted due to subtle facial muscle movements. Therefore, for any feature representing a certain emotion, the geometric-based position and appearance-based shape normally change from one image to another in image databases, as well as in videos. These movement features represent a rich pool of both static and dynamic characteristics of expressions, which play a critical role in FER. The vast majority of past work on FER does not take the dynamics of facial expressions into account. Some efforts have been made to capture and utilize facial movement features, and almost all of them are static-based. These efforts adopt either geometric features of the tracked facial points, appearance differences between holistic facial regions in consecutive frames, or texture and motion changes in local facial regions. Although these efforts have achieved promising results, they often require accurate location and tracking of facial points, which remains problematic.
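As a concrete example of the point-based geometric features discussed above, a minimal sketch (assuming facial landmarks have already been tracked as (x, y) arrays, which this abstract notes is itself a hard problem) might compute per-landmark displacements between a neutral and an expressive frame:

```python
import numpy as np

def geometric_movement_features(neutral_pts, expressive_pts):
    """Geometric movement features from tracked facial landmarks:
    per-point displacement vectors and their magnitudes between a
    neutral face and an expressive face, both of shape (N, 2)."""
    disp = expressive_pts - neutral_pts          # (dx, dy) per landmark
    magnitudes = np.linalg.norm(disp, axis=1)    # movement strength
    return np.concatenate([disp.ravel(), magnitudes])
```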
Abstract:
This paper presents a novel place recognition algorithm inspired by the recent discovery of overlapping and multi-scale spatial maps in the rodent brain. We mimic this hierarchical framework by training arrays of Support Vector Machines to recognize places at multiple spatial scales. Place match hypotheses are then cross-validated across all spatial scales, a process which combines the spatial specificity of the finest spatial map with the consensus provided by broader mapping scales. Experiments on three real-world datasets including a large robotics benchmark demonstrate that mapping over multiple scales uniformly improves place recognition performance over a single scale approach without sacrificing localization accuracy. We present analysis that illustrates how matching over multiple scales leads to better place recognition performance and discuss several promising areas for future investigation.
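A minimal sketch of the multi-scale idea, using scikit-learn SVMs: one classifier is trained per spatial scale, and a query is matched by combining the per-scale hypotheses. The product-of-probabilities consensus rule used here is an illustrative assumption, not necessarily the paper's exact cross-validation procedure.

```python
import numpy as np
from sklearn.svm import SVC

def train_multiscale(descriptors_per_scale, place_labels):
    """Train one SVM per spatial scale (coarse to fine); each element of
    descriptors_per_scale is an (n_places, d) array for that scale."""
    return [SVC(kernel="linear", probability=True).fit(X, place_labels)
            for X in descriptors_per_scale]

def cross_validated_match(models, query_per_scale):
    """Combine per-scale place hypotheses: multiply the class probability
    vectors so a match must be supported across all spatial scales."""
    probs = [m.predict_proba(q.reshape(1, -1))[0]
             for m, q in zip(models, query_per_scale)]
    combined = np.prod(probs, axis=0)   # consensus across scales
    return models[0].classes_[np.argmax(combined)]
```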
Abstract:
Brain decoding of functional Magnetic Resonance Imaging data is a pattern analysis task that links brain activity patterns to the experimental conditions. Classifiers predict the neural states from the spatial and temporal pattern of brain activity extracted from multiple voxels in the functional images over a certain period of time. The prediction results offer insight into the nature of neural representations and cognitive mechanisms, and the classification accuracy determines our confidence in understanding the relationship between brain activity and stimuli. In this paper, we compared the efficacy of three machine learning algorithms: neural networks, support vector machines, and conditional random fields, used to decode the visual stimuli or neural cognitive states from functional Magnetic Resonance Imaging data. Leave-one-out cross-validation was performed to quantify the generalization accuracy of each algorithm on unseen data. The results indicated that the support vector machine and the conditional random field have comparable performance, and that the potential of the latter is worthy of further investigation.
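Leave-one-out cross-validation of the first two classifier families is straightforward to sketch with scikit-learn; conditional random fields require a separate sequence-modelling library and are omitted here. The feature matrix X (trials by voxels) and labels y are assumed to be preprocessed beforehand.

```python
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def loo_accuracy(X, y):
    """Leave-one-out accuracy for two of the compared classifier families.
    X: (n_trials, n_voxels) activity patterns; y: condition labels."""
    loo = LeaveOneOut()
    models = {"neural network": MLPClassifier(max_iter=1000),
              "SVM": SVC(kernel="linear")}
    # each trial is held out once; accuracy is averaged over all folds
    return {name: cross_val_score(m, X, y, cv=loo).mean()
            for name, m in models.items()}
```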
Abstract:
Previous behavioral studies reported a robust effect of increased naming latencies when objects to be named were blocked within semantic category, compared to items blocked between category. This semantic context effect has been attributed to various mechanisms including inhibition or excitation of lexico-semantic representations and incremental learning of associations between semantic features and names, and is hypothesized to increase demands on verbal self-monitoring during speech production. Objects within categories also share many visual structural features, introducing a potential confound when interpreting the level at which the context effect might occur. Consistent with previous findings, we report a significant increase in response latencies when naming categorically related objects within blocks, an effect associated with increased perfusion fMRI signal bilaterally in the hippocampus and in the left middle to posterior superior temporal cortex. No perfusion changes were observed in the middle section of the left middle temporal cortex, a region associated with retrieval of lexical-semantic information in previous object naming studies. Although a manipulation of visual feature similarity did not influence naming latencies, we observed perfusion increases in the perirhinal cortex for naming objects with similar visual features that interacted with the semantic context in which objects were named. These results provide support for the view that the semantic context effect in object naming occurs due to an incremental learning mechanism, and involves increased demands on verbal self-monitoring.
Abstract:
Previous studies have found that the lateral posterior fusiform gyri respond more robustly to pictures of animals than pictures of manmade objects and suggested that these regions encode the visual properties characteristic of animals. We suggest that such effects actually reflect processing demands arising when items with similar representations must be finely discriminated. In a positron emission tomography (PET) study of category verification with colored photographs of animals and vehicles, there was robust animal-specific activation in the lateral posterior fusiform gyri when stimuli were categorized at an intermediate level of specificity (e.g., dog or car). However, when the same photographs were categorized at a more specific level (e.g., Labrador or BMW), these regions responded equally strongly to animals and vehicles. We conclude that the lateral posterior fusiform does not encode domain-specific representations of animals or visual properties characteristic of animals. Instead, these regions are strongly activated whenever an item must be discriminated from many close visual or semantic competitors. Apparent category effects arise because, at an intermediate level of specificity, animals have more visual and semantic competitors than do artifacts.
Abstract:
Studies of semantic impairment arising from brain disease suggest that the anterior temporal lobes are critical for semantic abilities in humans; yet activation of these regions is rarely reported in functional imaging studies of healthy controls performing semantic tasks. Here, we combined neuropsychological and PET functional imaging data to show that when healthy subjects identify concepts at a specific level, the regions activated correspond to the site of maximal atrophy in patients with relatively pure semantic impairment. The stimuli were color photographs of common animals or vehicles, and the task was category verification at specific (e.g., robin), intermediate (e.g., bird), or general (e.g., animal) levels. Specific, relative to general, categorization activated the antero-lateral temporal cortices bilaterally, despite matching of these experimental conditions for difficulty. Critically, in patients with atrophy in precisely these areas, the most pronounced deficit was in the retrieval of specific semantic information.
Abstract:
The proliferation of news reports published on online websites and news information sharing among social media users necessitates effective techniques for analysing the image, text and video data related to news topics. This paper presents the first study to classify affective facial images on emerging news topics. The proposed system dynamically monitors and selects the current hot (of great interest) news topics with strong affective interestingness using textual keywords in news articles and social media discussions. Images from the selected hot topics are extracted and classified into three emotion categories (positive, neutral and negative) based on the facial expressions of subjects in the images. Performance evaluations on two facial image datasets collected from real-world resources demonstrate the applicability and effectiveness of the proposed system in affective classification of facial images in news reports. Facial expression shows high consistency with the affective textual content in news reports for positive emotion, while only low correlation has been observed for neutral and negative emotions. The system can be directly used for applications such as assisting editors in choosing photos with appropriate affective semantics for a certain topic during news report preparation.
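The hot-topic selection step could, in its simplest form, score keywords by how sharply their frequency rises in current articles and posts relative to a past window. The sketch below is a frequency-based proxy only; the paper's notion of affective interestingness additionally weights topics by emotional strength, which is not modelled here.

```python
from collections import Counter

def hot_topics(current_docs, past_docs, top_k=5):
    """Rank keywords by a simple burstiness score: how much more often a
    term appears in the current window of articles/posts than before."""
    now, before = Counter(), Counter()
    for d in current_docs:
        now.update(d.lower().split())
    for d in past_docs:
        before.update(d.lower().split())
    # +1 smoothing so previously unseen terms do not divide by zero
    score = {w: c / (1 + before[w]) for w, c in now.items()}
    return sorted(score, key=score.get, reverse=True)[:top_k]
```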
Abstract:
INTRODUCTION Calculating segmental (vertebral level-by-level) torso masses in Adolescent Idiopathic Scoliosis (AIS) patients allows the gravitational loading on the scoliotic spine during relaxed standing to be estimated. METHODS Existing low dose CT scans were used to calculate vertebral level-by-level torso masses and joint moments occurring in the spine for a group of female AIS patients with right-sided thoracic curves. Image processing software, ImageJ (v1.45, NIH, USA), was used to reconstruct the torso segments and subsequently measure the torso volume and mass corresponding to each vertebral level. Body segment masses for the head, neck and arms were taken from published anthropometric data. Intervertebral joint moments at each vertebral level were found by taking each torso segment mass above the joint in question, multiplying it by the perpendicular distance to the centre of the disc, and summing the results. RESULTS AND DISCUSSION Twenty patients were included in this study, with a mean age of 15.0±2.7 years and a mean Cobb angle of 52±5.9°. The mean total trunk mass, as a percentage of total body mass, was 27.8% (SD 0.5%). Mean segmental torso mass increased inferiorly, from 0.6 kg at T1 to 1.5 kg at L5. The coronal plane joint moments during relaxed standing were typically 5-7 Nm at the apex of the curve (Figure 1), with the highest apex joint moment being 7 Nm. CT scans were performed in the supine position, and curve magnitudes are known to be 7-10° smaller than those measured in standing [1]. Therefore, joint moments produced by gravity will be greater than those calculated here. CONCLUSIONS Coronal plane joint moments as high as 7 Nm can occur during relaxed standing in scoliosis patients, which may help to explain the mechanics of AIS progression. The body mass distributions calculated in this study can be used to estimate joint moments derived using other imaging modalities such as MRI, and subsequently to determine if a relationship exists between joint moments and progressive vertebral deformity.
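The joint-moment computation described under METHODS reduces to a short calculation: for each intervertebral joint, sum over all torso segments above it the product of segment mass, gravitational acceleration and the perpendicular (lateral) distance from the segment's centre of mass to the disc centre. The values in the sketch below are illustrative only, not patient data.

```python
G = 9.81  # gravitational acceleration, m/s^2

def coronal_joint_moment(segment_masses_kg, lever_arms_m):
    """Coronal-plane moment at one intervertebral joint: sum over all
    torso segments above the joint of (segment mass x g x perpendicular
    distance from the segment's centre of mass to the disc centre)."""
    return sum(m * G * d for m, d in zip(segment_masses_kg, lever_arms_m))

# illustrative values: eight segments above an apical joint, each with a
# lateral offset determined by the scoliotic curve geometry
masses = [0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]        # kg
arms = [0.03, 0.04, 0.05, 0.06, 0.06, 0.05, 0.04, 0.03]  # m
print(f"{coronal_joint_moment(masses, arms):.2f} Nm")
```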
Abstract:
The paper examines the knowledge of pedestrian movements, both in real scenarios and, from more recent years, in the virtual simulation realm. Aiming to verify whether it is possible to learn from the study of virtual environments how people will behave in real environments, it is vital to understand what is already known about behavior in real environments. Besides the walking interaction among pedestrians, the interaction between pedestrians and the built environment in which they are walking is also of great relevance. Force-based models were compared with the other three major microscopic models of pedestrian simulation to demonstrate that a more realistic and capable heuristic approach is needed for the study of the dynamics of pedestrians.
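The force-based family referred to above is typified by the social force model, in which each pedestrian is driven toward a desired velocity and repelled from nearby pedestrians. A minimal one-step sketch follows; the parameter values (desired speed, relaxation time, repulsion strength and range) are typical illustrative choices from the social-force literature, not from this paper.

```python
import numpy as np

def social_force_step(pos, vel, goal, others, dt=0.1,
                      v0=1.3, tau=0.5, A=2.0, B=0.3):
    """One Euler step of a basic force-based (social force) pedestrian
    model: a driving force relaxing velocity toward the goal direction,
    plus exponential repulsion from nearby pedestrians."""
    e = (goal - pos) / np.linalg.norm(goal - pos)   # desired direction
    f = (v0 * e - vel) / tau                        # driving force
    for q in others:                                # pairwise repulsion
        d = pos - q
        dist = np.linalg.norm(d)
        f += A * np.exp(-dist / B) * d / dist       # push away from q
    vel = vel + f * dt
    return pos + vel * dt, vel
```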
Abstract:
The ability to build high-fidelity 3D representations of the environment from sensor data is critical for autonomous robots. Multi-sensor data fusion allows for more complete and accurate representations. Furthermore, using distinct sensing modalities (i.e. sensors using a different physical process and/or operating at different electromagnetic frequencies) usually leads to more reliable perception, especially in challenging environments, as modalities may complement each other. However, they may react differently to certain materials or environmental conditions, leading to catastrophic fusion. In this paper, we propose a new method to reliably fuse data from multiple sensing modalities, including in situations where they detect different targets. We first compute distinct continuous surface representations for each sensing modality, with uncertainty, using Gaussian Process Implicit Surfaces (GPIS). Second, we perform a local consistency test between these representations, to separate consistent data (i.e. data corresponding to the detection of the same target by the sensors) from inconsistent data. The consistent data can then be fused together, using another GPIS process, and the rest of the data can be combined as appropriate. The approach is first validated using synthetic data. We then demonstrate its benefit using a mobile robot, equipped with a laser scanner and a radar, which operates in an outdoor environment in the presence of large clouds of airborne dust and smoke.
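A simplified sketch of the two-stage idea: fit one implicit-surface regressor per modality, then accept only locations where the two predictive distributions agree within their combined uncertainty. Plain GP regression on signed distances stands in here for a full GPIS formulation, and the 2-sigma consistency threshold is an assumption.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_surface(points, signed_dists):
    """Per-modality implicit surface: GP regression on signed distances
    to the sensed surface (a simplified stand-in for GPIS)."""
    return GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                                    alpha=1e-3).fit(points, signed_dists)

def consistent_mask(gp_a, gp_b, query_pts, n_sigma=2.0):
    """Local consistency test: keep locations where the two modalities'
    predictive distributions overlap within n_sigma standard deviations;
    consistent data can then be fused in a second GP, the rest combined
    as appropriate."""
    mu_a, sd_a = gp_a.predict(query_pts, return_std=True)
    mu_b, sd_b = gp_b.predict(query_pts, return_std=True)
    return np.abs(mu_a - mu_b) <= n_sigma * np.sqrt(sd_a**2 + sd_b**2)
```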
Abstract:
Outdoor robots such as planetary rovers must be able to navigate safely and reliably in order to successfully perform missions in remote or hostile environments. Mobility prediction is critical to achieving this goal due to the inherent control uncertainty faced by robots traversing natural terrain. We propose a novel algorithm for stochastic mobility prediction based on multi-output Gaussian process regression. Our algorithm considers the correlation between heading and distance uncertainty and provides a predictive model that can easily be exploited by motion planning algorithms. We evaluate our method experimentally and report results from over 30 trials in a Mars-analogue environment that demonstrate the effectiveness of our method and illustrate the importance of mobility prediction in navigating challenging terrain.
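A simplified sketch of the learning step, assuming training data of terrain/control features paired with observed heading and distance errors. Independent GPs are used here for brevity; the paper's multi-output formulation additionally models the correlation between heading and distance uncertainty, which this sketch does not capture.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_mobility_model(terrain_feats, heading_err, dist_err):
    """Stochastic mobility prediction sketch: map terrain/control
    features to predictive distributions over heading and distance
    error, which a motion planner can query for risk-aware planning."""
    k = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    gp_heading = GaussianProcessRegressor(kernel=k).fit(terrain_feats,
                                                        heading_err)
    gp_distance = GaussianProcessRegressor(kernel=k).fit(terrain_feats,
                                                         dist_err)
    return gp_heading, gp_distance
```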
Abstract:
This paper presents a full system demonstration of dynamic sensor-based reconfiguration of a networked robot team. Robots sense obstacles in their environment locally and dynamically adapt their global geometric configuration to conform to an abstract goal shape. We present a novel two-layer planning and control algorithm for team reconfiguration that is decentralised and assumes local (neighbour-to-neighbour) communication only. The approach is designed to be resource-efficient and we show experiments using a team of nine mobile robots with modest computation, communication, and sensing. The robots use acoustic beacons for localisation and can sense obstacles in their local neighbourhood using IR sensors. Our results demonstrate globally-specified reconfiguration from local information in a real robot network, and highlight limitations of standard mesh networks in implementing decentralised algorithms.
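A consensus-style local rule gives the flavour of decentralised, neighbour-only reconfiguration, though it is not the paper's two-layer algorithm: each robot nudges itself so that its relative positions to its neighbours approach the offsets prescribed by the goal shape.

```python
import numpy as np

def local_reconfig_step(my_pos, my_offset, nbr_pos, nbr_offsets, gain=0.2):
    """One decentralised control step using neighbour information only:
    my_offset / nbr_offsets are the positions each robot should occupy
    in the abstract goal shape; the rule averages the disagreement with
    each neighbour's implied target for this robot."""
    step = np.zeros(2)
    for p, o in zip(nbr_pos, nbr_offsets):
        desired_rel = my_offset - o            # where I sit relative to them
        step += (p + desired_rel) - my_pos     # disagreement with neighbour
    return my_pos + gain * step / max(len(nbr_pos), 1)
```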
Contrast transfer function correction applied to cryo-electron tomography and sub-tomogram averaging
Abstract:
Cryo-electron tomography together with averaging of sub-tomograms containing identical particles can reveal the structure of proteins or protein complexes in their native environment. The resolution of this technique is limited by the contrast transfer function (CTF) of the microscope. The CTF is not routinely corrected in cryo-electron tomography because of difficulties including CTF detection, due to the low signal to noise ratio, and CTF correction, since images are characterised by a spatially variant CTF. Here we simulate the effects of the CTF on the resolution of the final reconstruction, before and after CTF correction, and consider the effect of errors and approximations in defocus determination. We show that errors in defocus determination are well tolerated when correcting a series of tomograms collected at a range of defocus values. We apply methods for determining the CTF parameters in low signal to noise images of tilted specimens, for monitoring defocus changes using observed magnification changes, and for correcting the CTF prior to reconstruction. Using bacteriophage PRD1 as a test sample, we demonstrate that this approach gives an improvement in the structure obtained by sub-tomogram averaging from cryo-electron tomograms.
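For reference, the weak-phase-approximation CTF and the simplest correction (phase flipping) can be sketched as below. The sketch assumes a single defocus per image, whereas, as noted above, images of tilted specimens have a spatially variant CTF that must be handled per region; the microscope parameters shown are typical values, not those of the paper.

```python
import numpy as np

def ctf_1d(k, defocus_um, cs_mm=2.0, kv=300.0, amp_contrast=0.07):
    """1D contrast transfer function versus spatial frequency k (1/A),
    in the standard weak-phase approximation."""
    volts = kv * 1e3
    # relativistic electron wavelength in Angstrom
    wavelength = 12.2643 / np.sqrt(volts * (1 + volts * 0.978466e-6))
    dz = defocus_um * 1e4   # um -> Angstrom (underfocus positive)
    cs = cs_mm * 1e7        # mm -> Angstrom
    chi = (np.pi * wavelength * dz * k**2
           - 0.5 * np.pi * cs * wavelength**3 * k**4)
    return -(np.sqrt(1 - amp_contrast**2) * np.sin(chi)
             + amp_contrast * np.cos(chi))

def phase_flip(image_ft, ctf):
    """Simplest CTF correction: flip the phase of the image's Fourier
    transform wherever the CTF is negative."""
    return image_ft * np.sign(ctf)
```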
Abstract:
Real-world environments such as houses and offices change over time, meaning that a mobile robot’s map will become out of date. In this work, we introduce a method to update the reference views in a hybrid metric-topological map so that a mobile robot can continue to localize itself in a changing environment. The updating mechanism, based on the multi-store model of human memory, incorporates a spherical metric representation of the observed visual features for each node in the map, which enables the robot to estimate its heading and navigate using multi-view geometry, as well as representing the local 3D geometry of the environment. A series of experiments demonstrate the persistence performance of the proposed system in real changing environments, including analysis of the long-term stability.
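A minimal sketch of a multi-store update rule for one map node: features observed repeatedly in the short-term store are promoted to the long-term map, and long-term features that go unseen for too long are forgotten. The counters and thresholds are illustrative assumptions, not the paper's exact mechanism.

```python
def update_reference_view(stm, ltm, observed_ids,
                          promote_after=3, forget_after=10):
    """Multi-store (short-term/long-term) update for one map node.
    stm: feature id -> times seen; ltm: feature id -> misses since last
    seen. Features seen repeatedly are promoted from short-term to
    long-term memory; stale long-term features are forgotten."""
    for f in observed_ids:
        if f in ltm:
            ltm[f] = 0                     # refreshed: reset miss counter
        else:
            stm[f] = stm.get(f, 0) + 1     # count repeated sightings
            if stm[f] >= promote_after:
                ltm[f] = 0                 # promote to the long-term map
                del stm[f]
    for f in [f for f in ltm if f not in observed_ids]:
        ltm[f] += 1
        if ltm[f] > forget_after:
            del ltm[f]                     # decayed out of the map
```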