946 results for Dynamic texture recognition
Abstract:
We present results of a study into the performance of a variety of image transform-based feature types for speaker-independent visual speech recognition of isolated digits. This includes the first reported use of features extracted using a discrete curvelet transform. The study compares several methods for selecting features of each feature type and shows the relative benefits of both static and dynamic visual features. The performance of the features is tested on both clean video data and video data corrupted in a variety of ways, to assess each feature type's robustness to potential real-world conditions. One of the test conditions involves a novel form of video corruption we call jitter, which simulates camera and/or head movement during recording.
Abstract:
Rhodopsin, the light-sensitive receptor responsible for blue-green vision, serves as a prototypical G protein-coupled receptor (GPCR). Upon light absorption, it undergoes a series of conformational changes that lead to the active form, metarhodopsin II (META II), initiating a signaling cascade through binding to the G protein transducin (G(t)). Here, we first develop a structural model of META II by applying experimental distance restraints to the structure of lumi-rhodopsin (LUMI), an earlier intermediate. The restraints are imposed by using a combination of biased molecular dynamics simulations and perturbations to an elastic network model. We characterize the motions of the transmembrane helices in the LUMI-to-META II transition and the rearrangement of interhelical hydrogen bonds. We then simulate rhodopsin activation in a dynamic model to study the path leading from LUMI to our META II model for wild-type rhodopsin and a series of mutants. The simulations show a strong correlation between the transition dynamics and the pharmacological phenotypes of the mutants. These results help identify the molecular mechanisms of activation in both wild-type and mutant rhodopsin. While static models can provide insights into the mechanisms of ligand recognition and predict ligand affinity, a dynamic model of activation could be applicable to study the pharmacology of other GPCRs and their ligands, offering a key to predictions of basal activity and ligand efficacy.
Abstract:
This paper introduces a new technique for palmprint recognition based on Fisher Linear Discriminant Analysis (FLDA) and a Gabor filter bank. The method convolves a palmprint image with a bank of Gabor filters at different scales and rotations for robust palmprint feature extraction. Once these features are extracted, FLDA is applied for dimensionality reduction and class separability. Since the palmprint features are derived from the principal lines, wrinkles, and texture of the palm area, the palm region used for feature extraction must be selected carefully in order to enhance recognition accuracy. To address this problem, an improved region of interest (ROI) extraction algorithm is introduced, which efficiently extracts the whole palm area while ignoring undesirable parts such as the fingers and background. Experiments show that the proposed method yields attractive performance, as evidenced by an Equal Error Rate (EER) of 0.03%.
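The Gabor-bank-plus-FLDA pipeline described in this abstract can be sketched in a few lines of numpy. This is only an illustration, not the authors' implementation: the kernel parameters, the 32x32 random stand-in images, and the mean/std response statistics are all assumptions.

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=8.0, gamma=0.5):
    # Real part of a Gabor filter at one scale/orientation (illustrative parameters)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def gabor_features(img, thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4), scales=(4.0, 8.0)):
    # Convolve with the filter bank (circular convolution via FFT for brevity) and
    # keep the mean/std of each response magnitude as the feature vector.
    feats = []
    for lam in scales:
        for th in thetas:
            k = gabor_kernel(theta=th, lam=lam)
            resp = np.fft.irfft2(np.fft.rfft2(img) * np.fft.rfft2(k, img.shape), s=img.shape)
            feats += [np.abs(resp).mean(), np.abs(resp).std()]
    return np.array(feats)

def fisher_direction(X0, X1):
    # Two-class FLDA projection: w = Sw^-1 (m1 - m0), with a small ridge for stability
    m0, m1 = X0.mean(0), X1.mean(0)
    Sw = np.cov(X0.T) + np.cov(X1.T) + 1e-6 * np.eye(X0.shape[1])
    return np.linalg.solve(Sw, m1 - m0)

rng = np.random.default_rng(0)
class_a = np.stack([gabor_features(rng.random((32, 32))) for _ in range(5)])
class_b = np.stack([gabor_features(rng.random((32, 32)) * 0.5) for _ in range(5)])
w = fisher_direction(class_a, class_b)
```

A real system would run the ROI extraction step first and train on genuine palmprint classes; the Fisher step here is the standard two-class closed form.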
Abstract:
Background: Studies of cross-cultural variations in the perception of emotion have typically compared rates of recognition of static posed stimulus photographs. That research has provided evidence for universality in the recognition of a range of emotions but also for some systematic cross-cultural variation in the interpretation of emotional expression. However, questions remain about how widely such findings can be generalised to real life emotional situations. The present study provides the first evidence that the previously reported interplay between universal and cultural influences extends to ratings of natural, dynamic emotional stimuli.
Methodology/Principal Findings: Participants from Northern Ireland, Serbia, Guatemala and Peru used a computer-based tool to continuously rate the strength of positive and negative emotion being displayed in twelve short video sequences by people from the United Kingdom engaged in emotional conversations. Generalized additive mixed models were developed to assess the differences in perception of emotion between countries and sexes. Our results indicate that the temporal pattern of ratings is similar across cultures for a range of emotions and social contexts. However, there are systematic differences in intensity ratings between the countries, with participants from Northern Ireland making the most extreme ratings in the majority of the clips.
Conclusions/Significance: The results indicate that there is strong agreement across cultures in the valence and patterns of ratings of natural emotional situations but that participants from different cultures show systematic variation in the intensity with which they rate emotion. Results are discussed in terms of both ‘in-group advantage’ and ‘display rules’ approaches. This study indicates that examples of natural spontaneous emotional behaviour can be used to study cross-cultural variations in the perception of emotion.
Abstract:
In this paper, a novel framework for dense pixel matching based on dynamic programming is introduced. Unlike most techniques proposed in the literature, our approach assumes neither known camera geometry nor the availability of rectified images. Under such conditions, the matching task cannot be reduced to finding correspondences between a pair of scanlines. We propose to extend existing dynamic programming methodologies to a larger dimensional space by using a 3D scoring matrix so that correspondences between a line and a whole image can be calculated. After assessing our framework on a standard evaluation dataset of rectified stereo images, experiments are conducted on unrectified and non-linearly distorted images. Results validate our new approach and reveal the versatility of our algorithm.
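The classic scanline formulation that this paper generalises to a 3-D scoring volume can be sketched as follows; the occlusion cost and toy intensity rows are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def dp_match(left, right, occ_cost=1.0):
    # Classic scanline dynamic programming: align two 1-D intensity rows,
    # paying occ_cost to skip (occlude) a pixel in either row. This is the
    # pairwise baseline that the paper extends with a 3-D scoring matrix so
    # a line can be matched against a whole image.
    n, m = len(left), len(right)
    C = np.full((n + 1, m + 1), np.inf)
    C[0, :] = np.arange(m + 1) * occ_cost
    C[:, 0] = np.arange(n + 1) * occ_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            C[i, j] = min(C[i - 1, j - 1] + abs(left[i - 1] - right[j - 1]),
                          C[i - 1, j] + occ_cost,
                          C[i, j - 1] + occ_cost)
    # Backtrack to recover pixel correspondences
    i, j, pairs = n, m, []
    while i > 0 and j > 0:
        if C[i, j] == C[i - 1, j - 1] + abs(left[i - 1] - right[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif C[i, j] == C[i - 1, j] + occ_cost:
            i -= 1
        else:
            j -= 1
    return C[n, m], pairs[::-1]

# Identical toy rows should align along the diagonal at zero cost
cost, pairs = dp_match([0.0, 1.0, 5.0, 2.0], [0.0, 1.0, 5.0, 2.0])
```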
Abstract:
Automatic gender classification has many security and commercial applications. Various modalities have been investigated for gender classification, with face-based classification being the most popular. In some real-world scenarios the face may be partially occluded; in these circumstances a classification based on individual parts of the face, known as local features, must be adopted. We investigate gender classification using lip movements. We show for the first time that important gender-specific information can be obtained from the way in which a person moves their lips during speech. Furthermore, our study indicates that the lip dynamics during speech provide greater gender-discriminative information than lip appearance alone. We also show that the lip dynamics and appearance contain complementary gender information, such that a model which captures both traits gives the highest overall classification result. We use Discrete Cosine Transform based features and Gaussian Mixture Modelling to model lip appearance and dynamics, and employ the XM2VTS database for our experiments. Our experiments show that a model which captures lip dynamics along with appearance can improve gender classification rates by between 16% and 21% compared to models of lip appearance only.
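The appearance-plus-dynamics feature construction in this abstract can be sketched as below. Everything here is a stand-in: random arrays replace real lip ROIs, a single diagonal Gaussian per class replaces the full Gaussian mixture model, and the low-order DCT block size is an assumption.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix
    k = np.arange(n)[:, None]
    M = np.cos(np.pi * (2 * np.arange(n)[None, :] + 1) * k / (2 * n))
    M[0] /= np.sqrt(2)
    return M * np.sqrt(2.0 / n)

def lip_features(frames, keep=4):
    # Static features: low-order 2-D DCT coefficients of each (stand-in) lip ROI;
    # dynamic features: their frame-to-frame deltas, appended to the static part.
    D = dct_matrix(frames.shape[1])
    static = np.stack([(D @ f @ D.T)[:keep, :keep].ravel() for f in frames])
    return np.hstack([static[1:], np.diff(static, axis=0)])

def fit_gaussian(X):
    # Stand-in for a per-class GMM: a single diagonal Gaussian
    return X.mean(0), X.var(0) + 1e-6

def log_lik(x, mu, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

rng = np.random.default_rng(1)
male = rng.normal(0.0, 1.0, (20, 8, 8))      # synthetic stand-ins for lip ROI sequences
female = rng.normal(0.5, 1.0, (20, 8, 8))
Xm, Xf = lip_features(male), lip_features(female)
gm, gf = fit_gaussian(Xm), fit_gaussian(Xf)
probe = Xm[0]
pred = "male" if log_lik(probe, *gm) > log_lik(probe, *gf) else "female"
```

The delta features are what carry the "dynamics" signal that the study found more discriminative than appearance alone.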
Abstract:
This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements, and can be used alongside, many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances, with corruption added to either or both of the video and audio streams using a variety of noise types (e.g., MPEG-4 video compression) and levels. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach, and also compared to any fixed-weight integration approach, in both clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams, and according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
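The core idea, choosing per-frame stream weights without measuring noise in either stream, can be illustrated with a toy sketch. This is an assumption-laden simplification of MWSP, not the published model: it grid-searches for the weight whose combined posterior is most confident, using made-up three-class log-likelihoods.

```python
import numpy as np

def posteriors(logL):
    # Class posteriors from per-class log-likelihoods (uniform prior)
    p = np.exp(logL - logL.max())
    return p / p.sum()

def mwsp_weight(logL_audio, logL_video, grid=np.linspace(0, 1, 21)):
    # Toy sketch of the MWSP idea: per frame, search for the audio weight
    # lambda whose weighted log-likelihood combination yields the most
    # confident posterior, with no explicit noise measurement in either stream.
    best_lam, best_post = 0.0, -1.0
    for lam in grid:
        post = posteriors(lam * logL_audio + (1 - lam) * logL_video).max()
        if post > best_post:
            best_lam, best_post = lam, post
    return best_lam

# A frame with informative (peaked) audio and noisy (flat) video, and vice versa:
lam_clean_audio = mwsp_weight(np.array([5.0, 0.0, 0.0]), np.array([0.1, 0.0, 0.1]))
lam_noisy_audio = mwsp_weight(np.array([0.1, 0.0, 0.1]), np.array([0.0, 5.0, 0.0]))
```

The weight swings toward whichever stream is currently the more reliable, mirroring the frame-by-frame behaviour reported in the abstract.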
Abstract:
In recent years, there has been a move towards the development of indirect structural health monitoring (SHM) techniques for bridges; the low-cost vibration-based method presented in this paper is such an approach. It consists of the use of a moving vehicle fitted with accelerometers on its axles and incorporates wavelet analysis and statistical pattern recognition. The aim of the approach is to both detect and locate damage in bridges while reducing the need for direct instrumentation of the bridge. In theoretical simulations, a simplified vehicle-bridge interaction model is used to investigate the effectiveness of the approach in detecting damage in a bridge from vehicle accelerations. For this purpose, the accelerations are processed using a continuous wavelet transform, since when the axle passes over a damaged section, any discontinuity in the signal affects the wavelet coefficients. Based on these coefficients, a damage indicator is formulated which can distinguish between different damage levels. However, it is found to be difficult to quantify damage of varying levels when the vehicle's transverse position varies between bridge crossings. In a real bridge field experiment, damage was applied artificially to a steel truss bridge to test the effectiveness of the indirect approach in practice; for this purpose a two-axle van was driven across the bridge at constant speed. Both bridge and vehicle acceleration measurements were recorded. The dynamic properties of the test vehicle were identified initially via free vibration tests. It was found that the resulting damage indicators for the bridge and vehicle showed similar patterns; however, it was difficult to distinguish between different artificial damage scenarios.
Abstract:
One of the most widely used techniques in computer vision for foreground detection is to model each background pixel as a Mixture of Gaussians (MoG). While this is effective for a static camera with a fixed or slowly varying background, it fails to handle any fast, dynamic movement in the background. In this paper, we propose a generalised framework, called region-based MoG (RMoG), that takes neighbouring pixels into consideration while generating the model of the observed scene. The model equations are derived from Expectation Maximisation theory for batch mode, and stochastic approximation is used for online mode updates. We evaluate our region-based approach against ten sequences containing dynamic backgrounds, and show that it provides a performance improvement over the traditional single-pixel MoG. For feature and region sizes that are equal, the effect of increasing the learning rate is to reduce both true and false positives. Comparison with four state-of-the-art approaches shows that RMoG outperforms the others in reducing false positives whilst still maintaining reasonable foreground definition. Lastly, using the ChangeDetection (CDNet 2014) benchmark, we evaluated RMoG against numerous surveillance scenes and found it to be amongst the leading performers for dynamic background scenes, whilst providing comparable performance for other commonly occurring surveillance scenes.
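The single-pixel MoG baseline that RMoG generalises can be sketched as follows. This is a deliberately simplified sketch (grayscale, no component weights, the unmatched case re-initialises the best-indexed component rather than the lowest-weight one); RMoG's contribution, pooling evidence from neighbouring pixels, is not implemented here.

```python
import numpy as np

class PixelMoG:
    # Simplified per-pixel background mixture (K Gaussian components per pixel).
    # The paper's RMoG additionally pools evidence from neighbouring pixels;
    # here every pixel is modelled independently, single-pixel-MoG style.
    def __init__(self, shape, k=2, lr=0.05, match_sigma=2.5, init_var=15.0 ** 2):
        self.mu = np.zeros(shape + (k,))
        self.var = np.full(shape + (k,), init_var)
        self.lr, self.match_sigma, self.init_var = lr, match_sigma, init_var

    def update(self, frame):
        d2 = (frame[..., None] - self.mu) ** 2 / self.var
        matched = d2 < self.match_sigma ** 2           # per-component match test
        best = np.argmin(np.where(matched, d2, np.inf), axis=-1)
        idx = best[..., None]
        hit = np.take_along_axis(matched, idx, -1)[..., 0]
        mu_b = np.take_along_axis(self.mu, idx, -1)[..., 0]
        var_b = np.take_along_axis(self.var, idx, -1)[..., 0]
        # Matched component: running update; otherwise re-initialise on the new value
        mu_new = np.where(hit, (1 - self.lr) * mu_b + self.lr * frame, frame)
        var_new = np.where(hit, (1 - self.lr) * var_b + self.lr * (frame - mu_b) ** 2,
                           self.init_var)
        np.put_along_axis(self.mu, idx, mu_new[..., None], -1)
        np.put_along_axis(self.var, idx, var_new[..., None], -1)
        return ~hit                                    # foreground: nothing matched

bg = PixelMoG((4, 4))
for _ in range(20):
    bg.update(np.full((4, 4), 100.0))                  # learn a static background
scene = np.full((4, 4), 100.0)
scene[1:3, 1:3] = 200.0                                # a bright object appears
mask = bg.update(scene)                                # foreground mask for the object
```

With a dynamic background (e.g. waving foliage), this per-pixel model is exactly what fails, which motivates the region-based pooling in RMoG.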
Abstract:
PatchCity is a new approach to the procedural generation of city models. The algorithm uses texture synthesis to create a city layout in the visual style of one or more input examples. Data is provided in vector graphic form from either real or synthetic city definitions. The paper describes the PatchCity algorithm, illustrates its use, and identifies its strengths and limitations. The technique provides a greater range of features and styles of city layout than existing generative methods, thereby achieving results that are more realistic. An open source implementation of the algorithm is available.