902 resultados para Rastreamento de features naturais
Resumo:
This paper is concerned with choosing image features for image based visual servo control and how this choice influences the closed-loop dynamics of the system. In prior work, image features tend to be chosen on the basis of image processing simplicity and noise sensitivity. In this paper we show that the choice of feature directly influences the closed-loop dynamics in task-space. We focus on the depth axis control of a visual servo system and compare analytically various approaches that have been reported recently in the literature. The theoretical predictions are verified by experiment.
Resumo:
Gabor representations have been widely used in facial analysis (face recognition, face detection and facial expression detection) due to their biological relevance and computational properties. Two popular Gabor representations used in literature are: 1) Log-Gabor and 2) Gabor energy filters. Even though these representations are somewhat similar, they also have distinct differences as the Log-Gabor filters mimic the simple cells in the visual cortex while the Gabor energy filters emulate the complex cells, which causes subtle differences in the responses. In this paper, we analyze the difference between these two Gabor representations and quantify these differences on the task of facial action unit (AU) detection. In our experiments conducted on the Cohn-Kanade dataset, we report an average area underneath the ROC curve (A`) of 92.60% across 17 AUs for the Gabor energy filters, while the Log-Gabor representation achieved an average A` of 96.11%. This result suggests that small spatial differences that the Log-Gabor filters pick up on are more useful for AU detection than the differences in contours and edges that the Gabor energy filters extract.
Resumo:
The detection of voice activity is a challenging problem, especially when the level of acoustic noise is high. Most current approaches only utilise the audio signal, making them susceptible to acoustic noise. An obvious approach to overcome this is to use the visual modality. The current state-of-the-art visual feature extraction technique is one that uses a cascade of visual features (i.e. 2D-DCT, feature mean normalisation, interstep LDA). In this paper, we investigate the effectiveness of this technique for the task of visual voice activity detection (VAD), and analyse each stage of the cascade and quantify the relative improvement in performance gained by each successive stage. The experiments were conducted on the CUAVE database and our results highlight that the dynamics of the visual modality can be used to good effect to improve visual voice activity detection performance.
Resumo:
Personality factors implicated in alcohol misuse have been extensively investigated in adult populations. Fewer studies have clarified the robustness of personality dimensions in predicting early onset alcohol misuse in adolescence. The aim of this study was to examine the predictive utility of two prominent models of personality (Cloninger, 1987; Eysenck & Eysenck, 1975) in emergent alcohol misuse in adolescence. One hundred and 92 secondary school students (mean age = 13.8 years, SD = 0.5) were administered measures of personality (Revised Junior Eysenck Personality Questionnaire – abbreviated; Temperament scale of Junior Temperament and Character Inventory) and drinking behavior (quantity and frequency of consumption, Alcohol Use Disorders Identification Test) at Time 1. At 12-month follow-up, 170 students (88.5%) were retained. Hierarchical multiple regressions revealed the dimensions of psychoticism, extraversion, and Novelty-Seeking to be the most powerful predictors of future alcohol misuse in adolescents. Results provide support for the etiological relevance of these dimensions in the development of early onset alcohol misuse. Findings can be used to develop early intervention programs that target personality risk factors for alcohol misuse in high-risk youth.
Resumo:
This paper considers the question of designing a fully image-based visual servo control for a class of dynamic systems. The work is motivated by the ongoing development of image-based visual servo control of small aerial robotic vehicles. The kinematics and dynamics of a rigid-body dynamical system (such as a vehicle airframe) maneuvering over a flat target plane with observable features are expressed in terms of an unnormalized spherical centroid and an optic flow measurement. The image-plane dynamics with respect to force input are dependent on the height of the camera above the target plane. This dependence is compensated by introducing virtual height dynamics and adaptive estimation in the proposed control. A fully nonlinear adaptive control design is provided that ensures asymptotic stability of the closed-loop system for all feasible initial conditions. The choice of control gains is based on an analysis of the asymptotic dynamics of the system. Results from a realistic simulation are presented that demonstrate the performance of the closed-loop system. To the author's knowledge, this paper documents the first time that an image-based visual servo control has been proposed for a dynamic system using vision measurement for both position and velocity.
Resumo:
Compared to people with a high socioeconomic status, those with a lower socioeconomic status are more likely to perceive their neighbourhood as unattractive and unsafe, which is associated with their lower levels of physical activity. Agreement between objective and perceived environmental factors is often found to be moderate or low, so it is questionable to what extent ‘creating supportive neighbourhoods’ would change neighbourhood perceptions. This study among residents (N=814) of fourteen neighbourhoods in the city of Eindhoven (the Netherlands), investigated to what extent socioeconomic differences in perceived neighbourhood safety and perceived neighbourhood attractiveness can be explained by five domains of objective neighbourhood features (i.e. design, traffic safety, social safety, aesthetics, and destinations), and to what extent other factors may play a role. Unfavourable neighbourhood perceptions of low socioeconomic groups partly reflected their actual less aesthetic and less safe neighbourhoods, and partly their perceptions of low social neighbourhood cohesion and adverse psychosocial circumstances.
Resumo:
Robust image hashing seeks to transform a given input image into a shorter hashed version using a key-dependent non-invertible transform. These image hashes can be used for watermarking, image integrity authentication or image indexing for fast retrieval. This paper introduces a new method of generating image hashes based on extracting Higher Order Spectral features from the Radon projection of an input image. The feature extraction process is non-invertible, non-linear and different hashes can be produced from the same image through the use of random permutations of the input. We show that the transform is robust to typical image transformations such as JPEG compression, noise, scaling, rotation, smoothing and cropping. We evaluate our system using a verification-style framework based on calculating false match, false non-match likelihoods using the publicly available Uncompressed Colour Image database (UCID) of 1320 images. We also compare our results to Swaminathan’s Fourier-Mellin based hashing method with at least 1% EER improvement under noise, scaling and sharpening.
Resumo:
In public venues, crowd size is a key indicator of crowd safety and stability. In this paper we propose a crowd counting algorithm that uses tracking and local features to count the number of people in each group as represented by a foreground blob segment, so that the total crowd estimate is the sum of the group sizes. Tracking is employed to improve the robustness of the estimate, by analysing the history of each group, including splitting and merging events. A simplified ground truth annotation strategy results in an approach with minimal setup requirements that is highly accurate.
Resumo:
The cascading appearance-based (CAB) feature extraction technique has established itself as the state-of-the-art in extracting dynamic visual speech features for speech recognition. In this paper, we will focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade we will demonstrate that the same steps taken to reduce static speaker and environmental information for the visual speech recognition application also provide similar improvements for visual speaker recognition. A further study is conducted comparing synchronous HMM (SHMM) based fusion of CAB visual features and traditional perceptual linear predictive (PLP) acoustic features to show that higher complexity inherit in the SHMM approach does not appear to provide any improvement in the final audio-visual speaker verification system over simpler utterance level score fusion.
Resumo:
The use of appropriate features to characterize an output class or object is critical for all classification problems. This paper evaluates the capability of several spectral and texture features for object-based vegetation classification at the species level using airborne high resolution multispectral imagery. Image-objects as the basic classification unit were generated through image segmentation. Statistical moments extracted from original spectral bands and vegetation index image are used as feature descriptors for image objects (i.e. tree crowns). Several state-of-art texture descriptors such as Gray-Level Co-Occurrence Matrix (GLCM), Local Binary Patterns (LBP) and its extensions are also extracted for comparison purpose. Support Vector Machine (SVM) is employed for classification in the object-feature space. The experimental results showed that incorporating spectral vegetation indices can improve the classification accuracy and obtained better results than in original spectral bands, and using moments of Ratio Vegetation Index obtained the highest average classification accuracy in our experiment. The experiments also indicate that the spectral moment features also outperform or can at least compare with the state-of-art texture descriptors in terms of classification accuracy.
Resumo:
This paper presents a method of voice activity detection (VAD) for high noise scenarios, using a noise robust voiced speech detection feature. The developed method is based on the fusion of two systems. The first system utilises the maximum peak of the normalised time-domain autocorrelation function (MaxPeak). The second zone system uses a novel combination of cross-correlation and zero-crossing rate of the normalised autocorrelation to approximate a measure of signal pitch and periodicity (CrossCorr) that is hypothesised to be noise robust. The score outputs by the two systems are then merged using weighted sum fusion to create the proposed autocorrelation zero-crossing rate (AZR) VAD. Accuracy of AZR was compared to state of the art and standardised VAD methods and was shown to outperform the best performing system with an average relative improvement of 24.8% in half-total error rate (HTER) on the QUT-NOISE-TIMIT database created using real recordings from high-noise environments.
Resumo:
Paired speaking tests are now commonly used in both high-stakes testing and classroom assessment contexts. The co-construction of discourse by candidates is regarded as a strength of paired speaking tests, as candidates have the opportunity to display a wider range of interactional competencies, including turn taking, initiating topics and engaging in extended discourse with a partner, rather than an examiner. However, the impact of the interlocutor in such jointly negotiated discourse and the implications for assessing interactional competence are areas of concern. This article reports on the features of interactional competence that were salient to four trained raters of 12 paired speaking tests through the analysis of rater notes, stimulated verbal recalls and rater discussions. Findings enabled the identification of features of the performance noted by raters when awarding scores for interactional competence, and the particular features associated with higher and lower scores. A number of these features were seen by the raters as mutual achievements, which raises the issue of the extent to which it is possible to assess individual contributions to the co-constructed performance. The findings have implications for defining the construct of interactional competence in paired speaking tests and operationalising this in rating scales.
Resumo:
Facial expression is an important channel for human communication and can be applied in many real applications. One critical step for facial expression recognition (FER) is to accurately extract emotional features. Current approaches on FER in static images have not fully considered and utilized the features of facial element and muscle movements, which represent static and dynamic, as well as geometric and appearance characteristics of facial expressions. This paper proposes an approach to solve this limitation using ‘salient’ distance features, which are obtained by extracting patch-based 3D Gabor features, selecting the ‘salient’ patches, and performing patch matching operations. The experimental results demonstrate high correct recognition rate (CRR), significant performance improvements due to the consideration of facial element and muscle movements, promising results under face registration errors, and fast processing time. The comparison with the state-of-the-art performance confirms that the proposed approach achieves the highest CRR on the JAFFE database and is among the top performers on the Cohn-Kanade (CK) database.
Resumo:
Human facial expression is a complex process characterized of dynamic, subtle and regional emotional features. State-of-the-art approaches on facial expression recognition (FER) have not fully utilized this kind of features to improve the recognition performance. This paper proposes an approach to overcome this limitation using patch-based ‘salient’ Gabor features. A set of 3D patches are extracted to represent the subtle and regional features, and then inputted into patch matching operations for capturing the dynamic features. Experimental results show a significant performance improvement of the proposed approach due to the use of the dynamic features. Performance comparison with pervious work also confirms that the proposed approach achieves the highest CRR reported to date on the JAFFE database and a top-level performance on the Cohn-Kanade (CK) database.