230 resultados para FEATURE-EXTRACTION
Resumo:
In automatic facial expression detection, very accurate registration is desired which can be achieved via a deformable model approach where a dense mesh of 60-70 points on the face is used, such as an active appearance model (AAM). However, for applications where manually labeling frames is prohibitive, AAMs do not work well as they do not generalize well to unseen subjects. As such, a more coarse approach is taken for person-independent facial expression detection, where just a couple of key features (such as face and eyes) are tracked using a Viola-Jones type approach. The tracked image is normally post-processed to encode for shift and illumination invariance using a linear bank of filters. Recently, it was shown that this preprocessing step is of no benefit when close to ideal registration has been obtained. In this paper, we present a system based on the Constrained Local Model (CLM) which is a generic or person-independent face alignment algorithm which gains high accuracy. We show these results against the LBP feature extraction on the CK+ and GEMEP datasets.
Resumo:
The computation of compact and meaningful representations of high dimensional sensor data has recently been addressed through the development of Nonlinear Dimensional Reduction (NLDR) algorithms. The numerical implementation of spectral NLDR techniques typically leads to a symmetric eigenvalue problem that is solved by traditional batch eigensolution algorithms. The application of such algorithms in real-time systems necessitates the development of sequential algorithms that perform feature extraction online. This paper presents an efficient online NLDR scheme, Sequential-Isomap, based on incremental singular value decomposition (SVD) and the Isomap method. Example simulations demonstrate the validity and significant potential of this technique in real-time applications such as autonomous systems.
Resumo:
This paper presents a general methodology for learning articulated motions that, despite having non-linear correlations, are cyclical and have a defined pattern of behavior Using conventional algorithms to extract features from images, a Bayesian classifier is applied to cluster and classify features of the moving object. Clusters are then associated in different frames and structure learning algorithms for Bayesian networks are used to recover the structure of the motion. This framework is applied to the human gait analysis and tracking but applications include any coordinated movement such as multi-robots behavior analysis.
Resumo:
The use of appropriate features to represent an output class or object is critical for all classification problems. In this paper, we propose a biologically inspired object descriptor to represent the spectral-texture patterns of image-objects. The proposed feature descriptor is generated from the pulse spectral frequencies (PSF) of a pulse coupled neural network (PCNN), which is invariant to rotation, translation and small scale changes. The proposed method is first evaluated in a rotation and scale invariant texture classification using USC-SIPI texture database. It is further evaluated in an application of vegetation species classification in power line corridor monitoring using airborne multi-spectral aerial imagery. The results from the two experiments demonstrate that the PSF feature is effective to represent spectral-texture patterns of objects and it shows better results than classic color histogram and texture features.
Resumo:
The practice of robotics and computer vision each involve the application of computational algorithms to data. The research community has developed a very large body of algorithms but for a newcomer to the field this can be quite daunting. For more than 10 years the author has maintained two open-source MATLAB® Toolboxes, one for robotics and one for vision. They provide implementations of many important algorithms and allow users to work with real problems, not just trivial examples. This new book makes the fundamental algorithms of robotics, vision and control accessible to all. It weaves together theory, algorithms and examples in a narrative that covers robotics and computer vision separately and together. Using the latest versions of the Toolboxes the author shows how complex problems can be decomposed and solved using just a few simple lines of code. The topics covered are guided by real problems observed by the author over many years as a practitioner of both robotics and computer vision. It is written in a light but informative style, it is easy to read and absorb, and includes over 1000 MATLAB® and Simulink® examples and figures. The book is a real walk through the fundamentals of mobile robots, navigation, localization, arm-robot kinematics, dynamics and joint level control, then camera models, image processing, feature extraction and multi-view geometry, and finally bringing it all together with an extensive discussion of visual servo systems.
Resumo:
Facial expression recognition (FER) algorithms mainly focus on classification into a small discrete set of emotions or representation of emotions using facial action units (AUs). Dimensional representation of emotions as continuous values in an arousal-valence space is relatively less investigated. It is not fully known whether fusion of geometric and texture features will result in better dimensional representation of spontaneous emotions. Moreover, the performance of many previously proposed approaches to dimensional representation has not been evaluated thoroughly on publicly available databases. To address these limitations, this paper presents an evaluation framework for dimensional representation of spontaneous facial expressions using texture and geometric features. SIFT, Gabor and LBP features are extracted around facial fiducial points and fused with FAP distance features. The CFS algorithm is adopted for discriminative texture feature selection. Experimental results evaluated on the publicly accessible NVIE database demonstrate that fusion of texture and geometry does not lead to a much better performance than using texture alone, but does result in a significant performance improvement over geometry alone. LBP features perform the best when fused with geometric features. Distributions of arousal and valence for different emotions obtained via the feature extraction process are compared with those obtained from subjective ground truth values assigned by viewers. Predicted valence is found to have a more similar distribution to ground truth than arousal in terms of covariance or Bhattacharya distance, but it shows a greater distance between the means.
Resumo:
A new algorithm for extracting features from images for object recognition is described. The algorithm uses higher order spectra to provide desirable invariance properties, to provide noise immunity, and to incorporate nonlinearity into the feature extraction procedure thereby allowing the use of simple classifiers. An image can be reduced to a set of 1D functions via the Radon transform, or alternatively, the Fourier transform of each 1D projection can be obtained from a radial slice of the 2D Fourier transform of the image according to the Fourier slice theorem. A triple product of Fourier coefficients, referred to as the deterministic bispectrum, is computed for each 1D function and is integrated along radial lines in bifrequency space. Phases of the integrated bispectra are shown to be translation- and scale-invariant. Rotation invariance is achieved by a regrouping of these invariants at a constant radius followed by a second stage of invariant extraction. Rotation invariance is thus converted to translation invariance in the second step. Results using synthetic and actual images show that isolated, compact clusters are formed in feature space. These clusters are linearly separable, indicating that the nonlinearity required in the mapping from the input space to the classification space is incorporated well into the feature extraction stage. The use of higher order spectra results in good noise immunity, as verified with synthetic and real images. Classification of images using the higher order spectra-based algorithm compares favorably to classification using the method of moment invariants
Resumo:
A new approach to recognition of images using invariant features based on higher-order spectra is presented. Higher-order spectra are translation invariant because translation produces linear phase shifts which cancel. Scale and amplification invariance are satisfied by the phase of the integral of a higher-order spectrum along a radial line in higher-order frequency space because the contour of integration maps onto itself and both the real and imaginary parts are affected equally by the transformation. Rotation invariance is introduced by deriving invariants from the Radon transform of the image and using the cyclic-shift invariance property of the discrete Fourier transform magnitude. Results on synthetic and actual images show isolated, compact clusters in feature space and high classification accuracies
Resumo:
Features derived from the trispectra of DFT magnitude slices are used for multi-font digit recognition. These features are insensitive to translation, rotation, or scaling of the input. They are also robust to noise. Classification accuracy tests were conducted on a common data base of 256× 256 pixel bilevel images of digits in 9 fonts. Randomly rotated and translated noisy versions were used for training and testing. The results indicate that the trispectral features are better than moment invariants and affine moment invariants. They achieve a classification accuracy of 95% compared to about 81% for Hu's (1962) moment invariants and 39% for the Flusser and Suk (1994) affine moment invariants on the same data in the presence of 1% impulse noise using a 1-NN classifier. For comparison, a multilayer perceptron with no normalization for rotations and translations yields 34% accuracy on 16× 16 pixel low-pass filtered and decimated versions of the same data.
Resumo:
A system to segment and recognize Australian 4-digit postcodes from address labels on parcels is described. Images of address labels are preprocessed and adaptively thresholded to reduce noise. Projections are used to segment the line and then the characters comprising the postcode. Individual digits are recognized using bispectral features extracted from their parallel beam projections. These features are insensitive to translation, scaling and rotation, and robust to noise. Results on scanned images are presented. The system is currently being improved and implemented to work on-line.
Resumo:
This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. It has been previously shown in our own work, and in the work of others, that features extracted from a speaker's moving lips hold speaker dependencies which are complementary with speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms the performance of either sub-system. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise
Resumo:
Investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based around the use of multi-stream hidden Markov models (MSHMM), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been performed successfully for the task of speech recognition. The use of temporal lip information for speaker identification has been performed previously (T.J. Wark et al., 1998), however this has been restricted to output fusion via single-stream HMMs. We present an extension to this previous work, and show that a MSHMM is a valid structure for multi-modal speaker identification
Resumo:
Investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. We have previously shown (Int. Conf. on Acoustics, Speech and Signal Proc., vol. 6, pp. 3693-3696, May 1998) that features extracted from a speaker's moving lips hold speaker dependencies which are complementary with speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either subsystem individually. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise
Resumo:
Data preprocessing is widely recognized as an important stage in anomaly detection. This paper reviews the data preprocessing techniques used by anomaly-based network intrusion detection systems (NIDS), concentrating on which aspects of the network traffic are analyzed, and what feature construction and selection methods have been used. Motivation for the paper comes from the large impact data preprocessing has on the accuracy and capability of anomaly-based NIDS. The review finds that many NIDS limit their view of network traffic to the TCP/IP packet headers. Time-based statistics can be derived from these headers to detect network scans, network worm behavior, and denial of service attacks. A number of other NIDS perform deeper inspection of request packets to detect attacks against network services and network applications. More recent approaches analyze full service responses to detect attacks targeting clients. The review covers a wide range of NIDS, highlighting which classes of attack are detectable by each of these approaches. Data preprocessing is found to predominantly rely on expert domain knowledge for identifying the most relevant parts of network traffic and for constructing the initial candidate set of traffic features. On the other hand, automated methods have been widely used for feature extraction to reduce data dimensionality, and feature selection to find the most relevant subset of features from this candidate set. The review shows a trend toward deeper packet inspection to construct more relevant features through targeted content parsing. These context sensitive features are required to detect current attacks.
Resumo:
This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldomly studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as oppose to purchasing behaviour) is behaviour that has been performed so frequently that it become habitual and involves minimal intentions or decision making. Key variables investigated are the activity initialised timestamp and cell tower location as well as the activity type and usage quantity (e.g., voice call with duration in seconds); and the research focuses are on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demandingMarkov chainMonte Carlo (MCMC) methods. The standard VBGMMalgorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits. Other significant research contributions include fitting GMMs using VB to circular data i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high dimensional data based on the use of VB-GMM.