33 resultados para automatic speech recognition

em Deakin Research Online - Australia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speaker recognition is the process of automatically recognizing the speaker by analyzing individual information contained in the speech waves. In this paper, we discuss the development of an intelligent system for text-dependent speaker recognition. The system comprises two main modules, a wavelet-based signal-processing module for feature extraction of speech waves, and an artificial-neural-network-based classifier module to identify and categorize the speakers. Wavelet is used in de-noising and in compressing the speech signals. The wavelet family that we used is the Daubechies Wavelets. After extracting the necessary features from the speech waves, the features were then fed to a neural-network-based classifier to identify the speakers. We have implemented the Fuzzy ARTMAP (FAM) network in the classifier module to categorize the de-noised and compressed signals. The proposed intelligent learning system has been applied to a case study of text-dependent speaker recognition problem.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of this work is to recognize all the frontal faces of a character in the closed world of a movie or situation comedy, given a small number of query faces. This is challenging because faces in a feature-length film are relatively uncontrolled with a wide variability of scale, pose, illumination, and expressions, and also may be partially occluded. We develop a recognition method based on a cascade of processing steps that normalize for the effects of the changing imaging environment. In particular there are three areas of novelty: (i) we suppress the background surrounding the face, enabling the maximum area of the face to be retained for recognition rather than a subset; (ii) we include a pose refinement step to optimize the registration between the test image and face exemplar; and (iii) we use robust distance to a sub-space to allow for partial occlusion and expression change. The method is applied and evaluated on several feature length films. It is demonstrated that high recall rates (over 92%) can be achieved whilst maintaining good precision (over 93%).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Illumination invariance remains the most researched, yet the most challenging aspect of automatic face recognition. In this paper we propose a novel, general recognition framework for efficient matching of individual face images, sets or sequences. The framework is based on simple image processing filters that compete with unprocessed greyscale input to yield a single matching score between individuals. It is shown how the discrepancy between illumination conditions between novel input and the training data set can be estimated and used to weigh the contribution of two competing representations. We describe an extensive empirical evaluation of the proposed method on 171 individuals and over 1300 video sequences with extreme illumination, pose and head motion variation. On this challenging data set our algorithm consistently demonstrated a dramatic performance improvement over traditional filtering approaches. We demonstrate a reduction of 50-75% in recognition error rates, the best performing method-filter combination correctly recognizing 96% of the individuals.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

With an increasing emphasis on the emerging automatic person identification application, biometrics based, especially fingerprint-based identification, is receiving a lot of attention. This research developed an automatic fingerprint recognition system (AFRS) based on a hybrid between minutiae and correlation based techniques to represent and to match fingerprint; it improved each technique individually. It was noticed that, in the hybrid approach, as a result of an improvement of minutiae extraction algorithm in post-process phase that combines the two algorithms, the performance of the minutia algorithm improved. An improvement in the ridge algorithm that used centre point in fingerprint instead of reference point was also observed. Experiments indicate that the hybrid technique performs much better than each algorithm individually.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this paper, we present our system for online context recognition of multimodal sequences acquired from multiple sensors. The system uses Dynamic Time Warping (DTW) to recognize multimodal sequences of different lengths, embedded in continuous data streams. We evaluate the performance of our system on two real world datasets: 1) accelerometer data acquired from performing two hand gestures and 2) NOKIA's benchmark dataset for context recognition. The results from both datasets demonstrate that the system can perform online context recognition efficiently and achieve high recognition accuracy.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The self-quotient image is a biologically inspired representation which has been proposed as an illumination invariant feature for automatic face recognition. Owing to the lack of strong domain specific assumptions underlying this representation, it can be readily extracted from raw images irrespective of the persons's pose, facial expression etc. What makes the self-quotient image additionally attractive is that it can be computed quickly and in a closed form using simple low-level image operations. However, it is generally accepted that the self-quotient is insufficiently robust to large illumination changes which is why it is mainly used in applications in which low precision is an acceptable compromise for high recall (e.g. retrieval systems). Yet, in this paper we demonstrate that the performance of this representation in challenging illuminations has been greatly underestimated. We show that its error rate can be reduced by over an order of magnitude, without any changes to the representation itself. Rather, we focus on the manner in which the dissimilarity between two self-quotient images is computed. By modelling the dominant sources of noise affecting the representation, we propose and evaluate a series of different dissimilarity measures, the best of which reduces the initial error rate of 63.0% down to only 5.7% on the notoriously challenging YaleB data set.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Automatic face recognition (AFR) is an area with immense practical potential which includes a wide range of commercial and law enforcement applications, and it continues to be one of the most active research areas of computer vision. Even after over three decades of intense research, the state-of-the-art in AFR continues to improve, benefiting from advances in a range of different fields including image processing, pattern recognition, computer graphics and physiology. However, systems based on visible spectrum images continue to face challenges in the presence of illumination, pose and expression changes, as well as facial disguises, all of which can significantly decrease their accuracy. Amongst various approaches which have been proposed in an attempt to overcome these limitations, the use of infrared (IR) imaging has emerged as a particularly promising research direction. This paper presents a comprehensive and timely review of the literature on this subject.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Illumination invariance remains the most researched, yet the most challenging aspect of automatic face recognition. In this paper we investigate the discriminative power of colour-based invariants in the presence of large illumination changes between training and test data, when appearance changes due to cast shadows and non-Lambertian effects are significant. Specifically, there are three main contributions: (i) we employ a more sophisticated photometric model of the camera and show how its parameters can be estimated, (ii) we derive several novel colour-based face invariants, and (iii) on a large database of video sequences we examine and evaluate the largest number of colour-based representations in the literature. Our results suggest that colour invariants do have a substantial discriminative power which may increase the robustness and accuracy of recognition from low resolution images.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In many automatic face recognition applications, a set of a person's face images is available rather than a single image. In this paper, we describe a novel method for face recognition using image sets. We propose a flexible, semi-parametric model for learning probability densities confined to highly non-linear but intrinsically low-dimensional manifolds. The model leads to a statistical formulation of the recognition problem in terms of minimizing the divergence between densities estimated on these manifolds. The proposed method is evaluated on a large data set, acquired in realistic imaging conditions with severe illumination variation. Our algorithm is shown to match the best and outperform other state-of-the-art algorithms in the literature, achieving 94% recognition rate on average.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Illumination and pose invariance are the most challenging aspects of face recognition. In this paper we describe a fully automatic face recognition system that uses video information to achieve illumination and pose robustness. In the proposed method, highly nonlinear manifolds of face motion are approximated using three Gaussian pose clusters. Pose robustness is achieved by comparing the corresponding pose clusters and probabilistically combining the results to derive a measure of similarity between two manifolds. Illumination is normalized on a per-pose basis. Region-based gamma intensity correction is used to correct for coarse illumination changes, while further refinement is achieved by combining a learnt linear manifold of illumination variation with constraints on face pattern distribution, derived from video. Comparative experimental evaluation is presented and the proposed method is shown to greatly outperform state-of-the-art algorithms. Consistent recognition rates of 94-100% are achieved across dramatic changes in illumination.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Automatic face recognition is an area with immense practical potential which includes a wide range of commercial and law enforcement applications. Hence it is unsurprising that it continues to be one of the most active research areas of computer vision. Even after over three decades of intense research, the state-of-the-art in face recognition continues to improve, benefitting from advances in a range of different research fields such as image processing, pattern recognition, computer graphics, and physiology. Systems based on visible spectrum images, the most researched face recognition modality, have reached a significant level of maturity with some practical success. However, they continue to face challenges in the presence of illumination, pose and expression changes, as well as facial disguises, all of which can significantly decrease recognition accuracy. Amongst various approaches which have been proposed in an attempt to overcome these limitations, the use of infrared (IR) imaging has emerged as a particularly promising research direction. This paper presents a comprehensive and timely review of the literature on this subject. Our key contributions are (i) a summary of the inherent properties of infrared imaging which makes this modality promising in the context of face recognition; (ii) a systematic review of the most influential approaches, with a focus on emerging common trends as well as key differences between alternative methodologies; (iii) a description of the main databases of infrared facial images available to the researcher; and lastly (iv) a discussion of the most promising avenues for future research. © 2014 Elsevier Ltd.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Appliance-specific Load Monitoring (LM) provides a possible solution to the problem of energy conservation which is becoming increasingly challenging, due to growing energy demands within offices and residential spaces. It is essential to perform automatic appliance recognition and monitoring for optimal resource utilization. In this paper, we study the use of non-intrusive LM methods that rely on steady-state appliance signatures for classifying most commonly used office appliances, while demonstrating their limitation in terms of accurately discerning the low-power devices due to overlapping load signatures. We propose a multi-layer decision architecture that makes use of audio features derived from device sounds and fuse it with load signatures acquired from energy meter. For the recognition of device sounds, we perform feature set selection by evaluating the combination of time-domain and FFT-based audio features on the state of the art machine learning algorithms. Further, we demonstrate that our proposed feature set which is a concatenation of device audio feature and load signature significantly improves the device recognition accuracy in comparison to the use of steady-state load signatures only.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We develop an algorithm for the detection and classification of affective sound events underscored by specific patterns of sound energy dynamics. We relate the portrayal of these events to proposed high level affect or emotional coloring of the events. In this paper, four possible characteristic sound energy events are identified that convey well established meanings through their dynamics to portray and deliver certain affect, sentiment related to the horror film genre. Our algorithm is developed with the ultimate aim of automatically structuring sections of films that contain distinct shades of emotion related to horror themes for nonlinear media access and navigation. An average of 82% of the energy events, obtained from the analysis of the audio tracks of sections of four sample films corresponded correctly to the proposed affect. While the discrimination between certain sound energy event types was low, the algorithm correctly detected 71% of the occurrences of the sound energy events within audio tracks of the films analyzed, and thus forms a useful basis for determining affective scenes characteristic of horror in movies.