997 resultados para Session variability
Resumo:
This work presents an extended Joint Factor Analysis model including explicit modelling of unwanted within-session variability. The goals of the proposed extended JFA model are to improve verification performance with short utterances by compensating for the effects of limited or imbalanced phonetic coverage, and to produce a flexible JFA model that is effective over a wide range of utterance lengths without adjusting model parameters such as retraining session subspaces. Experimental results on the 2006 NIST SRE corpus demonstrate the flexibility of the proposed model by providing competitive results over a wide range of utterance lengths without retraining and also yielding modest improvements in a number of conditions over current state-of-the-art.
Resumo:
Object classification is plagued by the issue of session variation. Session variation describes any variation that makes one instance of an object look different to another, for instance due to pose or illumination variation. Recent work in the challenging task of face verification has shown that session variability modelling provides a mechanism to overcome some of these limitations. However, for computer vision purposes, it has only been applied in the limited setting of face verification. In this paper we propose a local region based intersession variability (ISV) modelling approach, and apply it to challenging real-world data. We propose a region based session variability modelling approach so that local session variations can be modelled, termed Local ISV. We then demonstrate the efficacy of this technique on a challenging real-world fish image database which includes images taken underwater, providing significant real-world session variations. This Local ISV approach provides a relative performance improvement of, on average, 23% on the challenging MOBIO, Multi-PIE and SCface face databases. It also provides a relative performance improvement of 35% on our challenging fish image dataset.
Resumo:
This paper presents Scatter Difference Nuisance Attribute Projection (SD-NAP) as an enhancement to NAP for SVM-based speaker verification. While standard NAP may inadvertently remove desirable speaker variability, SD-NAP explicitly de-emphasises this variability by incorporating a weighted version of the between-class scatter into the NAP optimisation criterion. Experimental evaluation of SD-NAP with a variety of SVM systems on the 2006 and 2008 NIST SRE corpora demonstrate that SD-NAP provides improved verification performance over standard NAP in most cases, particularly at the EER operating point.
Resumo:
Automatic recognition of people is an active field of research with important forensic and security applications. In these applications, it is not always possible for the subject to be in close proximity to the system. Voice represents a human behavioural trait which can be used to recognise people in such situations. Automatic Speaker Verification (ASV) is the process of verifying a persons identity through the analysis of their speech and enables recognition of a subject at a distance over a telephone channel { wired or wireless. A significant amount of research has focussed on the application of Gaussian mixture model (GMM) techniques to speaker verification systems providing state-of-the-art performance. GMM's are a type of generative classifier trained to model the probability distribution of the features used to represent a speaker. Recently introduced to the field of ASV research is the support vector machine (SVM). An SVM is a discriminative classifier requiring examples from both positive and negative classes to train a speaker model. The SVM is based on margin maximisation whereby a hyperplane attempts to separate classes in a high dimensional space. SVMs applied to the task of speaker verification have shown high potential, particularly when used to complement current GMM-based techniques in hybrid systems. This work aims to improve the performance of ASV systems using novel and innovative SVM-based techniques. Research was divided into three main themes: session variability compensation for SVMs; unsupervised model adaptation; and impostor dataset selection. The first theme investigated the differences between the GMM and SVM domains for the modelling of session variability | an aspect crucial for robust speaker verification. Techniques developed to improve the robustness of GMMbased classification were shown to bring about similar benefits to discriminative SVM classification through their integration in the hybrid GMM mean supervector SVM classifier. Further, the domains for the modelling of session variation were contrasted to find a number of common factors, however, the SVM-domain consistently provided marginally better session variation compensation. Minimal complementary information was found between the techniques due to the similarities in how they achieved their objectives. The second theme saw the proposal of a novel model for the purpose of session variation compensation in ASV systems. Continuous progressive model adaptation attempts to improve speaker models by retraining them after exploiting all encountered test utterances during normal use of the system. The introduction of the weight-based factor analysis model provided significant performance improvements of over 60% in an unsupervised scenario. SVM-based classification was then integrated into the progressive system providing further benefits in performance over the GMM counterpart. Analysis demonstrated that SVMs also hold several beneficial characteristics to the task of unsupervised model adaptation prompting further research in the area. In pursuing the final theme, an innovative background dataset selection technique was developed. This technique selects the most appropriate subset of examples from a large and diverse set of candidate impostor observations for use as the SVM background by exploiting the SVM training process. This selection was performed on a per-observation basis so as to overcome the shortcoming of the traditional heuristic-based approach to dataset selection. Results demonstrate the approach to provide performance improvements over both the use of the complete candidate dataset and the best heuristically-selected dataset whilst being only a fraction of the size. The refined dataset was also shown to generalise well to unseen corpora and be highly applicable to the selection of impostor cohorts required in alternate techniques for speaker verification.
Resumo:
Speaker verification is the process of verifying the identity of a person by analysing their speech. There are several important applications for automatic speaker verification (ASV) technology including suspect identification, tracking terrorists and detecting a person’s presence at a remote location in the surveillance domain, as well as person authentication for phone banking and credit card transactions in the private sector. Telephones and telephony networks provide a natural medium for these applications. The aim of this work is to improve the usefulness of ASV technology for practical applications in the presence of adverse conditions. In a telephony environment, background noise, handset mismatch, channel distortions, room acoustics and restrictions on the available testing and training data are common sources of errors for ASV systems. Two research themes were pursued to overcome these adverse conditions: Modelling mismatch and modelling uncertainty. To directly address the performance degradation incurred through mismatched conditions it was proposed to directly model this mismatch. Feature mapping was evaluated for combating handset mismatch and was extended through the use of a blind clustering algorithm to remove the need for accurate handset labels for the training data. Mismatch modelling was then generalised by explicitly modelling the session conditions as a constrained offset of the speaker model means. This session variability modelling approach enabled the modelling of arbitrary sources of mismatch, including handset type, and halved the error rates in many cases. Methods to model the uncertainty in speaker model estimates and verification scores were developed to address the difficulties of limited training and testing data. The Bayes factor was introduced to account for the uncertainty of the speaker model estimates in testing by applying Bayesian theory to the verification criterion, with improved performance in matched conditions. Modelling the uncertainty in the verification score itself met with significant success. Estimating a confidence interval for the "true" verification score enabled an order of magnitude reduction in the average quantity of speech required to make a confident verification decision based on a threshold. The confidence measures developed in this work may also have significant applications for forensic speaker verification tasks.
Resumo:
This paper presents an extended study on the implementation of support vector machine(SVM) based speaker verification in systems that employ continuous progressive model adaptation using the weight-based factor analysis model. The weight-based factor analysis model compensates for session variations in unsupervised scenarios by incorporating trial confidence measures in the general statistics used in the inter-session variability modelling process. Employing weight-based factor analysis in Gaussian mixture models (GMM) was recently found to provide significant performance gains to unsupervised classification. Further improvements in performance were found through the integration of SVM-based classification in the system by means of GMM supervectors. This study focuses particularly on the way in which a client is represented in the SVM kernel space using single and multiple target supervectors. Experimental results indicate that training client SVMs using a single target supervector maximises performance while exhibiting a certain robustness to the inclusion of impostor training data in the model. Furthermore, the inclusion of low-scoring target trials in the adaptation process is investigated where they were found to significantly aid performance.
Resumo:
In this paper we extend the concept of speaker annotation within a single-recording, or speaker diarization, to a collection wide approach we call speaker attribution. Accordingly, speaker attribution is the task of clustering expectantly homogenous intersession clusters obtained using diarization according to common cross-recording identities. The result of attribution is a collection of spoken audio across multiple recordings attributed to speaker identities. In this paper, an attribution system is proposed using mean-only MAP adaptation of a combined-gender UBM to model clusters from a perfect diarization system, as well as a JFA-based system with session variability compensation. The normalized cross-likelihood ratio is calculated for each pair of clusters to construct an attribution matrix and the complete linkage algorithm is employed to conduct clustering of the inter-session clusters. A matched cluster purity and coverage of 87.1% was obtained on the NIST 2008 SRE corpus.
Resumo:
This paper investigates the use of the dimensionality-reduction techniques weighted linear discriminant analysis (WLDA), and weighted median fisher discriminant analysis (WMFD), before probabilistic linear discriminant analysis (PLDA) modeling for the purpose of improving speaker verification performance in the presence of high inter-session variability. Recently it was shown that WLDA techniques can provide improvement over traditional linear discriminant analysis (LDA) for channel compensation in i-vector based speaker verification systems. We show in this paper that the speaker discriminative information that is available in the distance between pair of speakers clustered in the development i-vector space can also be exploited in heavy-tailed PLDA modeling by using the weighted discriminant approaches prior to PLDA modeling. Based upon the results presented within this paper using the NIST 2008 Speaker Recognition Evaluation dataset, we believe that WLDA and WMFD projections before PLDA modeling can provide an improved approach when compared to uncompensated PLDA modeling for i-vector based speaker verification systems.
Resumo:
A significant amount of speech is typically required for speaker verification system development and evaluation, especially in the presence of large intersession variability. This paper introduces a source and utterance duration normalized linear discriminant analysis (SUN-LDA) approaches to compensate session variability in short-utterance i-vector speaker verification systems. Two variations of SUN-LDA are proposed where normalization techniques are used to capture source variation from both short and full-length development i-vectors, one based upon pooling (SUN-LDA-pooled) and the other on concatenation (SUN-LDA-concat) across the duration and source-dependent session variation. Both the SUN-LDA-pooled and SUN-LDA-concat techniques are shown to provide improvement over traditional LDA on NIST 08 truncated 10sec-10sec evaluation conditions, with the highest improvement obtained with the SUN-LDA-concat technique achieving a relative improvement of 8% in EER for mis-matched conditions and over 3% for matched conditions over traditional LDA approaches.
Resumo:
This paper examines the issue of face, speaker and bi-modal authentication in mobile environments when there is significant condition mismatch. We introduce this mismatch by enrolling client models on high quality biometric samples obtained on a laptop computer and authenticating them on lower quality biometric samples acquired with a mobile phone. To perform these experiments we develop three novel authentication protocols for the large publicly available MOBIO database. We evaluate state-of-the-art face, speaker and bi-modal authentication techniques and show that inter-session variability modelling using Gaussian mixture models provides a consistently robust system for face, speaker and bi-modal authentication. It is also shown that multi-algorithm fusion provides a consistent performance improvement for face, speaker and bi-modal authentication. Using this bi-modal multi-algorithm system we derive a state-of-the-art authentication system that obtains a half total error rate of 6.3% and 1.9% for Female and Male trials, respectively.
Resumo:
This thesis has investigated how to cluster a large number of faces within a multi-media corpus in the presence of large session variation. Quality metrics are used to select the best faces to represent a sequence of faces; and session variation modelling improves clustering performance in the presence of wide variations across videos. Findings from this thesis contribute to improving the performance of both face verification systems and the fully automated clustering of faces from a large video corpus.
Resumo:
Both embodied and symbolic accounts of conceptual organization would predict partial sharing and partial differentiation between the neural activations seen for concepts activated via different stimulus modalities. But cross-participant and cross-session variability in BOLD activity patterns makes analyses of such patterns with MVPA methods challenging. Here, we examine the effect of cross-modal and individual variation on the machine learning analysis of fMRI data recorded during a word property generation task. We present the same set of living and non-living concepts (land-mammals, or work tools) to a cohort of Japanese participants in two sessions: the first using auditory presentation of spoken words; the second using visual presentation of words written in Japanese characters. Classification accuracies confirmed that these semantic categories could be detected in single trials, with within-session predictive accuracies of 80-90%. However cross-session prediction (learning from auditory-task data to classify data from the written-word-task, or vice versa) suffered from a performance penalty, achieving 65-75% (still individually significant at p « 0.05). We carried out several follow-on analyses to investigate the reason for this shortfall, concluding that distributional differences in neither time nor space alone could account for it. Rather, combined spatio-temporal patterns of activity need to be identified for successful cross-session learning, and this suggests that feature selection strategies could be modified to take advantage of this.
Resumo:
Background: The aim was to evaluate the validity and repeatability of the auto-refraction function of the Nidek OPD-Scan III (Nidek Technologies, Gamagori, Japan) compared with non-cycloplegic subjective refraction. The Nidek OPD-Scan III is a new aberrometer/corneal topographer workstation based on the skiascopy principle. It combines a wavefront aberrometer, topographer, autorefractor, auto keratometer and pupillometer/pupillographer. Methods: Objective refraction results obtained using the Nidek OPD-Scan III were compared with non-cycloplegic subjective refraction for 108 eyes of 54 participants (29 female) with a mean age of 23.7±9.5 years. Intra-session and inter-session variability were assessed on 14 subjects (28 eyes). Results: The Nidek OPD-Scan III gave slightly more negative readings than results obtained by subjective refraction (Nidek mean difference -0.19±0.36 DS, p<0.01 for sphere; -0.19±0.35 DS, p<0.01 for mean spherical equivalent; -0.002±0.23 DC, p=0.91 for cylinder; -0.06±0.38 DC, p=0.30 for J0 and -0.36±0.31 DC for J45, p=0.29). Auto-refractor results for 74 per cent of spherical readings and 60 per cent of cylindrical powers were within±0.25 of subjective refraction. There was high intra-session and inter-session repeatability for all parameters; 90 per cent of inter-session repeatability results were within 0.25 D. Conclusion: The Nidek OPD-Scan III gives valid and repeatable measures of objective refraction when compared with non-cycloplegic subjective refraction. © 2013 The Authors. Clinical and Experimental Optometry © 2013 Optometrists Association Australia.
Consecutive days of cold water immersion: effects on cycling performance and heart rate variability.
Resumo:
We investigated performance and heart rate (HR) variability (HRV) over consecutive days of cycling with post-exercise cold water immersion (CWI) or passive recovery (PAS). In a crossover design, 11 cyclists completed two separate 3-day training blocks (120 min cycling per day, 66 maximal sprints, 9 min time trialling [TT]), followed by 2 days of recovery-based training. The cyclists recovered from each training session by standing in cold water (10 °C) or at room temperature (27 °C) for 5 min. Mean power for sprints, total TT work and HR were assessed during each session. Resting vagal-HRV (natural logarithm of square-root of mean squared differences of successive R-R intervals; ln rMSSD) was assessed after exercise, after the recovery intervention, during sleep and upon waking. CWI allowed better maintenance of mean sprint power (between-trial difference [90 % confidence limits] +12.4 % [5.9; 18.9]), cadence (+2.0 % [0.6; 3.5]), and mean HR during exercise (+1.6 % [0.0; 3.2]) compared with PAS. ln rMSSD immediately following CWI was higher (+144 % [92; 211]) compared with PAS. There was no difference between the trials in TT performance (-0.2 % [-3.5; 3.0]) or waking ln rMSSD (-1.2 % [-5.9; 3.4]). CWI helps to maintain sprint performance during consecutive days of training, whereas its effects on vagal-HRV vary over time and depend on prior exercise intensity.