Biblioteca Digital

This work aims to take advantage of recent developments in joint factor analysis (JFA) in the context of a phonetically conditioned GMM speaker verification system. Previous work has shown performance advantages through phonetic conditioning, but this has not been shown to date with the JFA framework. Our focus is particularly on strategies for combining the phone-conditioned systems. We show that the classic fusion of the scores is suboptimal when using multiple GMM systems. We investigate several combination strategies in the model space, and demonstrate improvement over score-level combination as well as over a non-phonetic baseline system. This work was conducted during the 2008 CLSP Workshop at Johns Hopkins University.

Improved GMM-based speaker verification using SVM-driven impostor dataset selection

The problem of impostor dataset selection for GMM-based speaker verification is addressed through the recently proposed data-driven background dataset refinement technique. The SVM-based refinement technique selects from a candidate impostor dataset those examples that are most frequently selected as support vectors when training a set of SVMs on a development corpus. This study demonstrates the versatility of dataset refinement in the task of selecting suitable impostor datasets for use in GMM-based speaker verification. The use of refined Z- and T-norm datasets provided performance gains of 15% in EER in the NIST 2006 SRE over the use of heuristically selected datasets. The refined datasets were shown to generalise well to the unseen data of the NIST 2008 SRE.
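The selection step described in this abstract can be illustrated with a minimal sketch. It assumes the SVMs have already been trained on the development corpus and that each one reports the indices of the candidate impostor examples it kept as support vectors. The function names, the input layout, and the fixed `keep` cut-off are hypothetical and are not taken from the paper.

```python
from collections import Counter


def rank_by_support_vector_frequency(num_candidates, support_vector_sets):
    """Rank candidate impostor examples by support-vector frequency.

    support_vector_sets holds one iterable of candidate indices per trained
    SVM (a hypothetical input format, assumed for this sketch).
    """
    counts = Counter()
    for sv_indices in support_vector_sets:
        # Count each candidate at most once per SVM.
        counts.update(set(sv_indices))
    # Most frequently selected first; ties are broken by candidate index.
    return sorted(range(num_candidates), key=lambda i: (-counts[i], i))


def select_refined_dataset(num_candidates, support_vector_sets, keep):
    """Keep the `keep` highest-ranked candidates (the cut-off is an assumption)."""
    ranking = rank_by_support_vector_frequency(num_candidates, support_vector_sets)
    return ranking[:keep]


# Toy example: 5 candidates, 4 trained SVMs.
sv_sets = [[0, 2, 3], [2, 3], [3, 4], [2]]
ranking = rank_by_support_vector_frequency(5, sv_sets)
refined = select_refined_dataset(5, sv_sets, keep=2)
```

In practice the size of the refined set would be tuned on development data rather than fixed in advance.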

Data-driven impostor selection for T-norm score normalisation and the background dataset in SVM-based speaker verification

A data-driven background dataset refinement technique was recently proposed for SVM-based speaker verification. This method selects a refined SVM background dataset from a set of candidate impostor examples after individually ranking examples by their relevance. This paper extends this technique to the refinement of the T-norm dataset for SVM-based speaker verification. The independent refinement of the background and T-norm datasets provides a means of investigating the sensitivity of SVM-based speaker verification performance to the selection of each of these datasets. Using refined datasets provided improvements of 13% in min. DCF and 9% in EER over the full set of impostor examples on the 2006 SRE corpus, with the majority of these gains due to refinement of the T-norm dataset. Similar trends were observed for the unseen data of the NIST 2008 SRE.
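For readers unfamiliar with T-norm, the sketch below shows the standard form of the normalisation: the test utterance is scored against a cohort of impostor models, and the mean and standard deviation of those cohort scores are used to normalise the target score. The function name and the use of the population standard deviation are assumptions made for illustration, not details from the paper.

```python
from statistics import mean, pstdev


def t_norm(raw_score, cohort_scores):
    """Normalise a verification score with T-norm.

    cohort_scores are the scores of the same test utterance against the
    impostor (T-norm) models. Population standard deviation is an assumption.
    """
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)
    if sigma == 0:
        raise ValueError("cohort scores have zero spread; cannot normalise")
    return (raw_score - mu) / sigma


# Toy example: cohort with mean 0 and standard deviation 1.
normalised = t_norm(2.5, [-1.0, 1.0, -1.0, 1.0])
```

Because the statistics come from the cohort, the choice of T-norm dataset directly shapes the normalised score, which is why refining that dataset can affect performance.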

Scatter difference NAP for SVM speaker recognition

This paper presents Scatter Difference Nuisance Attribute Projection (SD-NAP) as an enhancement to NAP for SVM-based speaker verification. While standard NAP may inadvertently remove desirable speaker variability, SD-NAP explicitly de-emphasises this variability by incorporating a weighted version of the between-class scatter into the NAP optimisation criterion. Experimental evaluation of SD-NAP with a variety of SVM systems on the 2006 and 2008 NIST SRE corpora demonstrates that SD-NAP provides improved verification performance over standard NAP in most cases, particularly at the EER operating point.

Within-session variability modelling for factor analysis speaker verification

This work presents an extended Joint Factor Analysis model including explicit modelling of unwanted within-session variability. The goals of the proposed extended JFA model are to improve verification performance with short utterances by compensating for the effects of limited or imbalanced phonetic coverage, and to produce a flexible JFA model that is effective over a wide range of utterance lengths without adjusting model parameters such as retraining session subspaces. Experimental results on the 2006 NIST SRE corpus demonstrate the flexibility of the proposed model by providing competitive results over a wide range of utterance lengths without retraining and also yielding modest improvements in a number of conditions over the current state of the art.

Minimising speaker verification utterance length through confidence based early verification decisions

This paper presents a novel approach to estimating the confidence interval of speaker verification scores. This approach is utilised to minimise the utterance lengths required in order to produce a confident verification decision. The confidence estimation method is also extended to address both the problem of high correlation in consecutive frame scores, and robustness with very limited training samples. The proposed technique achieves a drastic reduction in the typical data requirements for producing confident decisions in an automatic speaker verification system. When evaluated on the NIST 2005 SRE, the early verification decision method demonstrates that an average of 5–10 seconds of speech is sufficient to produce verification rates approaching those achieved previously using an average in excess of 100 seconds of speech.
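The general idea of an early decision can be sketched as follows: accumulate frame scores, maintain a confidence interval on their mean, and stop as soon as the interval lies entirely on one side of the decision threshold. This is only an illustration under simplifying assumptions. It treats frame scores as independent and uses a normal approximation, whereas the abstract states that the paper's method also handles correlated frame scores and very small samples. The function name and the parameters `z` and `min_frames` are hypothetical.

```python
import math


def early_decision(frame_scores, threshold, z=1.96, min_frames=10):
    """Return (decision, frames_used) for a stream of frame scores.

    Assumes independent frame scores and a normal approximation; both are
    simplifications made for this sketch.
    """
    n = 0
    running_mean = 0.0
    sum_sq_dev = 0.0  # running sum of squared deviations (Welford's method)
    for score in frame_scores:
        n += 1
        delta = score - running_mean
        running_mean += delta / n
        sum_sq_dev += delta * (score - running_mean)
        if n < max(min_frames, 2):
            continue  # too few frames for a usable variance estimate
        std_err = math.sqrt(sum_sq_dev / (n - 1) / n)
        if running_mean - z * std_err > threshold:
            return "accept", n
        if running_mean + z * std_err < threshold:
            return "reject", n
    return "undecided", n


# Toy example: scores consistently above a threshold of 0.
decision, frames_used = early_decision([1.0, 1.2] * 10, threshold=0.0)
```

A stream that never yields a confident interval returns "undecided", leaving the caller to fall back on a decision from the full utterance.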

The effect of language models on phonetic decoding for spoken term detection

Spoken term detection (STD) popularly involves performing word or sub-word level speech recognition and indexing the result. This work challenges the assumption that improved speech recognition accuracy implies better indexing for STD. Using an index derived from phone lattices, this paper examines the effect of language model selection on the relationship between phone recognition accuracy and STD accuracy. Results suggest that language models usually improve phone recognition accuracy but their inclusion does not always translate to improved STD accuracy. The findings suggest that using phone recognition accuracy to measure the quality of an STD index can be problematic, and highlight the need for an alternative that is more closely aligned with the goals of the specific detection task.

Spoken term detection using fast phonetic decoding

While spoken term detection (STD) systems based on word indices provide good accuracy, there are several practical applications where it is infeasible or too costly to employ an LVCSR engine. An STD system is presented, which is designed to incorporate a fast phonetic decoding front-end and be robust to decoding errors whilst still allowing for rapid search speeds. This goal is achieved through mono-phone open-loop decoding coupled with fast hierarchical phone lattice search. Results demonstrate that an STD system that is designed with the constraint of a fast and simple phonetic decoding front-end requires a compromise to be made between search speed and search accuracy.

“I smile when I’m angry!” An examination of emotional dissonance among police officers

With the growth of service industry occupations, managing emotions at work has increased as a topic of interest among scholars and practitioners in organisational behaviour and human resource management (Grandey, 2000). Emotional dissonance occurs when there is a discrepancy between organisationally sanctioned emotions and the actual emotions of employees (Zapf, Vogt, Seifert, Mertini, & Isic, 1999). This discrepancy can be associated with significant levels of psychological ill-health (Zapf, Seifert, Schmutte, Mertini, & Holz, 2001). Policing is consistently ranked among the top five stressful/high-risk occupations (e.g. Coman, Evans, Stanley, & Burrows, 1991). Police officers act as the front-line contact when dealing directly with community members; they are expected to be social workers, teachers, role models, and counsellors. Operational police officers are often required to suppress their actual emotions during their work, in order to perform their job to formally designated procedures and standards.

Speech endpoint detection using gradient based edge detection techniques

43 results for Vogt, Karl in Queensland University of Technology - ePrints Archive