900 results for short utterance i-vectors
Abstract:
This paper proposes techniques to improve the performance of i-vector based speaker verification systems when only short utterances are available. Short-length utterance i-vectors vary with speaker, session variations, and the phonetic content of the utterance. Well-established methods such as linear discriminant analysis (LDA), source-normalized LDA (SN-LDA) and within-class covariance normalisation (WCCN) exist for compensating for session variation, but we have identified the variability introduced by phonetic content due to utterance variation as an additional source of degradation when short-duration utterances are used. To compensate for utterance variations in short i-vector speaker verification systems using cosine similarity scoring (CSS), we have introduced a short utterance variance normalization (SUVN) technique and a short utterance variance (SUV) modelling approach at the i-vector feature level. A combination of SUVN with LDA and SN-LDA is proposed to compensate for the session and utterance variations and is shown to improve performance over the traditional approach of using LDA and/or SN-LDA followed by WCCN. An alternative approach is also introduced that uses probabilistic linear discriminant analysis (PLDA) to model the SUV directly. The combination of SUVN, LDA and SN-LDA followed by SUV PLDA modelling provides an improvement over the baseline PLDA approach. We also show that for this combination of techniques, the utterance variation information needs to be artificially added to full-length i-vectors for PLDA modelling.
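The traditional session-compensation step mentioned in the abstract above (WCCN followed by cosine similarity scoring) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the small ridge term added for numerical stability, and the random stand-in i-vectors are all assumptions, and SUVN itself is not shown because the abstract does not define it.

```python
import numpy as np

def wccn_projection(ivectors, speaker_ids, ridge=1e-6):
    """Return B with B @ B.T equal to the inverse within-class covariance."""
    ivectors = np.asarray(ivectors, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    dim = ivectors.shape[1]
    speakers = np.unique(speaker_ids)
    W = np.zeros((dim, dim))
    for spk in speakers:
        x = ivectors[speaker_ids == spk]
        centered = x - x.mean(axis=0)
        W += centered.T @ centered / len(x)  # per-speaker covariance
    W /= len(speakers)
    W += ridge * np.eye(dim)  # assumed regularisation, keeps W invertible
    return np.linalg.cholesky(np.linalg.inv(W))

def cosine_score(target, test, B):
    """Cosine similarity between two WCCN-projected i-vectors."""
    a = B.T @ np.asarray(target, dtype=float)
    b = B.T @ np.asarray(test, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical development data: 6 speakers, 10 four-dimensional i-vectors each.
rng = np.random.default_rng(0)
dev = rng.normal(size=(60, 4))
ids = np.repeat(np.arange(6), 10)
B = wccn_projection(dev, ids)
print(cosine_score(dev[0], dev[1], B))
```

In a real system an LDA or SN-LDA projection would normally be applied before WCCN, and the resulting score would be compared against a decision threshold.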
Abstract:
A significant amount of speech is typically required for speaker verification system development and evaluation, especially in the presence of large intersession variability. This paper introduces source and utterance-duration normalized linear discriminant analysis (SUN-LDA) approaches to compensate for session variability in short-utterance i-vector speaker verification systems. Two variations of SUN-LDA are proposed where normalization techniques are used to capture source variation from both short and full-length development i-vectors, one based upon pooling (SUN-LDA-pooled) and the other on concatenation (SUN-LDA-concat) across the duration and source-dependent session variation. Both the SUN-LDA-pooled and SUN-LDA-concat techniques are shown to provide improvement over traditional LDA on NIST 08 truncated 10sec-10sec evaluation conditions, with the highest improvement obtained with the SUN-LDA-concat technique, which achieves a relative improvement of 8% in EER for mismatched conditions and over 3% for matched conditions over traditional LDA approaches.
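The figures quoted in the abstract above are relative improvements in equal error rate (EER). As a rough sketch of how such numbers are typically obtained, the code below estimates the EER with a simple threshold sweep and computes a relative improvement. The function names and the toy scores are hypothetical; the abstract does not describe the evaluation tooling that was actually used.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: the point where miss rate and false-alarm rate meet."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # Miss: a target trial scored below the threshold.
    miss = np.array([(target < t).mean() for t in thresholds])
    # False alarm: a non-target trial scored at or above the threshold.
    fa = np.array([(nontarget >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(miss - fa))
    return float((miss[i] + fa[i]) / 2)

def relative_improvement(baseline_eer, new_eer):
    """Fractional reduction in EER with respect to the baseline."""
    return (baseline_eer - new_eer) / baseline_eer

# Toy scores, for illustration only.
eer = equal_error_rate([0.1, 0.6, 0.8, 0.9], [0.2, 0.3, 0.4, 0.7])
print(eer, relative_improvement(0.10, 0.092))
```

A relative improvement of 8% therefore means the EER fell to 92% of the baseline value, not that it dropped by eight percentage points.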
Abstract:
This paper proposes a combination of source-normalized weighted linear discriminant analysis (SN-WLDA) and short utterance variance (SUV) PLDA modelling to improve short-utterance PLDA speaker verification. As short-length utterance i-vectors vary with the speaker, session variations and phonetic content of the utterance (utterance variation), a combined approach of SN-WLDA projection and SUV PLDA modelling is used to compensate for the session and utterance variations. Experimental studies have found that combining SN-WLDA with SUV PLDA modelling improves on the baseline system (WCCN[LDA]-projected Gaussian PLDA (GPLDA)), as this approach effectively compensates for the session and utterance variations.
Abstract:
This PhD research has provided novel solutions to three major challenges which have prevented the widespread deployment of speaker recognition technology: (1) combating enrolment/verification mismatch, (2) reducing the large amount of development and training data that is required and (3) reducing the duration of speech required to verify a speaker. A range of applications of speaker recognition technology, from forensics in criminal investigations to secure access in banking, will benefit from the research outcomes.
Abstract:
This paper describes a novel approach to phonotactic LID, where instead of using soft-counts based on phoneme lattices, we use posteriorgrams to obtain n-gram counts. The high-dimensional vectors of counts are reduced to low-dimensional units for which we adapted the commonly used term i-vectors. The reduction is based on multinomial subspace modeling and is designed to work in the total-variability space. The proposed technique was tested on the NIST 2009 LRE set with better results than a system based on soft-counts (Cavg on 30s: 3.15% vs 3.43%), and with very good results when fused with an acoustic i-vector LID system (Cavg on 30s acoustic 2.4% vs 1.25%). The proposed technique is also compared with another low-dimensional projection system based on PCA. In comparison with the original soft-counts, the proposed technique provides better results, reduces the problems due to sparse counts, and avoids the need for pruning techniques when creating the lattices.
Abstract:
This paper presents a description of our system for the Albayzin 2012 LRE competition. One of the main characteristics of this evaluation was the reduced number of available files for training the system, especially for the empty condition where no training data set was provided but only a development set. In addition, the whole database was created from online videos and around one third of the training data was labeled as noisy files. Our primary system was the fusion of three different i-vector based systems: one acoustic system based on MFCCs, a phonotactic system using trigrams of phone-posteriorgram counts, and another acoustic system based on RPLPs that improved robustness against noise. A contrastive system that included new features based on the glottal source was also presented. Official and post-evaluation results for all the conditions, using both the metrics proposed for the evaluation and the Cavg metric, are presented in the paper.
Abstract:
This paper presents new techniques with relevant improvements added to the primary system presented by our group to the Albayzin 2012 LRE competition, where the use of any additional corpora for training or optimizing the models was forbidden. In this work, we present the incorporation of an additional phonotactic subsystem based on phone log-likelihood ratio (PLLR) features extracted from different phonotactic recognizers, which improves the accuracy of the system by 21.4% in terms of Cavg (we also present results for the official metric during the evaluation, Fact). We show how using these features at the phone state level provides significant improvements when used together with dimensionality reduction techniques, especially PCA. We have also experimented with applying alternative SDC-like configurations on these PLLR features, with additional improvements. We also describe some modifications to the MFCC-based acoustic i-vector system which have contributed further improvements. The final fused system outperformed the baseline by 27.4% in Cavg.
Abstract:
Robust speaker verification on short utterances remains a key consideration when deploying automatic speaker recognition, as many real-world applications often have access to only limited-duration speech data. This paper explores how the recent technologies focused around total variability modeling behave when training and testing utterance lengths are reduced. Results are presented which provide a comparison of Joint Factor Analysis (JFA) and i-vector based systems including various compensation techniques: Within-Class Covariance Normalization (WCCN), LDA, Scatter Difference Nuisance Attribute Projection (SDNAP) and Gaussian Probabilistic Linear Discriminant Analysis (GPLDA). Speaker verification performance for utterances with as little as 2 sec of data taken from the NIST Speaker Recognition Evaluations is presented to provide a clearer picture of the current performance characteristics of these techniques in short-utterance conditions.
Abstract:
This paper investigates the effects of limited speech data in the context of speaker verification using a probabilistic linear discriminant analysis (PLDA) approach. Being able to reduce the length of required speech data is important to the development of automatic speaker verification systems in real-world applications. When sufficient speech is available, previous research has shown that heavy-tailed PLDA (HTPLDA) modeling of speakers in the i-vector space provides state-of-the-art performance; however, the robustness of HTPLDA to limited speech resources in development, enrolment and verification is an important issue that has not yet been investigated. In this paper, we analyze the speaker verification performance with regard to the duration of utterances used for both speaker evaluation (enrolment and verification) and score normalization and PLDA modeling during development. Two different approaches to total-variability representation are analyzed within the PLDA approach to show improved performance in short-utterance mismatched evaluation conditions and conditions for which insufficient speech resources are available for adequate system development. The results presented within this paper using the NIST 2008 Speaker Recognition Evaluation dataset suggest that the HTPLDA system can continue to achieve better performance than Gaussian PLDA (GPLDA) as evaluation utterance lengths are decreased. We also highlight the importance of matching durations for score normalization and PLDA modeling to the expected evaluation conditions. Finally, we found that a pooled total-variability approach to PLDA modeling can achieve better performance than the traditional concatenated total-variability approach for short utterances in mismatched evaluation conditions and conditions for which insufficient speech resources are available for adequate system development.
Abstract:
Most existing models of language production and speech motor control do not explicitly address how language requirements affect speech motor functions, as these domains are usually treated as separate and independent from one another. This investigation compared lip movements during bilabial closure between five individuals with mild aphasia and five age- and gender-matched control speakers when the linguistic characteristics of the stimuli were varied by increasing the number of syllables. Upper and lower lip movement data were collected for mono-, bi- and tri-syllabic nonword sequences using an AG 100 EMMA system. Each task was performed under both normal and fast rate conditions. Single articulator kinematic parameters (peak velocity, amplitude, duration, and cyclic spatio-temporal index) were measured to characterize lip movements. Results revealed that, compared to control speakers, individuals with aphasia showed significantly longer movement duration and lower movement stability for longer items (bi- and tri-syllables). Moreover, utterance length affected the lip kinematics, in that the monosyllables had smaller peak velocities, smaller amplitudes and shorter durations compared to bi- and tri-syllables, and movement stability was lowest for the tri-syllables. In addition, the rate-induced changes (smaller amplitude and shorter duration with increased rate) were most prominent for the short items (i.e., monosyllables). These findings provide further support for the notion that linguistic changes have an impact on the characteristics of speech movements, and that individuals with aphasia are more affected by such changes than control speakers.
Abstract:
With the rapid urban sprawl and renewal in China, many buildings are “non-nature” short-lived, i.e. demolished after only a few years. To address this concern, this research explores the factors influencing short-lived buildings and provides a scientific foundation for sustainable urban management and planning. The cases for this research are 1734 buildings demolished in Jiangbei district, in the middle region of Chongqing City. Internal and external factors for the short-lived buildings are identified by applying logistic analysis. The results indicate that nine factors have a significant influence on short-lived buildings. This research also finds that buildings with low density, utilization and compensation but high land development potential are more likely to become short-lived buildings.
Abstract:
‘The enigma of revolts.’ You can almost hear the sigh at the end of this sentence. Foucault is making a statement here, published under the title ‘Useless to Revolt’, on that ‘impulse by which a single individual, a group, a minority, or an entire people says, “I will no longer obey”’. In this short piece, I question the two sides of the enigma: how to label the revolt (is the act of rioting, such as what we witnessed in Ferguson, Missouri in August 2014, ‘proper resistance’?) and how to understand the ēthos of the rioter. The label of counter-conduct, I argue, clarifies the enigma as it allows us, challenges us even, to see the event as political. Counter-conduct provides a new framework for reading spontaneous and improvised forms of political expression. The rioter can then be seen as political and rational, as demonstrating ethical behavior. The ēthos of this behavior is represented as an ethics of the self, a form of parrhēsia where the rioter risks herself and shows courage to tell the truth, the story of her community.
Abstract:
We offer university students and interested readers this teaching guide to university mathematics as the fruit of our years of teaching mathematics at the University. The final result is a collection of sixteen short volumes grouped into the two modules of Linear Algebra and Infinitesimal Calculus. With this sixth volume of the collection we begin the study of vector algebra, starting from concepts close to intuition, such as vectors in the plane and in space, and then generalizing the concept of a vector to other mathematical entities such as polynomials, sequences, economic magnitudes, etc. In this volume we will often use matrix notation, already familiar and used in previous volumes, which is an ideal tool for simplifying the notation of concepts and of calculations with vectors. We continue with the axiomatic study of the vector space structure and its properties which, as we will see in the next volume, will allow us, among other applications to economics, to derive the eigenvalues and eigenvectors of an endomorphism and to diagonalize quadratic forms.
Abstract:
The phytopathogen Xylella fastidiosa produces long type IV pili and short type I pili involved in motility and adhesion. In this work, we have investigated the role of sigma factor sigma(54) (RpoN) in the regulation of fimbrial biogenesis in X. fastidiosa. An rpoN null mutant was constructed from the non-pathogenic citrus strain J1a12, and microarray analyses of global gene expression comparing the wild type and rpoN mutant strains showed few genes exhibiting differential expression. In particular, gene pilA1 (XF2542), which encodes the structural pilin protein of type IV pili, showed decreased expression in the rpoN mutant, whereas two-fold higher expression of an operon encoding proteins of type I pili was detected, as confirmed by quantitative RT-PCR (qRT-PCR) analysis. The transcriptional start site of pilA1 was determined by primer extension, downstream of a sigma(54)-dependent promoter. Microarray and qRT-PCR data demonstrated that expression of only one of the five pilA paralogues, pilA1, was significantly reduced in the rpoN mutant. The rpoN mutant made more biofilm than the wild type strain and presented a cell-cell aggregative phenotype. These results indicate that sigma(54) differentially regulates genes involved in type IV and type I fimbrial biogenesis, and is involved in biofilm formation in X. fastidiosa.
Abstract:
Real-world AI systems have recently been deployed that can automatically analyze the plans and tactics of tennis players. As the game-state is updated regularly at short intervals (i.e. point-level), a library of successful and unsuccessful plans of a player can be learnt over time. Given the relative strengths and weaknesses of a player’s plans, a set of proven plans or tactics from the library that characterize a player can be identified. For low-scoring, continuous team sports like soccer, such analysis for multi-agent teams does not exist, as the game is not segmented into “discretized” plays (i.e. plans), making it difficult to obtain a library that characterizes a team’s behavior. Additionally, as player tracking data is costly and difficult to obtain, we only have partial team tracings in the form of ball actions, which makes this problem even more difficult. In this paper, we propose a method to overcome these issues by representing team behavior via play-segments, which are spatio-temporal descriptions of ball movement over fixed windows of time. Using these representations we can characterize team behavior from entropy maps, which give a measure of predictability of team behaviors across the field. We show the efficacy and applicability of our method on the 2010-2011 English Premier League soccer data.
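The entropy-map idea in the abstract above can be illustrated with a small sketch. It assumes, purely for illustration, that each play-segment has already been quantised into a discrete type label and assigned to the field cell where it starts; neither step is described in the abstract, and the function and variable names are hypothetical. The Shannon entropy of the label distribution in each cell then serves as a predictability measure (low entropy means more predictable behaviour).

```python
import numpy as np

def entropy_map(cells, labels, n_cells, n_labels):
    """Shannon entropy (bits) of play-segment labels observed in each field cell."""
    counts = np.zeros((n_cells, n_labels))
    # Accumulate one count per (cell, label) observation.
    np.add.at(counts, (np.asarray(cells), np.asarray(labels)), 1)
    totals = counts.sum(axis=1, keepdims=True)
    # Empty cells keep probability 0 everywhere, giving entropy 0.
    probs = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    logp = np.zeros_like(probs)
    np.log2(probs, out=logp, where=probs > 0)
    return -(probs * logp).sum(axis=1)

# Toy data: 3 field cells, 2 hypothetical play-segment types.
cells = [0, 0, 1, 1]
labels = [0, 1, 0, 0]
print(entropy_map(cells, labels, n_cells=3, n_labels=2))
```

Reshaping the returned vector to the grid layout of the pitch would give a map that can be plotted over the field.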