242 results for utterances


Relevance:

20.00%

Publisher:

Abstract:

Robust speaker verification on short utterances remains a key consideration when deploying automatic speaker recognition, as many real-world applications have access to only limited-duration speech data. This paper explores how recent techniques built around total variability modeling behave when training and testing utterance lengths are reduced. Results are presented that compare Joint Factor Analysis (JFA) and i-vector based systems with various compensation techniques: Within-Class Covariance Normalization (WCCN), Linear Discriminant Analysis (LDA), Scatter Difference Nuisance Attribute Projection (SDNAP) and Gaussian Probabilistic Linear Discriminant Analysis (GPLDA). Speaker verification performance for utterances with as little as 2 seconds of data, taken from the NIST Speaker Recognition Evaluations, is presented to provide a clearer picture of the current performance characteristics of these techniques in short-utterance conditions.

Relevance:

20.00%

Publisher:

Abstract:

This paper investigates the effects of limited speech data in the context of speaker verification using a probabilistic linear discriminant analysis (PLDA) approach. Being able to reduce the length of required speech data is important to the development of automatic speaker verification systems in real-world applications. When sufficient speech is available, previous research has shown that heavy-tailed PLDA (HTPLDA) modeling of speakers in the i-vector space provides state-of-the-art performance; however, the robustness of HTPLDA to limited speech resources in development, enrolment and verification is an important issue that has not yet been investigated. In this paper, we analyze speaker verification performance with regard to the duration of utterances used both for speaker evaluation (enrolment and verification) and for score normalization and PLDA modeling during development. Two different approaches to total-variability representation are analyzed within the PLDA approach to show improved performance in short-utterance mismatched evaluation conditions and in conditions for which insufficient speech resources are available for adequate system development. The results presented in this paper, using the NIST 2008 Speaker Recognition Evaluation dataset, suggest that the HTPLDA system can continue to achieve better performance than Gaussian PLDA (GPLDA) as evaluation utterance lengths are decreased. We also highlight the importance of matching durations for score normalization and PLDA modeling to the expected evaluation conditions. Finally, we found that a pooled total-variability approach to PLDA modeling can achieve better performance than the traditional concatenated total-variability approach for short utterances in mismatched evaluation conditions and in conditions for which insufficient speech resources are available for adequate system development.

Relevance:

20.00%

Publisher:

Abstract:

This paper presents results from a study on the production of Finnish prosody. The effect of word order and the tonal shape in the production of Finnish prosody was studied as produced by 8 native Finnish speakers. Predictions formulated with regard to results from an earlier study pertaining to the perception of prominence were tested. These predictions had to do with the tonal shape of the utterances in the form of a flat hat pattern and the effect of word order on the so-called top-line declination within an adverbial phrase in the utterances. The results from the experiment give support to the following claims: the temporal domain of prosodic focus is the whole utterance, word order reversal from unmarked to marked has an effect on the production of prosody, and the production of the tonal aspects of focus in Finnish follows a basic flat hat pattern. That is, the prominence of a word can be produced by an f0 rise or a fall, depending on the location of the word in an utterance. The basic accentual shape of a Finnish word is then not a pointed rise/fall hat shape as claimed before, since it can vary depending on the syllable structure and the position within an utterance.

Relevance:

20.00%

Publisher:

Abstract:

We show that children’s syntactic production is immediately affected by individual experiences of structures and verb–structure pairings within a dialogue, but that these effects have different timecourses. In a picture-matching game, three- to four-year-olds were more likely to describe a transitive action using a passive immediately after hearing the experimenter produce a passive than after hearing an active (abstract priming), and this tendency was stronger when the verb was repeated (lexical boost). The lexical boost disappeared after two intervening utterances, but the abstract priming effect persisted. This pattern did not differ significantly from that of control adults. Children also showed a cumulative priming effect. Our results suggest that whereas the same mechanism may underlie children’s immediate syntactic priming and long-term syntactic learning, different mechanisms underlie the lexical boost versus long-term learning of verb–structure links. They also suggest broad continuity of syntactic processing in production between this age group and adults.

Relevance:

20.00%

Publisher:

Abstract:

We frequently experience and successfully process anomalous utterances. Here we examine whether people do this by ‘correcting’ syntactic anomalies to yield well-formed representations. In two structural priming experiments, participants’ syntactic choices in picture description were influenced as strongly by previously comprehended anomalous (missing-verb) prime sentences as by well-formed prime sentences. Our results suggest that comprehenders can reconstruct the constituent structure of anomalous utterances – even when such utterances lack a major structural component such as the verb. These results also imply that structural alignment in dialogue is unaffected if one interlocutor produces anomalous utterances.

Relevance:

10.00%

Publisher:

Abstract:

Speech recognition in car environments has been identified as a valuable means of reducing driver distraction when operating non-critical in-car systems. Likelihood-maximising (LIMA) frameworks optimise speech enhancement algorithms based on recognised state sequences rather than traditional signal-level criteria such as maximising signal-to-noise ratio. Previously presented LIMA frameworks require calibration utterances to generate optimised enhancement parameters, which are then used for all subsequent utterances. Sub-optimal recognition performance occurs in noise conditions that differ significantly from those present during the calibration session, a serious problem in rapidly changing noise environments. We propose a dialog-based design which allows regular optimisation iterations in order to track the changing noise conditions. Experiments using Mel-filterbank spectral subtraction are performed to determine the optimisation requirements for vehicular environments and show that minimal optimisation assists real-time operation with improved speech recognition accuracy. It is also shown that the proposed design is able to provide improved recognition performance over frameworks incorporating a calibration session.
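
As an illustration only, the Mel-filterbank spectral subtraction mentioned in this abstract can be sketched in its generic textbook form: a scaled noise estimate is subtracted from each filterbank energy, with a floor to avoid negative values. The function name `spectral_subtract` and the parameters `alpha` and `floor` are assumptions introduced here; the abstract does not give the paper's exact formulation or say which parameters the LIMA optimisation adjusts.

```python
from typing import List


def spectral_subtract(noisy: List[float], noise: List[float],
                      alpha: float = 1.0, floor: float = 0.01) -> List[float]:
    """Generic spectral subtraction on the Mel-filterbank energies of one frame.

    noisy: filterbank energies of the noisy frame
    noise: estimated noise energies for the same bands
    alpha: over-subtraction factor (hypothetical tunable parameter)
    floor: fraction of the noisy energy kept as a lower bound
    """
    if len(noisy) != len(noise):
        raise ValueError("noisy and noise must have the same number of bands")
    # Subtract the scaled noise estimate, but never drop below floor * noisy.
    return [max(y - alpha * n, floor * y) for y, n in zip(noisy, noise)]


frame = [4.0, 2.0, 1.0]            # made-up energies for three bands
noise_estimate = [1.0, 1.0, 1.0]   # made-up noise estimate
enhanced = spectral_subtract(frame, noise_estimate, alpha=1.5, floor=0.1)
print(enhanced)
```

In a likelihood-maximising setup, values such as `alpha` would presumably be chosen to maximise recogniser likelihood rather than signal-to-noise ratio, and re-estimated as the noise changes.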

Relevance:

10.00%

Publisher:

Abstract:

This work presents an extended Joint Factor Analysis (JFA) model that includes explicit modelling of unwanted within-session variability. The goals of the proposed extended JFA model are to improve verification performance with short utterances by compensating for the effects of limited or imbalanced phonetic coverage, and to produce a flexible JFA model that is effective over a wide range of utterance lengths without adjusting model parameters, for example by retraining session subspaces. Experimental results on the 2006 NIST SRE corpus demonstrate the flexibility of the proposed model, which provides competitive results over a wide range of utterance lengths without retraining and yields modest improvements over the current state of the art in a number of conditions.