184 results for Utterance
Abstract:
This paper presents a novel approach for estimating the confidence interval of speaker verification scores. The approach is used to minimise the utterance length required to produce a confident verification decision. The confidence estimation method is also extended to address both the high correlation between consecutive frame scores and the need for robustness when very few training samples are available. The proposed technique drastically reduces the amount of data typically required to produce confident decisions in an automatic speaker verification system. When evaluated on the NIST 2005 SRE, the early verification decision method demonstrates that an average of 5–10 seconds of speech is sufficient to produce verification rates approaching those achieved previously using an average in excess of 100 seconds of speech.
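The early-decision idea in this abstract can be illustrated with a minimal sketch: accumulate frame scores and stop as soon as a confidence interval around the running mean no longer contains the decision threshold. This is not the paper's estimator. It assumes independent frame scores (the paper explicitly deals with their correlation), uses a normal approximation with z = 1.96, and the names `early_decision`, `threshold` and `min_frames` are hypothetical.

```python
import math


def early_decision(frame_scores, threshold=0.0, z=1.96, min_frames=10):
    """Return (decision, frames_used).

    decision is 'accept', 'reject' or 'undecided'. Assumes independent
    frame scores, which real speech frames are not (illustrative only).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    for score in frame_scores:
        n += 1
        delta = score - mean
        mean += delta / n
        m2 += delta * (score - mean)
        if n < min_frames:
            continue  # too few frames for a usable variance estimate
        std = math.sqrt(m2 / (n - 1))
        half_width = z * std / math.sqrt(n)
        if mean - half_width > threshold:
            return "accept", n  # whole interval lies above the threshold
        if mean + half_width < threshold:
            return "reject", n  # whole interval lies below the threshold
    return "undecided", n
```

With correlated frames the interval above would be too narrow, which is presumably why the abstract treats correlation as a separate problem.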
Abstract:
A significant amount of speech is typically required for speaker verification system development and evaluation, especially in the presence of large intersession variability. This paper introduces source and utterance-duration normalized linear discriminant analysis (SUN-LDA) approaches to compensate for session variability in short-utterance i-vector speaker verification systems. Two variations of SUN-LDA are proposed, in which normalization techniques are used to capture source variation from both short and full-length development i-vectors: one based upon pooling (SUN-LDA-pooled) and the other on concatenation (SUN-LDA-concat) across the duration and source-dependent session variation. Both the SUN-LDA-pooled and SUN-LDA-concat techniques are shown to improve on traditional LDA under the NIST 08 truncated 10sec-10sec evaluation conditions. The largest gain comes from the SUN-LDA-concat technique, which achieves a relative improvement of 8% in EER for mismatched conditions and over 3% for matched conditions compared with traditional LDA approaches.
Abstract:
This paper proposes techniques to improve the performance of i-vector based speaker verification systems when only short utterances are available. Short-length utterance i-vectors vary with speaker, session variations, and the phonetic content of the utterance. Well-established methods such as linear discriminant analysis (LDA), source-normalized LDA (SN-LDA) and within-class covariance normalisation (WCCN) exist for compensating for session variation, but we have identified the variability introduced by phonetic content (utterance variation) as an additional source of degradation when short-duration utterances are used. To compensate for utterance variations in short i-vector speaker verification systems using cosine similarity scoring (CSS), we have introduced a short utterance variance normalization (SUVN) technique and a short utterance variance (SUV) modelling approach at the i-vector feature level. A combination of SUVN with LDA and SN-LDA is proposed to compensate for the session and utterance variations and is shown to improve performance over the traditional approach of using LDA and/or SN-LDA followed by WCCN. An alternative approach is also introduced that uses probabilistic linear discriminant analysis (PLDA) to model the SUV directly. The combination of SUVN, LDA and SN-LDA followed by SUV PLDA modelling provides an improvement over the baseline PLDA approach. We also show that for this combination of techniques, the utterance variation information needs to be artificially added to full-length i-vectors for PLDA modelling.
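Cosine similarity scoring, as named in this abstract, compares a target-speaker i-vector with a test i-vector by the cosine of the angle between them. The sketch below shows only that final scoring step. It assumes the vectors have already been compensated (for example by LDA or WCCN projection), and the function names and the threshold value are hypothetical.

```python
import math


def cosine_score(target_ivector, test_ivector):
    """Cosine of the angle between two equal-length i-vectors."""
    if len(target_ivector) != len(test_ivector):
        raise ValueError("i-vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(target_ivector, test_ivector))
    norm_target = math.sqrt(sum(a * a for a in target_ivector))
    norm_test = math.sqrt(sum(b * b for b in test_ivector))
    if norm_target == 0.0 or norm_test == 0.0:
        raise ValueError("i-vectors must be non-zero")
    return dot / (norm_target * norm_test)


def verify(target_ivector, test_ivector, threshold=0.5):
    """Accept the claimed identity if the score reaches the threshold.

    The threshold value here is an arbitrary placeholder, not a tuned one.
    """
    return cosine_score(target_ivector, test_ivector) >= threshold
```

Because the score depends only on direction, any variability that rotates short-utterance i-vectors, such as phonetic content, affects it directly, which is the degradation the abstract sets out to compensate.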
Abstract:
This paper proposes a combination of source-normalized weighted linear discriminant analysis (SN-WLDA) and short utterance variance (SUV) PLDA modelling to improve short-utterance PLDA speaker verification. As short-length utterance i-vectors vary with the speaker, session variations and phonetic content of the utterance (utterance variation), a combined approach of SN-WLDA projection and SUV PLDA modelling is used to compensate for the session and utterance variations. Experimental studies show that the combined SN-WLDA and SUV PLDA modelling approach improves on the baseline system (WCCN[LDA]-projected Gaussian PLDA (GPLDA)), as it effectively compensates for the session and utterance variations.
Abstract:
The aim of this study was to examine the applicability of the Phonological Mean Length of Utterance (pMLU) method to the data of children acquiring Finnish, for both typically developing children and children with a Specific Language Impairment (SLI). Study I examined typically developing children at the end of the one-word stage (N=17, mean age 1;8), and Study II analysed children's (N=5) productions in a follow-up study with four assessment points (ages 2;0, 2;6, 3;0, 3;6). Study III was carried out in the form of a review article that examined recent research on the phonological development of children acquiring Finnish and compared the results with general trends and cross-linguistic findings in phonological development. Study IV included children with SLI (N=4, mean age 4;10) and age-matched peers. The analyses in Studies I, II and IV were made using the quantitative pMLU method. In the pMLU method, pMLU values are counted for both the words that the children targeted (so-called target words) and the words produced by the children. Dividing the child's average pMLU value by the average target word pMLU value gives the Whole-Word Proximity (PWP) value, which indicates how accurately the child produces the words. In addition, the number of entirely correctly produced words is counted to obtain the Whole-Word Correctness (PWC) value. Qualitative analyses were carried out in order to examine how the children's phoneme inventories and deficiencies in phonotactics would explain the observed pMLU, PWP and PWC values. The results showed that the pMLU values for children acquiring Finnish were relatively high already at the end of the one-word stage (Study I). The values were found to reflect the characteristics of the ambient language.
Typological features that lead to cross-linguistic differences in pMLU values were also observed in the review article (Study III), which noted that in the course of phonological acquisition there are a large number of language-specific phenomena and processes. Study II indicated that overall the children's phonological development during the follow-up period was reflected in the pMLU, PWP and PWC values, although the method showed limitations in detecting qualitative differences between the children. Correct vowels were not scored in the pMLU counts, which led to some misleadingly high pMLU and PWP results: vowel errors were only reflected in the PWC values. Typically developing children in Study II reached the highest possible pMLU results already around age 3;6. At the same time, the differences between the children with SLI and age-matched peers in the pMLU values were very prominent (Study IV). The values for the children with SLI were similar to the ones reported for two-year-old children. Qualitative analyses revealed that the phonologies of the children with SLI largely resembled those of younger, typically developing children. However, unusual errors were also observed (e.g., vowel errors, omissions of word-initial stops, consonants added to the initial position in words beginning with a vowel). This dissertation provides an application of a new tool for quantitative phonological assessment and analysis in children acquiring Finnish. The preliminary results suggest that, with some modifications, the pMLU method can be used to assess children's phonological development and that it has some advantages compared to the earlier, segment-oriented approaches. Qualitative analyses complemented the observations made with the pMLU method on the children's phonologies. More research is needed in order to verify the levels of the pMLU, PWP and PWC values in children acquiring Finnish.
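The relationship between the three measures described in this abstract can be sketched as follows. The sketch takes per-word pMLU values as given rather than computing them, treats PWC as a proportion of the sample (an assumption, since the abstract only says the correct words are counted), and uses a hypothetical function name and made-up sample values.

```python
def pmlu_summary(words):
    """Summarise a speech sample.

    words: list of (target_pmlu, produced_pmlu, entirely_correct) tuples,
    one per word. Returns (mean produced pMLU, PWP, PWC).
    """
    if not words:
        raise ValueError("sample must contain at least one word")
    n = len(words)
    mean_target = sum(target for target, _, _ in words) / n
    mean_produced = sum(produced for _, produced, _ in words) / n
    pwp = mean_produced / mean_target  # child's mean pMLU over target mean pMLU
    pwc = sum(1 for _, _, correct in words if correct) / n  # share of fully correct words
    return mean_produced, pwp, pwc


# Made-up illustrative sample: four words, two of them entirely correct.
sample = [(8, 6, False), (6, 6, True), (10, 6, False), (8, 8, True)]
mean_pmlu, pwp, pwc = pmlu_summary(sample)
```

For this sample the mean produced pMLU is 6.5 against a target mean of 8, so PWP is 0.8125 while PWC is 0.5. A word can match its target pMLU and still be incorrect, which is consistent with the abstract's remark that vowel errors show up only in PWC.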
Abstract:
Most existing models of language production and speech motor control do not explicitly address how language requirements affect speech motor functions, as these domains are usually treated as separate and independent from one another. This investigation compared lip movements during bilabial closure between five individuals with mild aphasia and five age- and gender-matched control speakers when the linguistic characteristics of the stimuli were varied by increasing the number of syllables. Upper and lower lip movement data were collected for mono-, bi- and tri-syllabic nonword sequences using an AG 100 EMMA system. Each task was performed under both normal and fast rate conditions. Single articulator kinematic parameters (peak velocity, amplitude, duration, and cyclic spatio-temporal index) were measured to characterize lip movements. Results revealed that, compared to control speakers, individuals with aphasia showed significantly longer movement duration and lower movement stability for longer items (bi- and tri-syllables). Moreover, utterance length affected the lip kinematics, in that the monosyllables had smaller peak velocities, smaller amplitudes and shorter durations compared to bi- and tri-syllables, and movement stability was lowest for the tri-syllables. In addition, the rate-induced changes (smaller amplitude and shorter duration with increased rate) were most prominent for the short items (i.e., monosyllables). These findings provide further support for the notion that linguistic changes have an impact on the characteristics of speech movements, and that individuals with aphasia are more affected by such changes than control speakers.
Abstract:
Utterance of a sentence in poetry can be performative, and explicitly so. The best-known of Geoffrey Hill’s critical essays denies this, but his own poetry demonstrates it. I clarify these claims and explain why they matter. What Hill denies illuminates anxieties about responsibility and commitment that poets and critics share with philosophers. What Hill demonstrates affords opportunities for mutual benefit between philosophy and criticism.
Abstract:
Mode of access: Internet.
Abstract:
Publisher's advertisements on back cover.
Abstract:
This work presents an extended Joint Factor Analysis (JFA) model that includes explicit modelling of unwanted within-session variability. The goals of the proposed extended JFA model are to improve verification performance with short utterances by compensating for the effects of limited or imbalanced phonetic coverage, and to produce a flexible JFA model that is effective over a wide range of utterance lengths without adjusting the model, for example by retraining session subspaces. Experimental results on the 2006 NIST SRE corpus demonstrate the flexibility of the proposed model: it provides competitive results over a wide range of utterance lengths without retraining and yields modest improvements over the current state of the art in a number of conditions.
Abstract:
Keyword Spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting. The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers.
The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.