947 results for utterance length
Abstract:
This paper presents a novel approach to estimating the confidence interval of speaker verification scores. The approach is used to minimise the utterance length required to produce a confident verification decision. The confidence estimation method is also extended to address both the high correlation between consecutive frame scores and robustness when very few training samples are available. The proposed technique drastically reduces the amount of data typically required to produce confident decisions in an automatic speaker verification system. When evaluated on the NIST 2005 SRE, the early verification decision method demonstrates that an average of 5–10 seconds of speech is sufficient to produce verification rates approaching those previously achieved using an average in excess of 100 seconds of speech.
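The early-decision idea can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes independent frame scores and a normal approximation to the interval of the mean, whereas the paper explicitly corrects for correlated consecutive frames. The function name, the `z` value and `min_frames` are hypothetical choices made for illustration.

```python
import math
from statistics import mean, stdev

def early_decision(frame_scores, threshold, z=2.576, min_frames=10):
    """Stop as soon as the confidence interval of the mean score excludes the threshold.

    Assumes independent frame scores (a simplification; real frame scores
    are correlated). Returns (decision, frames_used).
    """
    for n in range(min_frames, len(frame_scores) + 1):
        seen = frame_scores[:n]
        centre = mean(seen)
        # Half-width of the normal-approximation interval for the mean.
        half_width = z * stdev(seen) / math.sqrt(n)
        if centre - half_width > threshold:
            return "accept", n
        if centre + half_width < threshold:
            return "reject", n
    return "undecided", len(frame_scores)
```

With scores that sit clearly on one side of the threshold, the loop exits after `min_frames` frames; with scores that straddle it, every frame is consumed and no confident decision is returned.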
Abstract:
Most existing models of language production and speech motor control do not explicitly address how language requirements affect speech motor functions, as these domains are usually treated as separate and independent from one another. This investigation compared lip movements during bilabial closure between five individuals with mild aphasia and five age- and gender-matched control speakers when the linguistic characteristics of the stimuli were varied by increasing the number of syllables. Upper and lower lip movement data were collected for mono-, bi- and tri-syllabic nonword sequences using an AG 100 EMMA system. Each task was performed under both normal and fast rate conditions. Single articulator kinematic parameters (peak velocity, amplitude, duration, and cyclic spatio-temporal index) were measured to characterize lip movements. Results revealed that, compared to control speakers, individuals with aphasia showed significantly longer movement duration and lower movement stability for longer items (bi- and tri-syllables). Moreover, utterance length affected the lip kinematics, in that the monosyllables had smaller peak velocities, smaller amplitudes and shorter durations compared to bi- and trisyllables, and movement stability was lowest for the trisyllables. In addition, the rate-induced changes (smaller amplitude and shorter duration with increased rate) were most prominent for the short items (i.e., monosyllables). These findings provide further support for the notion that linguistic changes have an impact on the characteristics of speech movements, and that individuals with aphasia are more affected by such changes than control speakers.
Abstract:
This thesis presents an experimental study of the speech prosody of identical and non-identical twins. Speech fluency, pauses, speech rate, utterance length and speech frequency were examined phonetically, auditorily, semantically and statistically. The methods included both reading tasks (reading the alphabet, numerical lists, sentences with foreign loan words, holiday theme questions as well as 1.5 pages of text with long sentences and complex words) and spontaneous speech tasks (picture description and answering holiday theme questions). The subjects were Finnish-speaking 22- to 28-year-old female twins: 8 identical (monozygotic) and 10 non-identical (dizygotic) pairs. One pair was male-female. Comparisons were made between twin groups and between sisters. The data were collected a second time from four twin pairs, to make it possible to investigate some subjects intra-individually. In addition, phoneticians, phonetics students and people without knowledge of phonetics were tested in two listening experiments. The results showed that the dizygotic twins differed more from each other than monozygotic twins and that monozygotic twin sisters shared more similarities than dizygotic twin sisters. For example, monozygotic twin sisters showed smaller differences in word count, utterance length and speech rate in spontaneous speech tasks. Dizygotic twin sisters made more different kinds of reading mistakes with the same target words than monozygotic twin sisters, while monozygotic twin sisters made more of the same reading mistakes with the same target words than dizygotic twin sisters. The listening experiments showed that only professional phoneticians were able to recognize the twin sisters.
Even though the twins had the possibility to freely choose their speech rate, pausing and speech frequency, they used their own speech patterns; these included the same average speech frequency, average speech rate, type of pausing routine or filled pauses, and other speech mannerisms throughout their speech.
Abstract:
Brown (1973) proposed the "mean length of utterance" (MLU) as a standard index of language development. The MLU is calculated as the mean number of morphemes in 100 utterances of spontaneous speech. The assumption underlying this index is that syntactic complexity increases with the number of morphemes per utterance. According to Brown, the index makes it possible to estimate the development of a "grammatical competence" up to about four morphemes. Some authors, however, have criticized the MLU's lack of reliability and the four-morpheme limit. Reports show that the MLU varies with age, which suggests that factors such as the growth of respiratory capacity may influence Brown's index. The present study reviews these problems and examines how the MLU and certain measures of lexical diversity vary with the development of respiratory capacity. The MLU and lexical diversity were calculated in the spontaneous speech of 50 male speakers aged 5 to 27 years. The speakers' vital capacity (VC) was also measured using a pneumotachograph. The results show that the MLU and measures of lexical diversity correlate strongly with the growth of VC. Thus, the growth of respiratory functions constrains morphosyntactic and lexical development. Our discussion argues for the need to re-evaluate the MLU index and the "linguistic" conception of language development as a mental competence that emerges separately from the growth of performance structures.
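As an illustration of the index described in this abstract, the following is a minimal sketch of the MLU as the mean number of morphemes per utterance over a sample of up to 100 utterances. It assumes the utterances have already been segmented into morphemes by hand; the function name and the sample data are hypothetical.

```python
def mean_length_of_utterance(utterances, sample_size=100):
    """Mean number of morphemes per utterance over the first `sample_size` utterances.

    Each utterance is a list of morphemes; segmentation is assumed to have
    been done beforehand and is not attempted here.
    """
    sample = utterances[:sample_size]
    if not sample:
        raise ValueError("need at least one utterance")
    return sum(len(utterance) for utterance in sample) / len(sample)

# Hypothetical, hand-segmented sample (plural "-s" counted as its own morpheme).
sample = [
    ["dog", "s", "run"],
    ["I", "want", "cookie", "s"],
    ["no"],
]
mlu = mean_length_of_utterance(sample)
```

Here the three utterances contain 3, 4 and 1 morphemes, so the sketch yields 8/3 morphemes per utterance.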
Abstract:
The present study examines the effects of familiarity on the identification of individuals in a voice lineup. The voice lineup is a technique modelled on a paralegal procedure for the visual identification of individuals. It consists of presenting several voices with similar acoustic characteristics, defined according to criteria recognized in the literature. The main objective of the study was to determine whether familiarity with a voice in a voice lineup can yield a high rate of correct speaker identification (> 99%). This study is the first to quantify the familiarity criterion between the identifier and a person associated with a "target voice" according to four parameters related to the contacts (communications) between the individuals: the recency of contact (how long ago the last encounter with the individual took place), the duration and mean frequency of contact, and the period over which the contacts took place. Three different voice lineups were constructed, each containing 10 male voices, including a target voice that could be very familiar; this degree of familiarity was established by means of a questionnaire. The participants (identifiers, n = 44) were selected according to their level of familiarity with the target voice. All the voices were those of native speakers of Quebec French, and all had mean fundamental frequencies similar to that of the target voice (within one semitone). In addition, each voice lineup contained utterances varying in length according to a given number of syllables (1, 4, 10, 18 syllables). The results show that, when the degree of familiarity is controlled and the utterance is 4 syllables or longer, an identification rate is obtained with an exact probability of error of p < 1 × 10⁻¹². These identification rates exceed those currently obtained with automated systems.
Abstract:
The aim of this study was to examine the applicability of the Phonological Mean Length of Utterance (pMLU) method to the data of children acquiring Finnish, for both typically developing children and children with a Specific Language Impairment (SLI). Study I examined typically developing children at the end of the one-word stage (N=17, mean age 1;8), and Study II analysed children's (N=5) productions in a follow-up study with four assessment points (ages 2;0, 2;6, 3;0, 3;6). Study III was carried out in the form of a review article that examined recent research on the phonological development of children acquiring Finnish and compared the results with general trends and cross-linguistic findings in phonological development. Study IV included children with SLI (N=4, mean age 4;10) and age-matched peers. The analyses in Studies I, II and IV were made using the quantitative pMLU method. In the pMLU method, pMLU values are counted for both the words that the children targeted (so-called target words) and the words produced by the children. When the child's average pMLU value is divided by the average target-word pMLU value, the result is the Whole-Word Proximity (PWP) value, which indicates the child's accuracy in producing the words. In addition, the number of entirely correctly produced words is counted to obtain the Whole-Word Correctness (PWC) value. Qualitative analyses were carried out in order to examine how the children's phoneme inventories and deficiencies in phonotactics would explain the observed pMLU, PWP and PWC values. The results showed that the pMLU values for children acquiring Finnish were relatively high already at the end of the one-word stage (Study I). The values were found to reflect the characteristics of the ambient language.
Typological features that lead to cross-linguistic differences in pMLU values were also observed in the review article (Study III), which noted that in the course of phonological acquisition there are a large number of language-specific phenomena and processes. Study II indicated that overall the children's phonological development during the follow-up period was reflected in the pMLU, PWP and PWC values, although the method showed limitations in detecting qualitative differences between the children. Correct vowels were not scored in the pMLU counts, which led to some misleadingly high pMLU and PWP results: vowel errors were only reflected in the PWC values. Typically developing children in Study II reached the highest possible pMLU results already around age 3;6. At the same time, the differences between the children with SLI and age-matched peers in the pMLU values were very prominent (Study IV). The values for the children with SLI were similar to the ones reported for two-year-old children. Qualitative analyses revealed that the phonologies of the children with SLI largely resembled those of younger, typically developing children. However, unusual errors were also observed (e.g., vowel errors, omissions of word-initial stops, consonants added to the initial position in words beginning with a vowel). This dissertation provides an application of a new tool for quantitative phonological assessment and analysis in children acquiring Finnish. The preliminary results suggest that, with some modifications, the pMLU method can be used to assess children's phonological development and that it has some advantages compared to the earlier, segment-oriented approaches. Qualitative analyses complemented the pMLU's observations on the children's phonologies. More research is needed in order to verify the levels of the pMLU, PWP and PWC values in children acquiring Finnish.
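The two derived measures described in this abstract can be sketched as follows. The sketch takes per-word pMLU scores as given, since the abstract does not spell out the scoring rules. Expressing PWC as a proportion of all target words is an assumption here; the abstract only says that correct words are counted. Function names and the example words are hypothetical.

```python
def pwp(child_pmlu_scores, target_pmlu_scores):
    """Child's mean pMLU divided by the mean pMLU of the target words."""
    child_mean = sum(child_pmlu_scores) / len(child_pmlu_scores)
    target_mean = sum(target_pmlu_scores) / len(target_pmlu_scores)
    return child_mean / target_mean

def pwc(productions, targets):
    """Share of words produced entirely correctly (assumed to be a proportion)."""
    correct = sum(1 for produced, target in zip(productions, targets)
                  if produced == target)
    return correct / len(targets)

# Hypothetical two-word sample: one word matches its target, one does not.
proximity = pwp([4, 6], [5, 7])
correctness = pwc(["kisa", "talo"], ["kissa", "talo"])
```

In this sample the child's mean pMLU is 5 against a target mean of 6, so proximity is 5/6, and one of the two words is fully correct.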
Abstract:
A significant amount of speech is typically required for speaker verification system development and evaluation, especially in the presence of large intersession variability. This paper introduces source- and utterance-duration-normalized linear discriminant analysis (SUN-LDA) approaches to compensate for session variability in short-utterance i-vector speaker verification systems. Two variations of SUN-LDA are proposed in which normalization techniques are used to capture source variation from both short and full-length development i-vectors, one based upon pooling (SUN-LDA-pooled) and the other on concatenation (SUN-LDA-concat) across the duration and source-dependent session variation. Both the SUN-LDA-pooled and SUN-LDA-concat techniques are shown to provide improvement over traditional LDA on NIST 08 truncated 10sec-10sec evaluation conditions. The largest improvement is obtained with the SUN-LDA-concat technique, which achieves a relative improvement over traditional LDA approaches of 8% in EER for mismatched conditions and over 3% for matched conditions.
Abstract:
This paper proposes techniques to improve the performance of i-vector based speaker verification systems when only short utterances are available. Short-length utterance i-vectors vary with speaker, session variations, and the phonetic content of the utterance. Well-established methods such as linear discriminant analysis (LDA), source-normalized LDA (SN-LDA) and within-class covariance normalisation (WCCN) exist for compensating for the session variation, but we have identified the variability introduced by phonetic content due to utterance variation as an additional source of degradation when short-duration utterances are used. To compensate for utterance variations in short i-vector speaker verification systems using cosine similarity scoring (CSS), we have introduced a short utterance variance normalization (SUVN) technique and a short utterance variance (SUV) modelling approach at the i-vector feature level. A combination of SUVN with LDA and SN-LDA is proposed to compensate for the session and utterance variations and is shown to provide improvement in performance over the traditional approach of using LDA and/or SN-LDA followed by WCCN. An alternative approach is also introduced that uses probabilistic linear discriminant analysis (PLDA) to directly model the SUV. The combination of SUVN, LDA and SN-LDA followed by SUV PLDA modelling provides an improvement over the baseline PLDA approach. We also show that for this combination of techniques, the utterance variation information needs to be artificially added to full-length i-vectors for PLDA modelling.
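For readers unfamiliar with cosine similarity scoring, the following minimal sketch shows only the scoring step between an enrolment vector and a test vector. It does not implement SUVN, LDA, SN-LDA or WCCN, and it assumes that any such compensation has already been applied to the vectors. The function name and the toy vectors are hypothetical.

```python
import math

def cosine_score(enrol_vector, test_vector):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(enrol_vector, test_vector))
    norm_product = (math.sqrt(sum(a * a for a in enrol_vector))
                    * math.sqrt(sum(b * b for b in test_vector)))
    if norm_product == 0:
        raise ValueError("cosine score is undefined for a zero vector")
    return dot / norm_product

# Toy 2-dimensional vectors; real i-vectors have hundreds of dimensions.
same_direction = cosine_score([3.0, 4.0], [6.0, 8.0])
orthogonal = cosine_score([1.0, 0.0], [0.0, 1.0])
```

A verification decision would then compare the score with a threshold tuned on development data; that threshold is not given in the abstract.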
Abstract:
This paper proposes a combination of source-normalized weighted linear discriminant analysis (SN-WLDA) and short utterance variance (SUV) PLDA modelling to improve short-utterance PLDA speaker verification. As short-length utterance i-vectors vary with the speaker, session variations and phonetic content of the utterance (utterance variation), a combined approach of SN-WLDA projection and SUV PLDA modelling is used to compensate for the session and utterance variations. Experimental studies have found that the combined SN-WLDA and SUV PLDA modelling approach shows an improvement over the baseline system (WCCN[LDA]-projected Gaussian PLDA (GPLDA)), as this approach effectively compensates for the session and utterance variations.
Abstract:
This article explores how adult paid work is portrayed in 'family' feature-length films. The study extends previous critical media literature, which has overwhelmingly focused on depictions of gender and violence, by exploring the visual content of films that is relevant to adult employment. Forty-two G/PG films were analyzed for relevant themes. Consistent with the exploratory nature of the research, themes emerged inductively from the films' content. Results reveal six major themes: men are more visible in adult work roles than women; the division of labour remains gendered; work and home are not mutually exclusive domains; organizational authority and power are wielded in punitive ways; there are avenues to better employment prospects; and status/money is paramount. The findings of the study reflect a range of subject matters related to occupational characteristics and work-related communication and interactions which are typically viewed by children in contemporary society.
Abstract:
Aim – To develop and assess the predictive capabilities of a statistical model that relates routinely collected Trauma Injury Severity Score (TRISS) variables to length of hospital stay (LOS) in survivors of traumatic injury. Method – Retrospective cohort study of adults who sustained a serious traumatic injury, and who survived until discharge from Auckland City, Middlemore, Waikato, or North Shore Hospitals between 2002 and 2006. Cube-root-transformed LOS was analysed using two-level mixed-effects regression models. Results – 1498 eligible patients were identified, 1446 (97%) injured from a blunt mechanism and 52 (3%) from a penetrating mechanism. For blunt mechanism trauma, 1096 (76%) were male, average age was 37 years (range: 15-94 years), and LOS and TRISS score information was available for 1362 patients. Spearman’s correlation and the median absolute prediction error between LOS and the original TRISS model were ρ=0.31 and 10.8 days, respectively, and between LOS and the final multivariable two-level mixed-effects regression model were ρ=0.38 and 6.0 days, respectively. Insufficient data were available for the analysis of penetrating mechanism models. Conclusions – Neither the original TRISS model nor the refined model has sufficient ability to accurately or reliably predict LOS. Additional predictor variables for LOS and other indicators for morbidity need to be considered.
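The error measure reported in this abstract can be illustrated with a minimal sketch. It assumes that predictions made on the cube-root scale are cubed back to days before the absolute errors are taken; the abstract reports the error in days but does not state how the back-transformation was done. The function name and all numbers are hypothetical, not study data.

```python
from statistics import median

def median_absolute_prediction_error(observed_days, predicted_cube_root):
    """Median of |observed - predicted| in days.

    Predictions are assumed to be on the cube-root scale and are cubed
    to return them to days before comparison.
    """
    predicted_days = [p ** 3 for p in predicted_cube_root]
    return median(abs(obs - pred)
                  for obs, pred in zip(observed_days, predicted_days))

# Hypothetical stays of 8, 27 and 1 days, each predicted as 2 on the cube-root scale.
error_days = median_absolute_prediction_error([8, 27, 1], [2, 2, 2])
```

Each prediction back-transforms to 8 days, giving absolute errors of 0, 19 and 7 days and a median of 7 days.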