988 results for Speech Rate
Abstract:
Acoustic-feature-based speech (syllable) rate estimation and syllable nuclei detection are important problems in automatic speech recognition (ASR), computer-assisted language learning (CALL) and fluency analysis. A typical solution for both problems consists of two stages. The first stage computes a short-time feature contour such that most of the peaks of the contour correspond to the syllabic nuclei. In the second stage, the peaks corresponding to the syllable nuclei are detected. In this work, instead of peak detection, we perform a mode-shape classification, formulated as a supervised binary classification problem: mode-shapes representing the syllabic nuclei form one class and the remaining mode-shapes the other. We use the temporal correlation and selected sub-band correlation (TCSSBC) feature contour, and the mode-shapes in the TCSSBC feature contour are converted into a set of feature vectors using an interpolation technique. A support vector machine classifier is used for the classification. Experiments are performed separately on the Switchboard, TIMIT and CTIMIT corpora in a five-fold cross-validation setup. The average correlation coefficients for syllable rate estimation are 0.6761, 0.6928 and 0.3604 for the three corpora respectively, which outperform those obtained by the best of the existing peak detection techniques. Similarly, the average F-scores (syllable level) for syllable nuclei detection are 0.8917, 0.8200 and 0.7637 for the three corpora respectively. © 2016 Elsevier B.V. All rights reserved.
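The two evaluation measures named in this abstract, the correlation coefficient for rate estimation and the F-score for nuclei detection, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `f_score` and `pearson` and the example counts are assumptions made for illustration, and the rule for matching a detected nucleus to a reference nucleus is not specified here.

```python
import math

def f_score(true_pos, false_pos, false_neg):
    """F-score from counts of matched, spurious and missed syllable nuclei."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

def pearson(xs, ys):
    """Pearson correlation between estimated and reference syllable rates."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical counts: 8 nuclei matched, 2 spurious detections, 2 missed.
example_f = f_score(8, 2, 2)
# Hypothetical per-utterance syllable rates (reference vs. estimate).
example_r = pearson([3.0, 4.0, 5.0], [3.5, 4.5, 5.5])
```

Both functions assume non-degenerate input (at least one detection and one reference nucleus, and some variation in both rate lists); a production version would guard against division by zero.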
Abstract:
How speech is separated perceptually from other speech remains poorly understood. Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the modulation of its frequency, but not its amplitude, contour. This study further examined the effect of formant-frequency variation on intelligibility by manipulating the rate of formant-frequency change. Target sentences were synthetic three-formant (F1 + F2 + F3) analogues of natural utterances. Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3C; F2 + F3), where F2C + F3C constitute a competitor for F2 and F3 that listeners must reject to optimize recognition. Competitors were derived using formant-frequency contours extracted from extended passages spoken by the same talker and processed to alter the rate of formant-frequency variation, such that rate scale factors relative to the target sentences were 0, 0.25, 0.5, 1, 2, and 4 (0 = constant frequencies). Competitor amplitude contours were either constant, or time-reversed and rate-adjusted in parallel with the frequency contour. Adding a competitor typically reduced intelligibility; this reduction increased with competitor rate until the rate was at least twice that of the target sentences. Similarity in the results for the two amplitude conditions confirmed that formant amplitude contours do not influence across-formant grouping. The findings indicate that competitor efficacy is not tuned to the rate of the target sentences; most probably, it depends primarily on the overall rate of frequency variation in the competitor formants. This suggests that, when segregating the speech of concurrent talkers, differences in speech rate may not be a significant cue for across-frequency grouping of formants.
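One way to picture a rate scale factor is as the speed at which a longer source contour is read out. The Python sketch below is only an illustration under that assumption; the abstract does not describe the actual processing, and the function name `rescale_contour_rate` and the sample values are hypothetical.

```python
def rescale_contour_rate(contour, factor, n_out):
    """Read `contour` at `factor` times its original speed.

    Returns `n_out` samples obtained by linear interpolation. A factor of 0
    holds the first value, giving a constant-frequency contour.
    """
    last = len(contour) - 1
    out = []
    for i in range(n_out):
        pos = factor * i  # position in the source contour, in samples
        if pos > last:
            raise ValueError("source contour too short for this factor")
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out

# Hypothetical formant-frequency samples in Hz from an extended passage.
source = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0, 180.0]
doubled = rescale_contour_rate(source, 2, 4)   # twice the rate of change
halved = rescale_contour_rate(source, 0.5, 4)  # half the rate of change
constant = rescale_contour_rate(source, 0, 4)  # no frequency variation
```

Reading from an extended passage matters for factors above 1, because the output consumes the source faster than real time and would otherwise run out of samples.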
Abstract:
Background: Aphasia is an acquired language disorder that can present a significant barrier to patient involvement in healthcare decisions. Speech-language pathologists (SLPs) are viewed as experts in the field of communication. However, many SLP students do not receive practical training in techniques to communicate with people with aphasia (PWA) until they encounter PWA during clinical education placements. Methods: This study investigated the confidence and knowledge of SLP students in communicating with PWA prior to clinical placements using a customised questionnaire. Confidence in communicating with PWA was assessed using a 100-point visual analogue scale. Linear and logistic regressions were used to examine the association between confidence and age, and between confidence and course type (graduate-entry masters or undergraduate), respectively. Knowledge of strategies to assist communication with PWA was examined by asking respondents to list specific strategies that could assist communication with PWA. Results: SLP students were not confident about the prospect of communicating with PWA, reporting a median of 29 points (interquartile range 17–47) on the visual analogue confidence scale. Only four respondents (8.2%) rated their confidence greater than 55 (out of 100). Regression analyses indicated no relationship between confidence and students' age (p = 0.31, r-squared = 0.02), or between confidence and course type (p = 0.22, pseudo r-squared = 0.03). Students displayed limited knowledge about communication strategies. Thematic analysis of strategies revealed four overarching themes: Physical, Verbal Communication, Visual Information and Environmental Changes. While most students identified potential use of resources (such as images and written information), fewer students identified strategies to alter their verbal communication (such as reduced speech rate).
Conclusions: SLP students who had received aphasia-related theoretical coursework, but had not commenced clinical placements with PWA, were not confident in their ability to communicate with PWA. Students may benefit from an educational intervention or curriculum modification to incorporate practical training in effective strategies to communicate with PWA, before they encounter PWA in clinical settings. Ensuring students have confidence and knowledge of potential communication strategies to assist communication with PWA may allow them to focus their learning experiences on more specific clinical domains, such as clinical reasoning, rather than on building foundational interpersonal communication skills.
Abstract:
Speech rhythm is an essential part of speech processing. It is the outcome of the workings of a combination of linguistic and non-linguistic parameters, many of which also have other functions in speech. This study focusses on the acoustic and auditory realization of two linguistic parameters of rhythm: (1) sentence stress, and (2) speech rate and pausing. The aim was to find out how well Finnish comprehensive school pupils realize these two parameters in English and how native speakers of English react to Finnish pupils' English rhythm. The material was elicited by means of a story-telling task and questionnaires. Three female and three male pupils representing different levels of oral skills in English were selected as the experimental group. The control group consisted of two female and two male native speakers of English. The stories were analysed acoustically and auditorily with respect to interstress intervals, weak forms, fundamental frequency, pausing, and speech rate as well as articulation rate. In addition, 52 native speakers of English were asked to rate the intelligibility of the Finnish pupils' English with respect to speech rhythm and to give their attitudes on what the pupils sounded like. Results showed that Finnish pupils can produce isochronous interstress intervals in English, but that too large a proportion of these intervals contain pauses. A closer analysis of the pauses revealed that Finnish pupils pause too frequently and in inappropriate places when they speak English. Frequent pausing was also found to cause slow speech rates. The findings of the fundamental frequency (F0) measurements indicate that Finnish pupils tend to make a slightly narrower F0 difference between stressed and unstressed syllables than the native speakers of English. Furthermore, Finnish pupils appear to know how to reduce the duration and quality of unstressed sounds, but they fail to do it frequently enough.
Native listeners gave lower intelligibility and attitude scores to pupils with more anomalous speech rhythm. Finnish pupils' rhythm anomalies seemed to derive from various learning- or learner-related factors rather than from the differences between English and Finnish. This study demonstrates that pausing may be a more important component of English speech rhythm than sentence stress as far as Finnish adolescents are concerned, and that interlanguage development is affected by various factors and characterised by jumps or periods of stasis. Other theoretical, methodological and pedagogical implications of the results are also discussed.
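The link between frequent pausing and slow speech rate follows from how the two rate measures are commonly defined: speech rate counts syllables over the total speaking time including pauses, while articulation rate excludes pause time. A minimal Python sketch under that common definition (the abstract does not state the exact formulas used, and the numbers are hypothetical):

```python
def speech_rate(n_syllables, total_seconds):
    """Syllables per second over the whole sample, pauses included."""
    return n_syllables / total_seconds

def articulation_rate(n_syllables, total_seconds, pause_seconds):
    """Syllables per second over phonation time only, pauses excluded."""
    return n_syllables / (total_seconds - pause_seconds)

# Hypothetical story: 120 syllables in 60 s, of which 20 s is pausing.
sr = speech_rate(120, 60.0)              # 2.0 syllables/s
ar = articulation_rate(120, 60.0, 20.0)  # 3.0 syllables/s
```

In this example the speaker articulates at 3 syllables per second, but pausing for a third of the time lowers the overall speech rate to 2 syllables per second.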
Abstract:
Previous investigations employing electropalatography (EPG) have identified articulatory timing deficits in individuals with acquired dysarthria. However, this technology is yet to be applied to the articulatory timing disturbance present in Parkinson's disease (PD). As a result, the current investigation aimed to use EPG to comprehensively examine the temporal aspects of articulation in a group of nine individuals with PD at sentence, word and segment level. This investigation followed on from a prior study (McAuliffe, Ward and Murdoch) and similarly aimed to compare the results of the participants with PD to a group of aged (n=7) and young controls (n=8) to determine if ageing contributed to any articulatory timing deficits observed. Participants were required to read aloud the phrase "I saw a ___ today" with the EPG palate in situ. Target words included the consonants /l/, /s/ and /t/ in initial position in both the /i/ and /a/ vowel environments. Perceptual investigation of speech rate was conducted in addition to objective measurement of sentence, word and segment duration. Segment durations included the total segment length and duration of the approach, closure/constriction and release phases of EPG consonant production. Results of the present study revealed impaired speech rate, perceptually, in the group with PD. However, this was not confirmed objectively. Electropalatographic investigation of segment durations indicated that, in general, the group with PD demonstrated segment durations consistent with the control groups. Only one significant difference was noted, with the group with PD exhibiting significantly increased duration of the release phase for /la/ when compared to both the control groups. It is, therefore, possible that EPG failed to detect lingual movement impairment as it does not measure the complete tongue movement towards and away from the hard palate.
Furthermore, the contribution of individual variation to the present findings should not be overlooked.
Abstract:
How speech is separated perceptually from other speech remains poorly understood. In a series of experiments, perceptual organisation was probed by presenting three-formant (F1+F2+F3) analogues of target sentences dichotically, together with a competitor for F2 (F2C), or for F2+F3, which listeners must reject to optimise recognition. To control for energetic masking, the competitor was always presented in the opposite ear to the corresponding target formant(s). Sine-wave speech was used initially, and different versions of F2C were derived from F2 using separate manipulations of its amplitude and frequency contours. F2Cs with time-varying frequency contours were highly effective competitors, whatever their amplitude characteristics, whereas constant-frequency F2Cs were ineffective. Subsequent studies used synthetic-formant speech to explore the effects of manipulating the rate and depth of formant-frequency change in the competitor. Competitor efficacy was not tuned to the rate of formant-frequency variation in the target sentences; rather, the reduction in intelligibility increased with competitor rate relative to the rate for the target sentences. Therefore, differences in speech rate may not be a useful cue for separating the speech of concurrent talkers. Effects of competitors whose depth of formant-frequency variation was scaled by a range of factors were explored using competitors derived either by inverting the frequency contour of F2 about its geometric mean (plausibly speech-like pattern) or by using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Competitor efficacy depended on the overall depth of frequency variation, not depth relative to that for the other formants. Furthermore, the triangle-wave competitors were as effective as their more speech-like counterparts. 
Overall, the results suggest that formant-frequency variation is critical for the across-frequency grouping of formants but that this grouping does not depend on speech-specific constraints. © Springer Science+Business Media New York 2013.
Abstract:
This thesis presents an experimental study of the speech prosody of identical and non-identical twins. Speech fluency, pauses, speech rate, utterance length and speech frequency were examined phonetically, auditorily, semantically and statistically. The methods included both reading tasks (reading the alphabet, numerical lists, sentences with foreign loan words, holiday theme questions as well as 1.5 pages of text with long sentences and complex words) and spontaneous speech tasks (picture description and answering holiday theme questions). The subjects were Finnish-speaking 22- to 28-year-old female twins: 8 identical (monozygotic) and 10 non-identical (dizygotic) pairs. One pair was male-female. Comparisons were made between twin groups and between sisters. The data was regathered from four twin pairs, to make it possible to investigate some subjects intra-individually. In addition, phoneticians, phonetics students and people with no knowledge of phonetics were tested in two listening experiments. The results showed that the dizygotic twins differed more from each other than monozygotic twins and that monozygotic twin sisters shared more similarities than dizygotic twin sisters. For example, between monozygotic twin sisters smaller differences were found in word count, utterance length and speech rate in spontaneous speech tasks. Dizygotic twin sisters made more different kinds of reading mistakes with the same target words than monozygotic twin sisters, while monozygotic twin sisters made more of the same reading mistakes with the same target words than dizygotic twin sisters. The listening experiments showed that only professional phoneticians were able to recognize the twin sisters.
Even though the twins had the possibility to freely choose their speech rate, pausing and speech frequency, they used their own speech patterns; these included the same average speech frequency, average speech rate, type of pausing routine or filled pauses, and other speech mannerisms throughout their speech.
Abstract:
Voice alarms play an important role in the emergency evacuation of public places, because they can provide information and instruct evacuation. This paper studied the optimization of acoustic and semantic parameters of voice alarms in emergency evacuation, so that alarm design can improve evacuation performance. Both magnitude estimation and scaling methods were used to investigate participants' perceived urgency of alarms with different parameters. The results indicated that participants rated alarms as more urgent when they had a faster speech rate, a greater signal-to-noise ratio (SNR) and were presented under louder noise. There was an interaction between noise level and the content of the voice alarm. Signals with a speech rate below 4 characters/second were rated as not urgent at all. Intelligibility of the voice alarm was investigated by evaluating key-point recognition performance. The results showed that the effect of speech rate was marginally significant, and that 7 characters/second gave the highest intelligibility. This may be because the faster the signal was spoken, the more attention was paid to it. Gender of the speaker and SNR did not have a significant effect on the signals' intelligibility. This paper also investigated the impact of voice alarm content on human behavior in emergency evacuation in a 3-D virtual reality environment. In the condition of "telling the occupants what had happened and what to do", the number of participants who succeeded in evacuating was the largest. A further study, in which similar numbers of participants evacuated successfully in the three conditions, indicated that reaction time and evacuation time were shortest in the aforesaid condition. Although one-way ANOVA showed that the difference was not significant, the results still provide some reference for alarm design. In sum, parameters of voice alarms in emergency evacuation should be chosen to meet the needs of both perceived urgency and intelligibility.
The content of the alarms should include "what had happened and what to do", and should vary according to noise levels in different public places.
Abstract:
Objective. To compare the voice performance of children involved in street labor with regular children using perceptual-auditory and acoustic analyses. Methods. A controlled cross-sectional study was carried out on 7- to 10-year-old children of both genders. Children from both groups lived with their families and attended school regularly; however, child labor was evident in one group and not the other. A total of 200 potentially eligible street children, assisted by the Child Labor Elimination Programme (PETI), and 400 regular children were interviewed. Those with any vocal discomfort (106, 53%, and 90, 22.5%, respectively) had their voices assessed for resonance, pitch, loudness, speech rate, maximum phonation time, and other acoustic measurements. Results. A total of 106 street children (study group [SG]) and 90 regular children (control group [CG]) were evaluated. The SG demonstrated higher oral and nasal resonance, reduced loudness, a lower pitch, and a slower speech rate than the CG. The maximum phonation time, fundamental frequency, and upper harmonics were higher in the SG than the CG. Jitter and shimmer were higher in the CG than the SG. Conclusion. Using perceptual-auditory and acoustic analyses, we determined that there were differences in voice performance between the two groups, with street children having better quality perceptual and acoustic vocal parameters than regular children. We believe that this is due to the procedures and activities performed by the Child Labor Elimination Programme (PETI), which helps children to cope with their living conditions.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
OBJECTIVE: To correlate speech rate with fluency disruptions in people with cluttering and to compare them with people without cluttering. METHODS: Fourteen individuals aged 8 years to 40 years and 11 months, of both genders, took part in this investigation, divided into two groups matched for age and gender. GI comprised seven people with cluttering and GII seven people without cluttering. A speech fluency assessment protocol, which considers the frequency of disfluencies and the speech rate, was used to obtain and analyze the speech sample. RESULTS: The data indicated that the higher the flow of syllables and of words per minute, the greater the number of disruptions in speech, both in people with cluttering and in people without cluttering. As for the comparison between groups, there was a correlation for both syllables per minute and words per minute only in the group of people without cluttering. CONCLUSION: The group with cluttering showed an increased speech rate and excessive common disfluencies. In both groups there was a tendency toward higher values of common disfluencies as the speech rate increased. However, in the comparative analysis between the groups with and without cluttering, the correlation was significant only in the group of people without cluttering.
Abstract:
BACKGROUND: The fluent speech pattern attributed to individuals with Williams-Beuren syndrome is supported by the effectiveness of the phonological loop. Some studies have reported disfluencies arising from lexical-semantic impairments; however, the breaks in fluency were not well specified in terms of type and frequency of occurrence. OBJECTIVE: To obtain the speech fluency profile of individuals with Williams-Beuren syndrome (SWB) and to compare it with a control group matched for gender and similar mental age. METHOD: Twelve subjects with Williams-Beuren syndrome, with chronological ages from 6.6 to 23.6 and mental ages from 4.8 to 14.3 years, were assessed and compared with another 12 subjects of similar mental age and no language or learning difficulties. Fluency was assessed with the fluency section of the Teste de Linguagem Infantil - ABFW, which made it possible to classify, quantify and compare the two groups with regard to the types and frequency of disruptions and to speech rate. RESULTS: The group with Williams-Beuren syndrome (SWB) showed a higher percentage of speech discontinuity and an increased frequency of common disfluencies of the hesitation and word-repetition types when compared with individuals of similar mental age and typical speech and language development. CONCLUSION: The speech fluency profile of the individuals with SWB in this study showed disfluencies that may result from impairment in the lexical-semantic and syntactic processing of verbal information, which highlights the need for more systematic investigation of this topic.
Abstract:
BACKGROUND: Fluency in cluttering. OBJECTIVE: To characterize the fluency of individuals with cluttering and compare it with that of fluent individuals. METHODS: Fourteen individuals aged 8.0 to 40.11 years, of both genders, took part in this investigation, divided into two groups matched for age and gender. GI comprised 7 individuals with cluttering and GII 7 control individuals. A speech fluency assessment protocol, which considers the type and frequency of disfluencies and the speech rate, was used to obtain and analyze the speech sample. RESULTS: The data indicated that the groups differed with respect to common and stuttering-like disfluencies and to the number of syllables and of words per minute. CONCLUSION: The fluency profile of individuals with cluttering is very different from that of fluent speakers.