936 resultados para Speech recogntion
Resumo:
Models of normal word production are well specified about the effects of frequency of linguistic stimuli on lexical access, but are less clear regarding the same effects on later stages of word production, particularly word articulation. In aphasia, this lack of specificity of down-stream frequency effects is even more noticeable because there is relatively limited amount of data on the time course of frequency effects for this population. This study begins to fill this gap by comparing the effects of variation of word frequency (lexical, whole word) and bigram frequency (sub-lexical, within word) on word production abilities in ten normal speakers and eight mild–moderate individuals with aphasia. In an immediate repetition paradigm, participants repeated single monosyllabic words in which word frequency (high or low) was crossed with bigram frequency (high or low). Indices for mapping the time course for these effects included reaction time (RT) for linguistic processing and motor preparation, and word duration (WD) for speech motor performance (word articulation time). The results indicated that individuals with aphasia had significantly longer RT and WD compared to normal speakers. RT showed a significant main effect only for word frequency (i.e., high-frequency words had shorter RT). WD showed significant main effects of word and bigram frequency; however, contrary to our expectations, high-frequency items had longer WD. Further investigation of WD revealed that independent of the influence of word and bigram frequency, vowel type (tense or lax) had the expected effect on WD. Moreover, individuals with aphasia differed from control speakers in their ability to implement tense vowel duration, even though they could produce an appropriate distinction between tense and lax vowels. The results highlight the importance of using temporal measures to identify subtle deficits in linguistic and speech motor processing in aphasia, the crucial role of phonetic characteristics of stimuli set in studying speech production and the need for the language production models to account more explicitly for word articulation.
Resumo:
Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if so, this was primarily restricted to the study of single articulators. If AOS reflects a basic neuromotor dysfunction, this should somehow be evident in the production of both dysfluent and perceptually fluent speech. The current study compared motor control strategies for the production of perceptually fluent speech between a young woman with apraxia of speech (AOS) and Broca’s aphasia and a group of age-matched control speakers using concepts and tools from articulation-based theories. In addition, to examine the potential role of specific movement variables on gestural coordination, a second part of this study involved a comparison of fluent and dysfluent speech samples from the speaker with AOS. Movement data from the lips, jaw and tongue were acquired using the AG-100 EMMA system during the reiterated production of multisyllabic nonwords. The findings indicated that although in general kinematic parameters of fluent speech were similar in the subject with AOS and Broca’s aphasia to those of the age-matched controls, speech task-related differences were observed in upper lip movements and lip coordination. The comparison between fluent and dysfluent speech characteristics suggested that fluent speech was achieved through the use of specific motor control strategies, highlighting the potential association between the stability of coordinative patterns and movement range, as described in Coordination Dynamics theory.
Resumo:
To investigate the neural network of overt speech production, eventrelated fMRI was performed in 9 young healthy adult volunteers. A clustered image acquisition technique was chosen to minimize speechrelated movement artifacts. Functional images were acquired during the production of oral movements and of speech of increasing complexity (isolated vowel as well as monosyllabic and trisyllabic utterances). This imaging technique and behavioral task enabled depiction of the articulo-phonologic network of speech production from the supplementary motor area at the cranial end to the red nucleus at the caudal end. Speaking a single vowel and performing simple oral movements involved very similar activation of the corticaland subcortical motor systems. More complex, polysyllabic utterances were associated with additional activation in the bilateral cerebellum,reflecting increased demand on speech motor control, and additional activation in the bilateral temporal cortex, reflecting the stronger involvement of phonologic processing.
Resumo:
Three experiments measured constancy in speech perception, using natural-speech messages or noise-band vocoder versions of them. The eight vocoder-bands had equally log-spaced center-frequencies and the shapes of corresponding “auditory” filters. Consequently, the bands had the temporal envelopes that arise in these auditory filters when the speech is played. The “sir” or “stir” test-words were distinguished by degrees of amplitude modulation, and played in the context; “next you’ll get _ to click on.” Listeners identified test-words appropriately, even in the vocoder conditions where the speech had a “noise-like” quality. Constancy was assessed by comparing the identification of test-words with low or high levels of room reflections across conditions where the context had either a low or a high level of reflections. Constancy was obtained with both the natural and the vocoded speech, indicating that the effect arises through temporal-envelope processing. Two further experiments assessed perceptual weighting of the different bands, both in the test word and in the context. The resulting weighting functions both increase monotonically with frequency, following the spectral characteristics of the test-word’s [s]. It is suggested that these two weighting functions are similar because they both come about through the perceptual grouping of the test-word’s bands.
Resumo:
When speech is in competition with interfering sources in rooms, monaural indicators of intelligibility fail to take account of the listener’s abilities to separate target speech from interfering sounds using the binaural system. In order to incorporate these segregation abilities and their susceptibility to reverberation, Lavandier and Culling [J. Acoust. Soc. Am. 127, 387–399 (2010)] proposed a model which combines effects of better-ear listening and binaural unmasking. A computationally efficient version of this model is evaluated here under more realistic conditions that include head shadow, multiple stationary noise sources, and real-room acoustics. Three experiments are presented in which speech reception thresholds were measured in the presence of one to three interferers using real-room listening over headphones, simulated by convolving anechoic stimuli with binaural room impulse-responses measured with dummy-head transducers in five rooms. Without fitting any parameter of the model, there was close correspondence between measured and predicted differences in threshold across all tested conditions. The model’s components of better-ear listening and binaural unmasking were validated both in isolation and in combination. The computational efficiency of this prediction method allows the generation of complex “intelligibility maps” from room designs. © 2012 Acoustical Society of America