880 results for speech signals


Relevance:

20.00%

Publisher:

Abstract:

Speech is often a multimodal process, presented audiovisually through a talking face. One area of speech perception influenced by visual speech is speech segmentation, or the process of breaking a stream of speech into individual words. Mitchel and Weiss (2013) demonstrated that a talking face contains specific cues to word boundaries and that subjects can correctly segment a speech stream when given a silent video of a speaker. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2013). In Experiment 1, subjects were found to spend the most time watching the eyes and mouth, with a trend suggesting that the mouth was viewed more than the eyes. Although subjects displayed significant learning of word boundaries, performance was not correlated with gaze duration on any individual feature, nor with a behavioral measure of autistic-like traits (the Social Responsiveness Scale; SRS). However, trends suggested that as autistic-like traits increased, gaze duration on the mouth increased and gaze duration on the eyes decreased, similar to significant trends seen in autistic populations (Boraston & Blakemore, 2007). In Experiment 2, the same video was modified so that a black bar covered the eyes or the mouth. Both videos elicited learning of word boundaries equivalent to that seen in the first experiment. Again, no correlations were found between segmentation performance and SRS scores in either condition. These results, taken with those of Experiment 1, suggest that neither the eyes nor the mouth are critical to speech segmentation and that perhaps more global head movements indicate word boundaries (see Graf, Cosatto, Strom, & Huang, 2002). Future work will elucidate the contribution of individual features relative to global head movements, as well as extend these results to additional types of speech tasks.
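The segmentation mechanism at issue here is usually modeled with transitional probabilities between adjacent syllables, which dip at word boundaries. A minimal sketch of that computation (my own illustration, not material from the study; the trisyllabic lexicon is invented):

```python
# Transitional-probability (TP) segmentation: TP(x -> y) = freq(xy) / freq(x).
# Within-word TPs approach 1.0; across-word TPs fall to ~1/3 with this lexicon,
# so local TP minima mark candidate word boundaries.
from collections import Counter
import random

words = ["pabiku", "tibudo", "golatu"]  # invented trisyllabic "words"
syllables = lambda w: [w[i:i + 2] for i in range(0, len(w), 2)]

# Continuous familiarization stream: words concatenated without pauses.
stream = []
for _ in range(300):
    stream += syllables(random.choice(words))

pair_counts = Counter(zip(stream, stream[1:]))
first_counts = Counter(stream[:-1])
tp = {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

for pair, p in sorted(tp.items(), key=lambda kv: kv[1]):
    print(pair, round(p, 2))  # lowest TPs correspond to word boundaries
```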

Relevance:

20.00%

Publisher:

Abstract:

Telephone communication is a challenge for many hearing-impaired individuals. One important technical reason for this difficulty is the restricted frequency range (0.3-3.4 kHz) of conventional landline telephones. Internet telephony (voice over Internet protocol [VoIP]) is transmitted with a larger frequency range (0.1-8 kHz) and therefore includes more frequencies relevant to speech perception. According to a recently published, laboratory-based study, the theoretical advantage of ideal VoIP conditions over conventional telephone quality translates into improved speech perception by hearing-impaired individuals. However, the speech perception benefits of VoIP under nonideal network conditions, which may occur in daily life, have not been explored. VoIP use cannot be recommended to hearing-impaired individuals before its potential under more realistic conditions has been examined.
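The bandwidth difference at the core of this argument is easy to simulate. A minimal sketch (my own illustration, not from the study; the synthetic noise stands in for a real speech recording):

```python
# Band-limit a wideband signal to the landline band (0.3-3.4 kHz) versus
# the wider VoIP band (0.1-8 kHz) with zero-phase Butterworth filters.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 44100  # assumed sampling rate of the source recording

def band_limit(x, low_hz, high_hz, order=8):
    """Approximate a transmission channel as a band-pass filter."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

speech = np.random.default_rng(0).standard_normal(2 * fs)  # stand-in signal

landline = band_limit(speech, 300, 3400)  # conventional telephone band
voip = band_limit(speech, 100, 8000)      # wider VoIP band
# High-frequency speech cues (e.g., fricative energy above 3.4 kHz)
# survive only in the VoIP version.
```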

Relevance:

20.00%

Publisher:

Abstract:

Speech is typically a multimodal phenomenon, yet few studies have focused on the exclusive contributions of visual cues to language acquisition. To address this gap, we investigated whether visual prosodic information can facilitate speech segmentation. Previous research has demonstrated that language learners can use lexical stress and pitch cues to segment speech and that learners can extract this information from talking faces. Thus, we created an artificial speech stream that contained minimal segmentation cues and paired it with two synchronous facial displays in which visual prosody was either informative or uninformative for identifying word boundaries. Across three familiarisation conditions (audio stream alone, facial streams alone, and paired audiovisual), learning occurred only when the facial displays were informative about word boundaries, suggesting that facial cues can help learners solve the early challenges of language acquisition.
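One way to picture the informative/uninformative manipulation is as the same syllable stream paired with a visual boundary cue that either coincides with word onsets or is placed at random with a matched rate. A minimal sketch (my own reconstruction of the design logic, not the authors' materials):

```python
# Pair one syllable stream with two cue placements: aligned to word onsets
# (informative) versus rate-matched but randomly timed (uninformative).
import random

words = [["go", "la", "tu"], ["pa", "bi", "ku"], ["ti", "bu", "do"]]
stream = [syl for _ in range(50) for syl in random.choice(words)]

onsets = set(range(0, len(stream), 3))  # true word-initial positions
informative = [i in onsets for i in range(len(stream))]

random_pos = set(random.sample(range(len(stream)), len(onsets)))
uninformative = [i in random_pos for i in range(len(stream))]
# Only the informative pairing lets a learner align the visual cue with
# word boundaries; the uninformative pairing preserves cue rate, not timing.
```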

Relevance:

20.00%

Publisher:

Abstract:

Recent advances in the field of statistical learning have established that learners are able to track regularities in multimodal stimuli, yet it is unknown whether these statistical computations are performed on integrated representations or on separate, unimodal representations. In the present study, we investigated the ability of adults to integrate audio and visual input during statistical learning. We presented learners with a speech stream synchronized with a video of a speaker's face. In the critical condition, the visual (e.g., /gi/) and auditory (e.g., /mi/) signals were occasionally incongruent, which we predicted would produce the McGurk illusion, resulting in the perception of an audiovisual syllable (e.g., /ni/). In this way, we used the McGurk illusion to manipulate the underlying statistical structure of the speech streams, such that perception of these illusory syllables facilitated participants' ability to segment the speech stream. Our results therefore demonstrate that participants can integrate audio and visual input to perceive the McGurk illusion during statistical learning. We interpret our findings as support for modality-interactive accounts of statistical learning.
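The manipulation hinges on which token sequence the learner's statistics are computed over: the raw auditory syllables, or the fused percepts after audiovisual integration. A minimal sketch of that contrast (my own illustration; the syllables and fusion mapping are assumptions in the spirit of the example above):

```python
# Relabel incongruent audio/visual pairs as the fused McGurk percept, then
# compare transitional probabilities over raw versus fused token sequences.
from collections import Counter

fusion = {("mi", "gi"): "ni"}  # auditory /mi/ + visual /gi/ -> perceived /ni/
av_stream = [("mi", "gi"), ("do", "do"), ("pa", "pa"), ("mi", "gi"), ("do", "do")]

def perceived(audio, visual):
    return fusion.get((audio, visual), audio)  # congruent pairs heard as-is

audio_only = [a for a, _ in av_stream]
fused = [perceived(a, v) for a, v in av_stream]

def transitional_probs(seq):
    pairs = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])
    return {p: n / firsts[p[0]] for p, n in pairs.items()}

print(transitional_probs(audio_only))  # statistics over raw auditory tokens
print(transitional_probs(fused))       # statistics after A/V integration
```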

Relevance:

20.00%

Publisher:

Abstract:

The aim was to investigate the effect of different speech tasks, i.e. recitation of prose (PR), alliteration (AR) and hexameter (HR) verses, and a control task (mental arithmetic (MA) with voicing of the result), on end-tidal CO2 (PETCO2), cerebral hemodynamics and oxygenation. CO2 levels in the blood are known to strongly affect cerebral blood flow, and speech changes the breathing pattern and may thereby affect CO2 levels. Measurements were performed on 24 healthy adult volunteers during the 4 tasks. Tissue oxygen saturation (StO2) and absolute concentrations of oxyhemoglobin ([O2Hb]), deoxyhemoglobin ([HHb]) and total hemoglobin ([tHb]) were measured by functional near-infrared spectroscopy (fNIRS), and PETCO2 by a gas analyzer. Statistical analysis was applied to the differences between the baseline period before the task, the 2 recitation periods, and the 5 baseline periods after the task. The 2 brain hemispheres and the 4 tasks were tested separately. A significant decrease in PETCO2 was found during all 4 tasks, with the smallest decrease during the MA task. During the recitation tasks, StO2 decreased significantly (p < 0.05) during PR and AR in the right prefrontal cortex (PFC) and during AR and HR in the left PFC. [O2Hb] decreased significantly during PR, AR and HR in both hemispheres. [HHb] increased significantly during the AR task in the right PFC. [tHb] decreased significantly during HR in the right PFC and during PR, AR and HR in the left PFC. During the MA task, StO2 increased and [HHb] decreased significantly. We conclude that changes in breathing (hyperventilation) during the tasks led to a lower CO2 pressure in the blood (hypocapnia), which was predominantly responsible for the measured changes in cerebral hemodynamics and oxygenation. These findings demonstrate that PETCO2 should be monitored during functional brain studies of speech using neuroimaging modalities such as fNIRS and fMRI, to ensure correct interpretation of changes in hemodynamics and oxygenation.
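The reported quantities are linked by standard fNIRS definitions: [tHb] = [O2Hb] + [HHb] and StO2 = [O2Hb]/[tHb]. A minimal sketch of how the reported pattern fits together (standard relations, not the authors' processing pipeline; the example concentrations are invented):

```python
# Derive [tHb] and StO2 from [O2Hb] and [HHb] (concentrations in micromolar).
def thb(o2hb, hhb):
    """Total hemoglobin: [tHb] = [O2Hb] + [HHb]."""
    return o2hb + hhb

def sto2(o2hb, hhb):
    """Tissue oxygen saturation: StO2 = [O2Hb] / [tHb] * 100 (%)."""
    return 100.0 * o2hb / thb(o2hb, hhb)

# A hypocapnia-style shift: [O2Hb] falls and [HHb] rises slightly, so both
# StO2 and [tHb] decrease, matching the pattern reported for recitation.
print(sto2(60.0, 25.0), thb(60.0, 25.0))  # baseline-like: ~70.6 %, 85 uM
print(sto2(55.0, 26.0), thb(55.0, 26.0))  # during recitation: ~67.9 %, 81 uM
```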