872 results for Phonetic Perception
Abstract:
The McGurk effect, in which auditory [ba] dubbed onto [ga] lip movements is perceived as da or tha, was employed in a real-time task to investigate auditory-visual speech perception in prelingual infants. Experiments 1A and 1B established the validity of real-time dubbing for producing the effect. In Experiment 2, 4½-month-olds were tested in a habituation-test paradigm, in which an auditory-visual stimulus was presented contingent upon visual fixation of a live face. The experimental group was habituated to a McGurk stimulus (auditory [ba], visual [ga]), and the control group to matching auditory-visual [ba]. Each group was then presented with three auditory-only test trials, [ba], [da], and [ða] (as in then). Visual-fixation durations in test trials showed that the experimental group treated the emergent percept in the McGurk effect, [da] or [ða], as familiar (even though they had not heard these sounds previously) and [ba] as novel. For control group infants, [da] and [ða] were no more familiar than [ba]. These results are consistent with infants' perception of the McGurk effect, and support the conclusion that prelinguistic infants integrate auditory and visual speech information. (C) 2004 Wiley Periodicals, Inc.
Abstract:
Perceptual constancy effects are observed when differing amounts of reverberation are applied to a context sentence and a test‐word embedded in it. Adding reverberation to members of a “sir”‐“stir” test‐word continuum causes temporal‐envelope distortion, which has the effect of eliciting more sir responses from listeners. If the same amount of reverberation is also applied to the context sentence, the number of sir responses decreases again, indicating an “extrinsic” compensation for the effects of reverberation. Such a mechanism would effect perceptual constancy of phonetic perception when temporal envelopes vary in reverberation. This experiment asks whether such effects precede or follow grouping. Eight auditory‐filter shaped noise‐bands were modulated with the temporal envelopes that arise when speech is played through these filters. The resulting “gestalt” percept is the appropriate speech rather than the sound of noise‐bands, presumably due to across‐channel “grouping.” These sounds were played to listeners in “matched” conditions, where reverberation was present in the same bands in both context and test‐word, and in “mismatched” conditions, where the bands in which reverberation was added differed between context and test‐word. Constancy effects were obtained in matched conditions, but not in mismatched conditions, indicating that this type of constancy in hearing precedes across‐channel grouping.
Abstract:
In this paper we propose a linear time-varying model for diphthong synthesis based on linear interpolation of formant frequencies. We then determine the timbre just-noticeable difference (JND) for the diphthong /aI/ (as in ‘buy’) with a constant pitch excitation through a perception experiment involving four listeners, and explore the phonetic JND of the diphthong. The listeners' JND responses are determined using a 1-up-3-down procedure. Using the experimental data, we map the timbre JND and phonetic JND onto a 2-D region of percentage change of formant glides. The timbre and phonetic JND contours for constant pitch show that the phonetic JND region encloses the timbre JND region and also varies across listeners. The JND is observed to be more sensitive to the ending vowel /I/ than to the starting vowel /a/ in some listeners, and to depend on the direction of perturbation of the starting and ending vowels.
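The two mechanisms named in the abstract above, linear interpolation of formant frequencies and a 1-up-3-down adaptive procedure, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the two-formant targets for /a/ and /I/, the step size, and the reading of the adaptive rule in which the perturbation level drops after three consecutive correct responses and rises after one incorrect response. A rule of this kind is known to converge near the 79.4%-correct point of the psychometric function.

```python
# Illustrative sketch only: formant values, step size, and all names are
# assumptions for this example, not the paper's actual stimuli or code.

def formant_glide(start_hz, end_hz, n_frames):
    """Linearly interpolate each formant from its starting to its ending value."""
    if n_frames < 2:
        raise ValueError("need at least two frames to define a glide")
    track = []
    for k in range(n_frames):
        t = k / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        track.append([(1 - t) * s + t * e for s, e in zip(start_hz, end_hz)])
    return track


def perturb(formants_hz, percent):
    """Shift every formant target by the given percentage."""
    return [f * (1 + percent / 100.0) for f in formants_hz]


def staircase_1up_3down(responses, start_level, step):
    """Return the level presented on each trial.

    responses: sequence of booleans, True meaning a correct response.
    The level goes down by `step` after three consecutive correct responses
    and up by `step` after any incorrect response.
    """
    level = start_level
    correct_run = 0
    presented = []
    for correct in responses:
        presented.append(level)
        if correct:
            correct_run += 1
            if correct_run == 3:
                level -= step
                correct_run = 0
        else:
            level += step
            correct_run = 0
    return presented


if __name__ == "__main__":
    a_vowel = [730.0, 1090.0]   # assumed F1, F2 targets in Hz for /a/
    i_vowel = [390.0, 1990.0]   # assumed F1, F2 targets in Hz for /I/
    # Glide from /a/ to an ending vowel perturbed by +5 percent.
    for frame in formant_glide(a_vowel, perturb(i_vowel, 5.0), 5):
        print([round(f, 1) for f in frame])
    # Levels here stand for percentage perturbations presented to a listener.
    print(staircase_1up_3down([True, True, True, True, False, True], 10.0, 2.0))
```

In this sketch the staircase level would be passed to `perturb` as the percentage change of the starting or ending vowel, so that each trial compares an unperturbed glide with a perturbed one.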
Abstract:
Speech can be understood at widely varying production rates. A working memory is described for short-term storage of temporal lists of input items. The working memory is a cooperative-competitive neural network that automatically adjusts its integration rate, or gain, to generate a short-term memory code for a list that is independent of item presentation rate. Such an invariant working memory model is used to simulate data of Repp (1980) concerning the changes of phonetic category boundaries as a function of their presentation rate. Thus the variability of categorical boundaries can be traced to the temporal invariance of the working memory code.
Abstract:
This study examined the discrimination of word-final stop contrasts (/p/-/t/, /p/-/k/, /t/-/k/) in English and Thai by 12 listeners who speak Vietnamese as their first language (L1). Vietnamese shares a specific phonetic realization of stops with Thai, i.e., unreleased final stops, and differs from English, which allows both released and unreleased final stops. These 12 native Vietnamese (NV) listeners’ discrimination accuracy was compared to that of two listener groups (Australian English (AE) and native Thai (NT)) tested in previous studies. The NV group was less accurate than the native group in discriminating both English and Thai stop contrasts. In particular, for the Thai /t/-/k/ contrast, they were significantly less accurate than the AE listeners. The present findings suggest that experience with specific (i.e., unreleased) and native phonetic realization of sounds may be essential in accurate discrimination of final stop contrasts. The effect of L1 dialect on cross-language speech perception is discussed.
Abstract:
Many languages exploit suprasegmental devices in signaling word meaning. Tone languages exploit fundamental frequency whereas quantity languages rely on segmental durations to distinguish otherwise similar words. Traditionally, duration and tone have been taken as mutually exclusive. However, some evidence suggests that, in addition to durational cues, phonological quantity is associated with and co-signaled by changes in fundamental frequency in quantity languages such as Finnish, Estonian, and Serbo-Croat. The results from the present experiment show that the structure of disyllabic word stems in Finnish is indeed signaled tonally and that the phonological length of the stressed syllable is further tonally distinguished within the disyllabic sequence. The results further indicate that the observed association of tone and duration in perception is systematically exploited in speech production in Finnish.
Abstract:
To investigate the process underlying audiovisual speech perception, the McGurk illusion was examined across a range of phonetic contexts. Two major changes were found. First, the frequency of illusory /g/ fusion percepts increased relative to the frequency of illusory /d/ fusion percepts as vowel context was shifted from /i/ to /a/ to /u/. This trend could not be explained by biases present in perception of the unimodal visual stimuli. However, the change found in the McGurk fusion effect across vowel environments did correspond systematically with changes in second formant frequency patterns across contexts. Second, the order of consonants in illusory combination percepts was found to depend on syllable type. This may be due to differences occurring across syllable contexts in the time courses of inputs from the two modalities, as delaying the auditory track of a vowel-consonant stimulus resulted in a change in the order of consonants perceived. Taken together, these results suggest that the speech perception system either fuses audiovisual inputs into a visually compatible percept with a similar second formant pattern to that of the acoustic stimulus or interleaves the information from different modalities, at a phonemic or subphonemic level, based on their relative arrival times.
Abstract:
A speech message played several metres from the listener in a room is usually heard to have much the same phonetic content as it does when played nearby, even though the different amounts of reflected sound make the temporal envelopes of these signals very different. To study this ‘constancy’ effect, listeners heard speech messages and speech-like sounds comprising 8 auditory-filter shaped noise-bands that had temporal envelopes corresponding to those in these filters when the speech message is played. The ‘contexts’ were “next you’ll get _to click on”, into which a “sir” or “stir” test word was inserted. These test words were from an 11-step continuum, formed by amplitude modulation. Listeners identified the test words appropriately, even in the 8-band conditions where the speech had a ‘robotic’ quality. Constancy was assessed by comparing the influence of room reflections on the test word across conditions where the context had either the same level of room reflections (i.e. from the same, far distance), or where it had a much lower level (i.e. from nearby). Constancy effects were obtained with both the natural- and the 8-band speech. Results are considered in terms of the degree of ‘matching’ between the context’s and test-word’s bands.
Abstract:
A foreign accent can range from hardly detectable to rendering second-language speech unintelligible. It is assumed that certain aspects of a specific target language contribute more than others to making foreign-accented speech intelligible and listener friendly. The present thesis examines a teaching strategy for Swedish pronunciation in second language education. The teaching strategy, “Basic prosody” or BP, gives priority to temporal aspects of Swedish prosody, which means the temporal phonological contrasts word stress and quantity, as well as the durational realizations of these contrasts. BP does not prescribe any specific tonal realizations. This standpoint is based on the great regional variety in realization and distribution of Swedish word accents. The teaching strategy consists essentially of three directives:
· Stress the proper word in the sentence.
· Stress proper syllables in stressed words and make them longer.
· Lengthen the proper segment – vowel or subsequent consonant – in the stressed syllable.
These directives reflect the view that all phonological length is stress-induced, and that vowel length and consonant length are equally important as learning goals. BP is examined in the light of existing findings in the field of second language pronunciation and with respect to the phonetic correlates of Swedish stress and quantity. Five studies examine the relation between segment durations and the categorization made by native Swedish listeners. The results indicate that the postvocalic consonant duration contributes to quantity categorization as well as giving the proper duration to stressed syllables. Furthermore, native Swedish speakers are shown to apply the complementary /V:C/ - /VC:/ pattern also when speaking English and German, by lengthening postvocalic consonants. The correctness of the priority is not directly addressed, but important aspects of BP are supported by earlier findings as well as the results from the present studies.
Abstract:
Although developmental increases in the size of the position effect within a mispronunciation detection task have been interpreted as consistent with a view of the lexical restructuring process as protracted, the position effect itself might not be reliable. The current research examined the effects of position and clarity of acoustic-phonetic information on sensitivity to mispronounced onsets in 5- and 6-year-olds and adults. Both children and adults showed a position effect only when mispronunciations also differed in the amount of relevant acoustic-phonetic information. Adults' sensitivity to mispronounced second-syllable onsets also reflected the availability of acoustic-phonetic information. The implications of these findings are discussed in relation to the lexical restructuring hypothesis. (c) 2006 Elsevier Inc. All rights reserved.
Abstract:
A great deal of scholarly research has addressed the issue of dialect mapping in the United States. These studies, usually based on phonetic or lexical items, aim to present an overall picture of the dialect landscape. But what is often missing in these types of projects is an attention to the borders of a dialect region and to what kinds of identity alignments can be found in such areas. This lack of attention to regional and dialect border identities is surprising, given the salience of such borders for many Americans. This salience is also ignored among dialectologists, as nonlinguists' perceptions and attitudes have been generally assumed to be secondary to the analysis of “real” data, such as the phonetic and lexical variables used in traditional dialectology. Louisville, Kentucky is considered as a case study for examining how dialect and regional borders in the United States impact speakers' linguistic acts of identity, especially the production and perception of such identities. According to Labov, Ash, and Boberg (2006), Louisville is one of the northernmost cities to be classified as part of the South. Its location on the Ohio River, on the political and geographic border between Kentucky and Indiana, places Louisville on the isogloss between Southern and Midland dialects. Through an examination of language attitude surveys, mental maps, focus group interviews, and production data, I show that identity alignments in borderlands are neither simple nor straightforward. Identity at the border is fluid, complex, and dynamic; speakers constantly negotiate and contest their identities. The analysis shows the ways in which Louisvillians shift between Southern and non-Southern identities, in the active and agentive expression of their amplified awareness of belonging brought about by their position on the border.