11 results for F0
in Aston University Research Archive
Abstract:
This study explored the role of formant transitions and F0-contour continuity in binding together speech sounds into a coherent stream. Listening to a repeating recorded word produces verbal transformations to different forms; stream segregation contributes to this effect and so it can be used to measure changes in perceptual coherence. In experiment 1, monosyllables with strong formant transitions between the initial consonant and following vowel were monotonized; each monosyllable was paired with a weak-transitions counterpart. Further stimuli were derived by replacing the consonant-vowel transitions with samples from adjacent steady portions. Each stimulus was concatenated into a 3-min-long sequence. Listeners only reported more forms in the transitions-removed condition for strong-transitions words, for which formant-frequency discontinuities were substantial. In experiment 2, the F0 contour of all-voiced monosyllables was shaped to follow a rising or falling pattern, spanning one octave. Consecutive tokens either had the same contour, giving an abrupt F0 change between each token, or alternated, giving a continuous contour. Discontinuous sequences caused more transformations and forms, and shorter times to the first transformation. Overall, these findings support the notion that continuity cues provided by formant transitions and the F0 contour play an important role in maintaining the perceptual coherence of speech.
Abstract:
Mistuning a harmonic produces an exaggerated change in its pitch. This occurs because the component becomes inconsistent with the regular pattern that causes the other harmonics (constituting the spectral frame) to integrate perceptually. These pitch shifts were measured when the fundamental (F0) component of a complex tone (nominal F0 frequency = 200 Hz) was mistuned by +8% and -8%. The pitch-shift gradient was defined as the difference between these values and its magnitude was used as a measure of frame integration. An independent and random perturbation (spectral jitter) was applied simultaneously to most or all of the frame components. The gradient magnitude declined gradually as the degree of jitter increased from 0% to ±40% of F0. The component adjacent to the mistuned target made the largest contribution to the gradient, but more distant components also contributed. The stimuli were passed through an auditory model, and the exponential height of the F0-period peak in the averaged summary autocorrelation function correlated well with the gradient magnitude. The fit improved when the weighting on more distant channels was attenuated by a factor of three per octave. The results are consistent with a grouping mechanism that computes a weighted average of periodicity strength across several components. © 2006 Elsevier B.V. All rights reserved.
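The channel weighting mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that "attenuated by a factor of three per octave" means a weight of 3^(-d) for a channel d octaves away from the target frequency. The function names `channel_weight` and `weighted_average` and the example values are hypothetical and are not taken from the study.

```python
import math


def channel_weight(channel_hz: float, target_hz: float,
                   factor_per_octave: float = 3.0) -> float:
    """Weight for one channel (assumed form, not the authors' code).

    The weight is 1 at the target frequency and is divided by
    `factor_per_octave` for every octave of distance from it.
    """
    distance_octaves = abs(math.log2(channel_hz / target_hz))
    return factor_per_octave ** (-distance_octaves)


def weighted_average(values, weights):
    """Weighted mean of per-channel periodicity strengths."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical example: three channels around a 200-Hz target.
channels_hz = [200.0, 400.0, 800.0]
strengths = [0.9, 0.6, 0.3]  # made-up periodicity strengths
weights = [channel_weight(c, 200.0) for c in channels_hz]
print(weighted_average(strengths, weights))
```

Under this reading, a channel one octave from the target contributes one third as much as the target channel, and a channel two octaves away contributes one ninth as much.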
Abstract:
In an isolated syllable, a formant will tend to be segregated perceptually if its fundamental frequency (F0) differs from that of the other formants. This study explored whether similar results are found for sentences, and specifically whether differences in F0 (ΔF0) also influence across-formant grouping in circumstances where the exclusion or inclusion of the manipulated formant critically determines speech intelligibility. Three-formant (F1 + F2 + F3) analogues of almost continuously voiced natural sentences were synthesized using a monotonous glottal source (F0 = 150 Hz). Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3; F2), where F2C is a competitor for F2 that listeners must resist to optimize recognition. Competitors were created using time-reversed frequency and amplitude contours of F2, and F0 was manipulated (ΔF0 = ±8, ±2, or 0 semitones relative to the other formants). Adding F2C typically reduced intelligibility, and this reduction was greatest when ΔF0 = 0. There was an additional effect of absolute F0 for F2C, such that competitor efficacy was greater for higher F0s. However, competitor efficacy was not due to energetic masking of F3 by F2C. The results are consistent with the proposal that a grouping “primitive” based on common F0 influences the fusion and segregation of concurrent formants in sentence perception.
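To make the semitone offsets in the abstract above concrete, the following sketch converts an offset in semitones into a frequency in Hz, using the standard equal-tempered relation of 12 semitones per octave. The function name `shift_semitones` is hypothetical; the 150-Hz reference and the offsets are the values quoted in the abstract.

```python
def shift_semitones(f0_hz: float, semitones: float) -> float:
    """Return f0_hz shifted by the given number of semitones.

    One semitone is a frequency ratio of 2 ** (1 / 12), so an offset of
    +12 semitones doubles the frequency and -12 halves it.
    """
    return f0_hz * 2.0 ** (semitones / 12.0)


# F0 of the competitor for each offset relative to a 150-Hz reference.
for delta in (-8, -2, 0, 2, 8):
    print(f"{delta:+d} semitones -> {shift_semitones(150.0, delta):.1f} Hz")
```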
Abstract:
Keyword identification in one of two simultaneous sentences is improved when the sentences differ in F0, particularly when they are almost continuously voiced. Sentences of this kind were recorded, monotonised using PSOLA, and re-synthesised to give a range of harmonic ΔF0s (0, 1, 3, and 10 semitones). They were additionally re-synthesised by LPC with the LPC residual frequency shifted by 25% of F0, to give excitation with inharmonic but regularly spaced components. Perceptual identification of frequency-shifted sentences showed a similar large improvement with nominal ΔF0 as seen for harmonic sentences, although overall performance was about 10% poorer. We compared performance with that of two autocorrelation-based computational models comprising four stages: (i) peripheral frequency selectivity and half-wave rectification; (ii) within-channel periodicity extraction; (iii) identification of the two major peaks in the summary autocorrelation function (SACF); (iv) a template-based approach to speech recognition using dynamic time warping. One model sampled the correlogram at the target-F0 period and performed spectral matching; the other deselected channels dominated by the interferer and performed matching on the short-lag portion of the residual SACF. Both models reproduced the monotonic increase observed in human performance with increasing ΔF0 for the harmonic stimuli, but not for the frequency-shifted stimuli. A revised version of the spectral-matching model, which groups patterns of periodicity that lie on a curve in the frequency-delay plane, showed a closer match to the perceptual data for frequency-shifted sentences. The results extend the range of phenomena originally attributed to harmonic processing to grouping by common spectral pattern.
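The phrase "inharmonic but regularly spaced components" can be illustrated with a short sketch. It assumes that the shift simply adds 25% of F0 to every harmonic frequency; the function name `component_frequencies` and the 100-Hz F0 are hypothetical choices for illustration, not values from the study.

```python
def component_frequencies(f0_hz: float, n_components: int,
                          shift_fraction: float = 0.0) -> list:
    """Frequencies k * F0 + shift_fraction * F0 for k = 1..n_components.

    With shift_fraction = 0 the components are exact harmonics of F0.
    With a non-zero shift they keep a spacing of F0 but are no longer
    integer multiples of a common fundamental.
    """
    shift_hz = shift_fraction * f0_hz
    return [k * f0_hz + shift_hz for k in range(1, n_components + 1)]


harmonic = component_frequencies(100.0, 4)
shifted = component_frequencies(100.0, 4, shift_fraction=0.25)
print(harmonic)
print(shifted)
```

In both cases adjacent components are F0 apart, which is the sense in which the shifted excitation stays regularly spaced.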
Abstract:
Noise-vocoded (NV) speech is often regarded as conveying phonetic information primarily through temporal-envelope cues rather than spectral cues. However, listeners may infer the formant frequencies in the vocal-tract output (a key source of phonetic detail) from across-band differences in amplitude when speech is processed through a small number of channels. The potential utility of this spectral information was assessed for NV speech created by filtering sentences into six frequency bands, and using the amplitude envelope of each band (≤30 Hz) to modulate a matched noise-band carrier (N). Bands were paired, corresponding to F1 (≈ N1 + N2), F2 (≈ N3 + N4) and the higher formants (F3' ≈ N5 + N6), such that the frequency contour of each formant was implied by variations in relative amplitude between bands within the corresponding pair. Three-formant analogues (F0 = 150 Hz) of the NV stimuli were synthesized using frame-by-frame reconstruction of the frequency and amplitude of each formant. These analogues were less intelligible than the NV stimuli or analogues created using contours extracted from spectrograms of the original sentences, but more intelligible than when the frequency contours were replaced with constant (mean) values. Across-band comparisons of amplitude envelopes in NV speech can provide phonetically important information about the frequency contours of the underlying formants.
Abstract:
A nonlinear dynamic model of microbial growth is established based on the theories of the diffusion response of thermodynamics and the chemotactic response of biology. In addition to the two traditional variables, i.e. the density of bacteria and the concentration of attractant, the pH value, a crucial factor influencing microbial growth, is also considered in this model. The pH effect on microbial growth is taken as a Gaussian function $G_0 e^{-(f - f_c)^2/G_1}$, where G0, G1 and fc are constants, f represents the pH value and fc represents the critical pH value that best suits microbial growth. To study the effects of the reproduction rate of the bacteria and the pH value on the stability of the system, three parameters a, G0 and G1 are studied in detail, where a denotes the reproduction rate of the bacteria, G0 denotes the intensity of the impact of the pH value on microbial growth and G1 denotes the bacterial adaptability to the pH value. When the effect of the pH value of the solution in which the microorganisms live is ignored in the governing equations of the model, the microbial system is more stable with larger a. When the effect of the bacterial chemotaxis is ignored, the microbial system is more stable with larger G1 and more unstable with larger G0 for f0 > fc. However, the stability of the microbial system is almost unaffected by variations in G0 and G1, and it is always stable for f0 < fc under the conditions assumed in this paper. In the whole system model, it is more unstable with larger G1 and more stable with larger G0 for f0 < fc. The system is more stable with larger G1 and more unstable with larger G0 for f0 > fc. However, the system is more unstable with larger a for f0 < fc and the stability of the system is almost unaffected by a for f0 > fc. The results obtained in this study provide a biophysical insight into the understanding of the growth and stability behavior of microorganisms.
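The Gaussian pH term defined in the abstract above can be evaluated directly. The sketch below is only an illustration of that single expression, not of the full growth model; the function name `ph_growth_factor` and all numerical values are hypothetical.

```python
import math


def ph_growth_factor(f: float, fc: float, g0: float, g1: float) -> float:
    """Gaussian pH term G0 * exp(-(f - fc)**2 / G1).

    f  : pH value
    fc : critical pH at which the term peaks
    g0 : peak height (impact intensity of pH on growth)
    g1 : width parameter (bacterial adaptability to pH), must be > 0
    """
    return g0 * math.exp(-((f - fc) ** 2) / g1)


# Hypothetical parameter values, chosen only to show the shape.
for ph in (5.0, 6.0, 7.0, 8.0, 9.0):
    print(ph, round(ph_growth_factor(ph, fc=7.0, g0=1.0, g1=2.0), 4))
```

The term peaks at G0 when f equals fc, and a larger G1 makes it fall off more slowly as the pH moves away from fc.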
Abstract:
Six experiments investigated the influence of several grouping cues within the framework of the Verbal Transformation Effect (VTE, Experiments 1 to 4) and Phonemic Transformation Effect (PTE, Experiments 5 and 6), where listening to a repeated word (VTE) or sequence of vowels (PTE) produces verbal transformations (VTs). In Experiment 1, the influence of F0 frequency and lateralization cues (ITDs) was investigated in terms of the pattern of VTs. As the lateralization difference increased between two repeating sequences, the number of forms was significantly reduced, with the fewest forms reported in the dichotic condition. Experiment 2 explored whether the propensity to report more VTs on high pitch was due to the task demands of monitoring two sequences at once. The number of VTs reported was higher when listeners were asked to attend to one sequence only, suggesting smaller attentional constraints on the task requirements. In Experiment 3, consonant-vowel transitions were edited out from two sets of six stimulus words with ‘strong’ and ‘weak’ formant transitions, respectively. Listeners reported more forms in the spliced-out than in the unedited case for the strong-transition words, but not for those with weak transitions. A similar trend was observed for the F0 contour manipulation used in Experiment 4, where listeners reported more VTs and forms for words following a discontinuous F0 contour. In Experiments 5 and 6, the role of F0 frequency and ITD cues was investigated further using a related phenomenon – the PTE. Although these manipulations had relatively little effect on the number of VTs and forms reported, they did influence the particular forms heard. In summary, the current experiments confirmed that it is possible to successfully investigate auditory grouping cues within the VTE framework and that, in agreement with recent studies, the results can be attributed to the perceptual re-grouping of speech sounds.
Abstract:
An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics; for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition.
Abstract:
Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This idea was explored using a method that ensures interference occurs only through informational masking. Three-formant analogues of sentences were synthesized using a monotonous periodic source (F0 = 140 Hz). Target formants were presented monaurally; the target ear was assigned randomly on each trial. A competitor for F2 (F2C) was presented contralaterally; listeners must reject F2C to optimize recognition. In experiment 1, F2Cs with various frequency and amplitude contours were used. F2Cs with time-varying frequency contours were effective competitors; constant-frequency F2Cs had far less impact. Amplitude contour also influenced competitor impact; this effect was additive. In experiment 2, F2Cs were created by inverting the F2 frequency contour about its geometric mean and varying its depth of variation over a range from constant to twice the original (0–200%). The impact on intelligibility was least for constant F2Cs and increased up to ~100% depth, but little thereafter. The effect of an extraneous formant depends primarily on its frequency contour; interference increases as the depth of variation is increased until the range exceeds that typical for F2 in natural speech.
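The contour manipulation described for experiment 2 in the abstract above can be sketched as follows. This is a minimal illustration that assumes the inversion and the depth scaling are both applied on a logarithmic frequency scale, which is what makes the geometric mean the pivot; the abstract does not state the scale used. The function names and the example contour are hypothetical.

```python
import math


def geometric_mean(freqs_hz):
    """Geometric mean of a sequence of positive frequencies."""
    return math.exp(sum(math.log(f) for f in freqs_hz) / len(freqs_hz))


def inverted_contour(freqs_hz, depth: float = 1.0):
    """Invert a frequency contour about its geometric mean.

    Assumption: on a log-frequency axis each point is mirrored about
    the geometric mean and its excursion is scaled by `depth`:
        depth = 0.0 -> constant contour at the geometric mean (0%)
        depth = 1.0 -> inversion with the original depth (100%)
        depth = 2.0 -> inversion with twice the original depth (200%)
    """
    gm = geometric_mean(freqs_hz)
    return [gm * (gm / f) ** depth for f in freqs_hz]


f2 = [1000.0, 2000.0, 4000.0]  # made-up F2 contour in Hz
for depth in (0.0, 1.0, 2.0):
    print(depth, [round(f, 1) for f in inverted_contour(f2, depth)])
```

Under this assumption every depth setting leaves the geometric mean of the contour unchanged, so only the extent of the frequency variation differs between conditions.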
Abstract:
Objective: The aim of this study was to design a novel experimental approach to investigate the morphological characteristics of auditory cortical responses elicited by rapidly changing synthesized speech sounds. Methods: Six sound-evoked magnetoencephalographic (MEG) responses were measured to a synthesized train of speech sounds using the vowels /e/ and /u/ in 17 normal-hearing young adults. Responses were measured to: (i) the onset of the speech train; (ii) an F0 increment; (iii) an F0 decrement; (iv) an F2 decrement; (v) an F2 increment; and (vi) the offset of the speech train using short (jittered around 135 ms) and long (1500 ms) stimulus onset asynchronies (SOAs). The least squares (LS) deconvolution technique was used to disentangle the overlapping MEG responses in the short SOA condition only. Results: Comparison between the morphology of the recovered cortical responses in the short and long SOA conditions showed high similarity, suggesting that the LS deconvolution technique was successful in disentangling the MEG waveforms. Waveform latencies and amplitudes were different for the two SOA conditions and were influenced by the spectro-temporal properties of the sound sequence. The magnetic acoustic change complex (mACC) for the short SOA condition showed significantly lower amplitudes and shorter latencies compared to the long SOA condition. The F0 transition showed a larger reduction in amplitude from long to short SOA compared to the F2 transition. Lateralization of the cortical responses was observed under some stimulus conditions and appeared to be associated with the spectro-temporal properties of the acoustic stimulus. Conclusions: The LS deconvolution technique provides a new tool to study the properties of the auditory cortical response to rapidly changing sound stimuli. The presence of the cortical auditory evoked responses for rapid transition of synthesized speech stimuli suggests that the temporal code is preserved at the level of the auditory cortex. Further, the reduced amplitudes and shorter latencies might reflect intrinsic properties of the cortical neurons to rapidly presented sounds. Significance: This is the first demonstration of the separation of overlapping cortical responses to rapidly changing speech sounds and offers a potential new biomarker of discrimination of rapid transition of sound.
Abstract:
This study explored the effects on speech intelligibility of across-formant differences in fundamental frequency (ΔF0) and F0 contour. Sentence-length speech analogues were presented dichotically (left = F1+F3; right = F2), either alone or, because competition usually reveals grouping cues most clearly, accompanied in the left ear by a competitor for F2 (F2C) that listeners must reject to optimize recognition. F2C was created by inverting the F2 frequency contour. In experiment 1, all left-ear formants shared the same constant F0 and ΔF0_F2 (the ΔF0 for F2) was 0 or ±4 semitones. In experiment 2, all left-ear formants shared the natural F0 contour and that for F2 was natural, constant, exaggerated, or inverted. Adding F2C lowered keyword scores, presumably because of informational masking. The results for experiment 1 were complicated by effects associated with the direction of ΔF0_F2; this problem was avoided in experiment 2 because all four F0 contours had the same geometric mean frequency. When the target formants were presented alone, scores were relatively high and did not depend on the F0 contour for F2 (F0_F2). F2C impact was greater when F2 had a different F0 contour from the other formants. This effect was a direct consequence of the associated ΔF0; the F0_F2 contour per se did not influence competitor impact.