10 results for Sounds(waterways)
in Cambridge University Engineering Department Publications Database
Abstract:
Natural sounds are structured on many time-scales. A typical segment of speech, for example, contains features that span four orders of magnitude: sentences ($\sim 1$ s); phonemes ($\sim 10^{-1}$ s); glottal pulses ($\sim 10^{-2}$ s); and formants ($\sim 10^{-3}$ s). The auditory system uses information from each of these time-scales to solve complicated tasks such as auditory scene analysis [1]. One route toward understanding how auditory processing accomplishes this analysis is to build neuroscience-inspired algorithms which solve similar tasks and to compare the properties of these algorithms with properties of auditory processing. There is, however, a discord: current machine-audition algorithms largely concentrate on the shorter time-scale structures in sounds and ignore the longer structures. The reason for this is two-fold. Firstly, it is a difficult technical problem to construct an algorithm that utilises both sorts of information. Secondly, it is computationally demanding to process data simultaneously at high resolution (to extract short temporal information) and over long durations (to extract long temporal information). The contribution of this work is to develop a new statistical model for natural sounds that captures structure across a wide range of time-scales, and to provide efficient learning and inference algorithms. We demonstrate the success of this approach on a missing-data task.
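As a rough illustration of the resolution/duration tension described above, the sketch below (with invented signal parameters; not the authors' model) extracts amplitude envelopes at each of the four quoted time-scales from a single signal:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

fs = 16_000                       # sample rate (Hz); illustrative choice
x = np.random.randn(2 * fs)       # stand-in for two seconds of speech

# Envelopes at the four time-scales quoted above: sentences (~1 s),
# phonemes (~1e-1 s), glottal pulses (~1e-2 s), formants (~1e-3 s).
# Smoothing the squared signal with a Gaussian whose width matches
# each scale gives one envelope per scale.
scales_s = [1.0, 1e-1, 1e-2, 1e-3]
envelopes = [np.sqrt(gaussian_filter1d(x**2, sigma=s * fs)) for s in scales_s]

# The slowest envelope needs thousands of samples of context per point,
# while the fastest varies every few samples: processing must be both
# high-resolution and long-duration at once.
```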
Abstract:
This study is the first step in the psychoacoustic exploration of perceptual differences between the sounds of different violins. A method was used which enabled the same performance to be replayed on different "virtual violins," so that the relationships between acoustical characteristics of violins and perceived qualities could be explored. Recordings of real performances were made using a bridge-mounted force transducer, giving an accurate representation of the signal from the violin string. These were then played through filters corresponding to the admittance curves of different violins. Initially, limits of listener performance in detecting changes in acoustical characteristics were characterized. These consisted of shifts in frequency or increases in amplitude of single modes or frequency bands that have been proposed previously to be significant in the perception of violin sound quality. Thresholds were significantly lower for musically trained than for nontrained subjects but were not significantly affected by the violin used as a baseline. Thresholds for the musicians typically ranged from 3 to 6 dB for amplitude changes and 1.5% to 20% for frequency changes. Interpretation of the results using excitation patterns showed that thresholds for the best subjects were quite well predicted by a multichannel model based on optimal processing. © 2007 Acoustical Society of America.
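The "virtual violin" step can be pictured as a linear filtering operation: the bridge-force recording is convolved with a body impulse response obtained from an admittance curve. The sketch below assumes a hypothetical admittance array and sample rate; it is not the study's actual processing chain:

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 44_100                          # sample rate (Hz); hypothetical
force = np.random.randn(fs)          # placeholder for the measured
                                     # bridge-force (string) signal

# Hypothetical complex admittance on an FFT frequency grid; a real
# study would use the measured admittance curve of each violin.
n_fft = 4096
admittance = np.ones(n_fft // 2 + 1, dtype=complex)

# Body filter impulse response via the inverse real FFT, then
# "playing" the performance on this virtual violin by filtering
# the string signal with the body response.
body_ir = np.fft.irfft(admittance, n=n_fft)
radiated = fftconvolve(force, body_ir, mode="full")[: force.size]
```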
Abstract:
Pronunciation is an important part of speech acquisition, but little attention has been given to the mechanism or mechanisms by which it develops. Speech sound qualities, for example, have just been assumed to develop by simple imitation. In most accounts this is then assumed to be by acoustic matching, with the infant comparing his output to that of his caregiver. There are theoretical and empirical problems with both of these assumptions, and we present a computational model, Elija, that does not learn to pronounce speech sounds this way. Elija starts by exploring the sound-making capabilities of his vocal apparatus. Then he uses the natural responses he gets from a caregiver to learn equivalence relations between his vocal actions and his caregiver's speech. We show that Elija progresses from a babbling stage to learning the names of objects. This demonstrates the viability of a non-imitative mechanism in learning to pronounce.
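A toy sketch of the non-imitative loop: the learner tries vocal actions, stores the caregiver's response paired with each action, and later picks the action whose stored response best matches a target word. The vectors, the linear caregiver map, and the nearest-neighbour lookup are all invented for illustration; Elija itself is far richer:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))      # invented motor-to-acoustic map

def caregiver_response(action):
    """Stand-in for the caregiver reformulating the learner's sound;
    any fixed noisy mapping serves the sketch."""
    return action @ W + 0.05 * rng.standard_normal(8)

# Exploration (babbling): try random vocal actions and remember the
# caregiver response paired with each one. No acoustic matching of
# the learner's own output against adult speech is involved.
memory = [(a, caregiver_response(a)) for a in rng.standard_normal((200, 4))]

def pronounce(target_sound):
    """Return the stored action whose paired caregiver response is
    nearest to the target sound (a simple equivalence lookup)."""
    return min(memory, key=lambda m: np.linalg.norm(m[1] - target_sound))[0]

action = pronounce(rng.standard_normal(8))   # e.g. "say" a novel word
```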
Abstract:
A recent study has found that toddlers do not compensate for an artificial alteration in a vowel they hear themselves producing. This raises questions about how young children learn speech sounds. © 2012 Elsevier Ltd.
Abstract:
Synthesised acoustic guitar sounds based on a detailed physical model are used to provide input for psychoacoustical testing. Thresholds of perception are found for changes in the main parameters of the model. Using a three-alternative forced-choice procedure, just-noticeable differences are presented for changes in frequency and damping of the modes of the guitar body, and also for changes in the tension, bending stiffness and damping parameters of the strings. These are compared with measured data on the range of variation of these parameters in a selection of guitars. © S. Hirzel Verlag © EAA.
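For readers unfamiliar with the paradigm, the sketch below shows a generic two-down, one-up staircase driving a simulated three-alternative forced-choice listener; the step rules and psychometric function are illustrative assumptions, not the study's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

def listener_correct(delta, jnd=2.0):
    """Simulated 3AFC listener: chance performance is 1/3, rising
    towards 1 as the parameter change grows. Logistic form assumed."""
    p = 1/3 + (2/3) / (1 + np.exp(-(delta - jnd)))
    return rng.random() < p

delta, step = 8.0, 1.0               # starting change and step size
streak, reversals, last_dir = 0, [], 0

while len(reversals) < 8:            # run until 8 staircase reversals
    if listener_correct(delta):
        streak += 1
        if streak == 2:              # two-down: converge near 70.7% correct
            streak = 0
            if last_dir == +1:
                reversals.append(delta)
            delta, last_dir = max(delta - step, 0.1), -1
    else:
        streak = 0                   # one-up after any miss
        if last_dir == -1:
            reversals.append(delta)
        delta, last_dir = delta + step, +1

threshold = float(np.mean(reversals[-6:]))   # average the late reversals
```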
Abstract:
The physical meaning and methods of determining loudness were reviewed. Loudness is a psychoacoustic metric which closely corresponds to the perceived intensity of a sound stimulus. It can be determined by graphical procedures, numerical methods, or by commercial software. These methods typically require the consideration of the 1/3 octave band spectrum of the sound of interest. The sounds considered in this paper are a 1 kHz tone and pink noise. The loudness of these sounds was calculated in eight ways using different combinations of input data and calculation methods. All the methods considered are based on Zwicker loudness. It was determined that, of the combinations considered, only the commercial software dBSonic and the loudness calculation procedure detailed in DIN 45631, using 1/3 octave band levels filtered according to ANSI S1.11-1986, gave the correct values of loudness for a 1 kHz tone. Comparing the results between the sources also demonstrated the difference between sound pressure level and loudness. It was apparent that the calculation and filtering methods must be considered together, as a given calculation will produce different results for different 1/3 octave band input. In the literature reviewed, no reference provided a guide to the selection of the type of filtering that should be used in conjunction with the loudness computation method.
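The paper's central caveat, that the filter bank and the loudness calculation must match, can be made concrete with a sketch: compute 1/3 octave band levels of a 1 kHz tone with simple Butterworth band-pass filters (a stand-in for an ANSI S1.11 filter bank) and note that these levels are what a DIN 45631 routine would consume. The filter design choices here are assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 48_000
t = np.arange(fs) / fs
p = np.sqrt(2) * np.sin(2 * np.pi * 1000 * t)   # 1 kHz tone, 1 Pa RMS (~94 dB SPL)

# Nominal 1/3 octave centre frequencies around 1 kHz (base-10 spacing,
# as in ANSI S1.11; this subset is illustrative).
centres = 1000 * 10 ** (np.arange(-3, 4) / 10)

P_REF = 20e-6                                   # reference pressure, Pa
band_levels = []
for fc in centres:
    lo, hi = fc / 2 ** (1/6), fc * 2 ** (1/6)   # band-edge frequencies
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    y = sosfilt(sos, p)
    band_levels.append(20 * np.log10(np.sqrt(np.mean(y**2)) / P_REF))

# These band levels are the input a DIN 45631 (Zwicker) routine expects;
# as the paper stresses, the loudness obtained depends on how this
# filtering was done, so filter bank and calculation must match.
```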
Abstract:
Auditory scene analysis is extremely challenging. One approach, perhaps that adopted by the brain, is to shape useful representations of sounds based on prior knowledge about their statistical structure. For example, sounds with harmonic sections are common, and so time-frequency representations are efficient. Most current representations concentrate on the shorter components. Here, we propose representations for structures on longer time-scales, like the phonemes and sentences of speech. We decompose a sound into a product of processes, each with its own characteristic time-scale. This demodulation cascade relates to classical amplitude demodulation, but traditional algorithms fail to realise the representation fully. A new approach, probabilistic amplitude demodulation, is shown to outperform the established methods, and to extend easily to representation of a full demodulation cascade.
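A hedged sketch of the classical baseline the abstract refers to: repeated Hilbert-envelope demodulation, where each stage divides out a smoothed envelope so that the sound is expressed as a product of progressively faster processes. The cut-off frequencies are illustrative; this is not the probabilistic method itself:

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def demodulate(x, fs, cutoff):
    """One cascade stage: low-pass the Hilbert envelope to obtain a
    slow modulator, then divide it out to leave a faster residual."""
    env = np.abs(hilbert(x))
    sos = butter(2, cutoff, btype="lowpass", fs=fs, output="sos")
    modulator = np.maximum(sosfiltfilt(sos, env), 1e-6)
    return modulator, x / modulator

fs = 16_000
x = np.random.randn(4 * fs)                    # stand-in for a speech waveform

# Slowest structure first (sentence scale), then faster (phoneme scale);
# after two stages, x is approximately m1 * m2 * carrier.
m1, r1 = demodulate(x, fs, cutoff=1.0)         # ~1 Hz modulator
m2, carrier = demodulate(r1, fs, cutoff=10.0)  # ~10 Hz modulator
```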
Abstract:
Human listeners can identify vowels regardless of speaker size, although the sound waves for an adult and a child speaking the "same" vowel would differ enormously. The differences are mainly due to the differences in vocal tract length (VTL) and glottal pulse rate (GPR), which are both related to body size. Automatic speech recognition machines are notoriously bad at understanding children if they have been trained on the speech of an adult. In this paper, we propose that the auditory system adapts its analysis of speech sounds, dynamically and automatically, to the GPR and VTL of the speaker on a syllable-to-syllable basis. We illustrate how this rapid adaptation might be performed with the aid of a computational version of the auditory image model, and we propose that an auditory preprocessor of this form would improve the robustness of speech recognisers.
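One way to picture VTL adaptation: formant frequencies scale roughly as the inverse of vocal tract length, so a preprocessor can normalise a spectrum by warping its frequency axis. The warp factor and toy spectrum below are illustrative assumptions, not the auditory image model:

```python
import numpy as np

def vtl_warp(spectrum, freqs, alpha):
    """Resample a magnitude spectrum at freqs * alpha, stretching
    spectral features by a factor 1/alpha along the frequency axis."""
    return np.interp(freqs * alpha, freqs, spectrum, left=0.0, right=0.0)

# Toy adult spectrum with "formants" at 500, 1500 and 2500 Hz.
freqs = np.linspace(0, 8000, 1024)
spectrum = sum(np.exp(-0.5 * ((freqs - f) / 80) ** 2) for f in (500, 1500, 2500))

# A child's tract at ~0.7x the adult length scales formants up by ~1/0.7.
# Warping with alpha = 0.7 simulates the child; alpha = 1/0.7 undoes it,
# mapping the child's spectrum back onto the adult formant pattern.
child_spectrum = vtl_warp(spectrum, freqs, alpha=0.7)
normalised = vtl_warp(child_spectrum, freqs, alpha=1 / 0.7)
```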