90 resultados para Vocal fold nodule signals
em Queensland University of Technology - ePrints Archive
Resumo:
Nodule is 19'54" musical work for two electronic music performers, two laptop computers and a custom built, sensor-based microphone controller - the e-Mic (Extended Mic-stand Interface Controller). This interface was developed by one of the co-authors, Donna Hewitt. The e-Mic allows a vocal performer to manipulate their voice in real time by capturing physical gestures via an array of sensors - pressure, distance, tilt – in addition to ribbon controllers and an X-Y joystick microphone mount. Performance data are then sent to a computer, running audio-processing software, which is used to transform the audio signal from the microphone in real time. The work seeks to explore the liminal space between the electro-acoustic music tradition and more recent developments in the electronic dance music tradition. It does so on both a performative (gestural) and compositional (sonic) level. Visually, the performance consists of a singer and a laptop performer, hybridising the gestural context of these traditions. On a sonic level, the work explores hybridity at deeper levels of the musical structure than simple bricolage or collage approaches. Hybridity is explored at the level of the sonic gesture (source material), in production (audio processing gestures), in performance gesture, and in approaches to the use of the frequency spectrum, pulse and meter. The work was designed to be performed in a range of contexts from concert halls, to clubs, to rock festivals, across a range of staging and production platforms. As a consequence, the work has been tested in a range of audience contexts, and has allowed the transportation of compositional and performance practices across traditional audience demographic boundaries.
Resumo:
This thesis presents an original approach to parametric speech coding at rates below 1 kbitsjsec, primarily for speech storage applications. Essential processes considered in this research encompass efficient characterization of evolutionary configuration of vocal tract to follow phonemic features with high fidelity, representation of speech excitation using minimal parameters with minor degradation in naturalness of synthesized speech, and finally, quantization of resulting parameters at the nominated rates. For encoding speech spectral features, a new method relying on Temporal Decomposition (TD) is developed which efficiently compresses spectral information through interpolation between most steady points over time trajectories of spectral parameters using a new basis function. The compression ratio provided by the method is independent of the updating rate of the feature vectors, hence allows high resolution in tracking significant temporal variations of speech formants with no effect on the spectral data rate. Accordingly, regardless of the quantization technique employed, the method yields a high compression ratio without sacrificing speech intelligibility. Several new techniques for improving performance of the interpolation of spectral parameters through phonetically-based analysis are proposed and implemented in this research, comprising event approximated TD, near-optimal shaping event approximating functions, efficient speech parametrization for TD on the basis of an extensive investigation originally reported in this thesis, and a hierarchical error minimization algorithm for decomposition of feature parameters which significantly reduces the complexity of the interpolation process. Speech excitation in this work is characterized based on a novel Multi-Band Excitation paradigm which accurately determines the harmonic structure in the LPC (linear predictive coding) residual spectra, within individual bands, using the concept 11 of Instantaneous Frequency (IF) estimation in frequency domain. The model yields aneffective two-band approximation to excitation and computes pitch and voicing with high accuracy as well. New methods for interpolative coding of pitch and gain contours are also developed in this thesis. For pitch, relying on the correlation between phonetic evolution and pitch variations during voiced speech segments, TD is employed to interpolate the pitch contour between critical points introduced by event centroids. This compresses pitch contour in the ratio of about 1/10 with negligible error. To approximate gain contour, a set of uniformly-distributed Gaussian event-like functions is used which reduces the amount of gain information to about 1/6 with acceptable accuracy. The thesis also addresses a new quantization method applied to spectral features on the basis of statistical properties and spectral sensitivity of spectral parameters extracted from TD-based analysis. The experimental results show that good quality speech, comparable to that of conventional coders at rates over 2 kbits/sec, can be achieved at rates 650-990 bits/sec.
Resumo:
The unique alpine-living kea parrot Nestor notabilis has been the focus of numerous cognitive studies, but its communication system has so far been largely neglected. We examined 2,884 calls recorded in New Zealand’s Southern Alps. Based on audio and visual spectrographic differences, these calls were categorised into seven distinct call types: the non-oscillating ‘screech’ contact call and ‘mew’; and the oscillating ‘trill’, ‘chatter’, ‘warble’ and ‘whistle’; and a hybrid ‘screech-trill’. Most of these calls contained aspects that were individually unique, in addition to potentially encoding for an individual’s sex and age. Additionally, for each recording, the sender’s previous and next calls were noted, as well as any response given by conspecifics. We found that the previous and next calls made by the sender were most often of the same type, and that the next most likely preceding and/or following call type was the screech call, a contact call which sounds like the ‘kee-ah’ from which the bird’s name derives. As a social bird capable of covering large distances over visually obstructive terrain, long distance contact calls may be of considerable importance for social cohesion. Contact calls allow kea to locate conspecifics and congregate in temporary groups for social activities. The most likely response to any given call was a screech, usually followed by the same type of call as the initial call made by the sender, although responses differed depending on the age of the caller. The exception was the warble, the kea’s play call, to which the most likely response was another warble. Being the most common call type, as well as the default response to another call, it appears that the ‘contagious’ screech contact call plays a central role in kea vocal communication and social cohesion