11 results for microphones
in Queensland University of Technology - ePrints Archive
Abstract:
This paper proposes a clustered approach for blind beamforming from ad-hoc microphone arrays. In such arrangements, microphone placement is arbitrary and the speaker may be close to one, all or a subset of microphones at a given time. Practical issues with such a configuration mean that some microphones might be better discarded due to poor input signal-to-noise ratio (SNR) or undesirable spatial aliasing effects from large inter-element spacings when beamforming. Large inter-microphone spacings may also lead to inaccuracies in delay estimation during blind beamforming. In such situations, using a cluster of microphones (i.e., a sub-array), closely located both to each other and to the desired speech source, may provide more robust enhancement than the full array. This paper proposes a method for blind clustering of microphones based on the magnitude squared coherence function, and evaluates the method on a database recorded using various ad-hoc microphone arrangements.
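As a rough illustration of the quantity this abstract refers to, the sketch below estimates the magnitude squared coherence (MSC) between two microphone signals by averaging spectra over non-overlapping segments. It is a minimal sketch under stated assumptions, not the paper's method: the function names `dft`, `msc` and `mean_msc`, the segment length, and the rectangular window are all hypothetical choices made for this example, and the abstract does not say how coherence values are turned into clusters.

```python
import cmath
import random


def dft(frame):
    """Naive one-sided DFT of a real frame (bins 0 .. N/2)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def msc(x, y, seg_len=32):
    """Magnitude squared coherence per frequency bin.

    Spectra are accumulated over non-overlapping segments:
    MSC[k] = |Pxy[k]|^2 / (Pxx[k] * Pyy[k]).
    """
    n_seg = min(len(x), len(y)) // seg_len
    n_bins = seg_len // 2 + 1
    pxx = [0.0] * n_bins
    pyy = [0.0] * n_bins
    pxy = [0j] * n_bins
    for s in range(n_seg):
        fx = dft(x[s * seg_len:(s + 1) * seg_len])
        fy = dft(y[s * seg_len:(s + 1) * seg_len])
        for k in range(n_bins):
            pxx[k] += abs(fx[k]) ** 2
            pyy[k] += abs(fy[k]) ** 2
            pxy[k] += fx[k] * fy[k].conjugate()
    return [
        abs(pxy[k]) ** 2 / (pxx[k] * pyy[k]) if pxx[k] * pyy[k] > 0 else 0.0
        for k in range(n_bins)
    ]


def mean_msc(x, y, seg_len=32):
    """Average MSC across bins, usable as a crude pairwise similarity score."""
    values = msc(x, y, seg_len)
    return sum(values) / len(values)


# Synthetic demo: two microphones observing the same source plus a little
# independent noise, and a third microphone observing unrelated noise.
random.seed(0)
source = [random.gauss(0, 1) for _ in range(512)]
mic_a = [s + 0.1 * random.gauss(0, 1) for s in source]
mic_b = [s + 0.1 * random.gauss(0, 1) for s in source]
mic_c = [random.gauss(0, 1) for _ in range(512)]
print("a-b:", mean_msc(mic_a, mic_b))
print("a-c:", mean_msc(mic_a, mic_c))
```

In this toy setup the pair sharing a source scores much higher than the unrelated pair, which is the kind of contrast a coherence-based grouping could exploit. Real recordings would also involve inter-microphone delays and reverberation, which this sketch ignores.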
Abstract:
Microphone arrays have been used in various applications to capture conversations, such as in meetings and teleconferences. In many cases, the microphone and likely source locations are known a priori, and calculating beamforming filters is therefore straightforward. In ad-hoc situations, however, when the microphones have not been systematically positioned, this information is not available and beamforming must be achieved blindly. In achieving this, a commonly neglected issue is whether it is optimal to use all of the available microphones, or only an advantageous subset of these. This paper commences by reviewing different approaches to blind beamforming, characterising them by the way they estimate the signal propagation vector and the spatial coherence of noise in the absence of prior knowledge of microphone and speaker locations. Following this, a novel clustered approach to blind beamforming is motivated and developed. Without using any prior geometrical information, microphones are first grouped into localised clusters, which are then ranked according to their relative distance from a speaker. Beamforming is then performed using either the closest microphone cluster, or a weighted combination of clusters. The clustered algorithms are compared to the full set of microphones in experiments on a database recorded on different ad-hoc array geometries. These experiments evaluate the methods in terms of signal enhancement as well as performance on a large vocabulary speech recognition task.
Abstract:
While close-talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by a microphone array has been shown to be an effective alternative in a noisy environment. The use of microphone arrays in contrast to close-talking microphones alleviates the feeling of discomfort and distraction to the user. For this reason, microphone arrays are popular and have been used in a wide range of applications such as teleconferencing, hearing aids, speaker tracking, and as the front-end to speech recognition systems. With advances in sensor and sensor network technology, there is considerable potential for applications that employ ad-hoc networks of microphone-equipped devices collaboratively as a virtual microphone array. By allowing such devices to be distributed throughout the users’ environment, the microphone positions are no longer constrained to traditional fixed geometrical arrangements. This flexibility in the means of data acquisition allows different audio scenes to be captured to give a complete picture of the working environment. In such ad-hoc deployments of microphone sensors, however, the lack of information about the location of devices and active speakers poses technical challenges for array signal processing algorithms, which must be addressed to allow deployment in real-world applications. While not an ad-hoc sensor network, conditions approaching this have in effect been imposed in recent National Institute of Standards and Technology (NIST) ASR evaluations on distant microphone recordings of meetings. The NIST evaluation data comes from multiple sites, each with different and often loosely specified distant microphone configurations. This research investigates how microphone array methods can be applied to ad-hoc microphone arrays.
A particular focus is on devising methods that are robust to unknown microphone placements in order to improve the overall speech quality and recognition performance provided by the beamforming algorithms. In ad-hoc situations, microphone positions and likely source locations are not known and beamforming must be achieved blindly. There are two general approaches that can be employed to blindly estimate the steering vector for beamforming. The first is direct estimation without regard to the microphone and source locations. An alternative approach is instead to first determine the unknown microphone positions through array calibration methods and then to use the traditional geometrical formulation for the steering vector. Following these two major approaches investigated in this thesis, a novel clustered approach is proposed, which clusters the microphones and selects the clusters based on their proximity to the speaker. Novel experiments are conducted to demonstrate that the proposed method of automatically selecting clusters of microphones (i.e., a subarray), closely located both to each other and to the desired speech source, may in fact provide more robust speech enhancement and recognition than the full array could.
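The "traditional geometrical formulation for the steering vector" mentioned above can be sketched for the simple free-field case. The code below is a minimal sketch under assumptions that are not taken from the thesis: microphone and source positions are taken as known (for example after calibration), propagation is direct-path only, amplitude attenuation is ignored, the speed of sound is fixed at 343 m/s, and the first microphone is used as the phase reference. The function name `steering_vector` is hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # metres per second; assumed room-temperature value


def steering_vector(mic_positions, source_position, freq_hz):
    """Free-field steering vector relative to the first microphone.

    Each entry is exp(-j * 2*pi*f * tau_m), where tau_m is the extra
    propagation delay to microphone m compared with microphone 0.
    """
    dists = [math.dist(p, source_position) for p in mic_positions]
    ref = dists[0]
    return [
        cmath.exp(-2j * math.pi * freq_hz * (d - ref) / SPEED_OF_SOUND)
        for d in dists
    ]


# Two microphones 0.343 m apart on the x-axis, source 1 m to the left of
# the first one: the second microphone hears the source 1 ms later.
mics = [(0.0, 0.0), (0.343, 0.0)]
vec = steering_vector(mics, (-1.0, 0.0), 250.0)
print(vec)
```

The direct-estimation approach described in the abstract would instead obtain these relative delays or phases from the signals themselves, without needing the positions.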
Abstract:
What is a record producer? There is a degree of mystery and uncertainty about just what goes on behind the studio door. Some producers are seen as Svengali-like figures manipulating artists into mass consumer product. Producers are sometimes seen as mere technicians whose job is simply to set up a few microphones and press the record button. Close examination of the recording process will show how far this is from a complete picture. Artists are special—they come with an inspiration, and a talent, but also with a variety of complications, and in many ways a recording studio can seem the least likely place for creative expression and for an affective performance to happen. The task of the record producer is to engage with these artists and their songs and turn these potentials into form through the technology of the recording studio. The purpose of the exercise is to disseminate this fixed form to an imagined audience—generally in the hope that this audience will prove to be real. Finding an audience is the role of the record company. A record producer must also engage with the commercial expectations of the interests that underwrite a recording. This dissertation considers three fields of interest in the recording process: the performer and the song; the technology of the recording context; and the commercial ambitions of the record company—and positions the record producer as a nexus at the interface of all three. The author reports his structured recollection of five recordings, with three different artists, that all achieved substantial commercial success. The processes are considered from the author’s perspective as the record producer, and from inception of the project to completion of the recorded work. What were the processes of engagement? Do the actions reported conform to the template of nexus? 
This dissertation proposes that in all recordings the function of producer/nexus is present and necessary—it exists in the interaction of the artistry and the technology. The art of record production is to engage with these artists and the songs they bring and turn these potentials into form.
Abstract:
The Internet presents a constantly evolving frontier for criminology and policing, especially in relation to online predators: paedophiles operating within the Internet for safer access to children, child pornography and networking opportunities with other online predators. The goals of this qualitative study are to undertake behavioural research, identifying personality types and archetypes of online predators and comparing and contrasting them with behavioural profiles and other psychological research on offline paedophiles and sex offenders. It also endeavours to gather intelligence on the technological utilisation of online predators and to conduct observational research on the social structures of online predator communities. These goals were achieved through the covert monitoring and logging of public activity within four Internet Relay Chat (IRC) chatrooms themed around child sexual abuse and located on the Undernet network. Five days of monitoring were conducted on these four chatrooms from Wednesday 1 to Sunday 5 April 2009; the raw data were collated and analysed. The analysis identified four personality types (the gentleman predator, the sadist, the businessman and the pretender) and eight archetypes: groomers, dealers, negotiators, roleplayers, networkers, chat requestors, posters and travellers. The characteristics and traits of these personality types and archetypes, which were extracted from the literature dealing with offline paedophiles and sex offenders, are detailed and contrasted against the online sexual predators identified within the chatrooms, revealing many similarities and interesting differences, particularly with the businessman and pretender personality types.
These personality types and archetypes were illustrated by selecting users who displayed the appropriate characteristics and tracking them through the four chatrooms, revealing intelligence data on the use of proxy servers, especially via the Tor software, and other security strategies such as Undernet’s host masking service. Name and age changes, which are used as a potential sexual grooming tactic, were also revealed through the use of Analyst’s Notebook software, and ISP information revealed the likelihood that many online predators were not using any safety mechanism and were relying on the anonymity of the Internet. The activities of these online predators were analysed, especially in regard to child sexual grooming and the ‘posting’ of child pornography, which revealed a few of the methods by which online predators utilised new Internet technologies to sexually groom and abuse children (using technologies such as instant messengers, webcams and microphones) as well as to store and disseminate illegal materials on image sharing websites and peer-to-peer software such as Gigatribe. Analysis of the social structures of the chatrooms was also carried out and the community functions and characteristics of each chatroom explored. The findings of this research have indicated several opportunities for further research. As a result of this research, recommendations are given on policy, prevention and response strategies with regard to online predators.
Abstract:
Artists: Donna Hewitt, Julian Knowles, Wade Marynowsky, Tim Bruniges, Avril Huddy. Macrophonics presents new Australian work emerging from the leading edge of performance interface research. The program addresses the emerging dialogue between traditional media and emerging digital media, as well as the dialogue across a broad range of musical traditions. Due to recent technological developments, we have reached a point artistically where the relationships between media and genres are being completely re-evaluated. This program presents a cross-section of responses to this condition. Each of the works in the program foregrounds an approach to performance that integrates sensors and novel performance control devices and/or examines how machines can be made musical in performance. Containing works for voice, electronics, video, movement and sensor-based gestural controllers, it critically surveys the interface between humans and machines in performance. From sensor-based microphones and guitars, through performance a/v, to post-rock dronescapes and experimental electronica, Macrophonics provides a broad and engaging survey of new performance approaches in mediatised environments.
Abstract:
This paper describes an interactive installation work set in a large dome space. The installation is an audio and physical re-rendition of an interactive writing work. In the original work, the user interacted via keyboard and screen while online. This rendition of the work retains the online interaction, but also places the interaction within a physical space, where the main 'conversation' takes place by the participant-audience speaking through microphones and listening through headphones. The work now also includes voice and SMS input, using speech-to-text and text-to-speech conversion technologies, and audio and displayed text for output. These additions allow the participant-audience to co-author the work while they participate in audible conversation with keyword-triggering characters (bots). Communication in the space can be person-to-computer via microphone, keyboard, and phone; person-to-person via machine and within the physical space; computer-to-computer; and computer-to-person via audio and projected text.
Abstract:
Macrophonics II presents new Australian work emerging from the leading edge of performance interface research. The program addresses the emerging dialogue between traditional media and emerging digital media, as well as dialogues across a broad range of musical traditions. Recent technological developments are causing a complete reevaluation of the relationships between media and genres in art, and Macrophonics II presents a cross-section of responses to this situation. Works in the program foreground an approach to performance that integrates sensors with novel performance control devices, and/or examine how machines can be made musical in performance. The program presents works by Australian artists Donna Hewitt, Julian Knowles and Wade Marynowsky, with choreography by Avril Huddy and dance performance by Lizzie and Zaimon Vilmanis. From sensor-based microphones and guitars, through performance a/v, to post-rock dronescapes, movement inspired works and experimental electronica, Macrophonics II provides a broad and engaging survey of new performance approaches in mediatised environments. Initial R&D for the work was supported by a range of institutions internationally, including the Australia Council for the Arts, Arts Queensland, STEIM (Holland) and the Nes Artist Residency (Iceland).
Abstract:
An investigation into the spatial distribution of road traffic noise levels on a balcony is conducted. A balcony constructed to a special acoustic design due to its elevation above an eight-lane motorway is selected for detailed measurements. The as-constructed balcony design includes solid parapets, side walls, ceiling shields and highly absorptive material placed on the ceiling. Road traffic noise measurements are conducted spatially using a five-channel acoustic analyzer, where four microphones are located at various positions within the balcony space and one microphone is placed outside the parapet at a reference position. Spatial distributions in both vertical and horizontal planes are measured. A theoretical model and prediction configuration is presented that assesses the acoustic performance of the balcony under existing traffic flow conditions. The prediction model implements a combined direct path, specular reflection path and diffuse reflection path utilizing image source and radiosity techniques. Results obtained from the prediction model are presented and compared to the measurement results. The predictions are found to correlate well with measurements, with some minor differences that are explained. It is determined that the prediction methodology is acceptable to assess a wider range of street and balcony configuration scenarios.
Abstract:
Automatic speech recognition from multiple distant microphones poses significant challenges because of noise and reverberation. The quality of speech acquisition may vary between microphones because of movements of speakers and channel distortions. This paper proposes a channel selection approach for selecting reliable channels based on a selection criterion operating in the short-term modulation spectrum domain. The proposed approach quantifies the relative strength of speech from each microphone and speech obtained from beamforming modulations. The new technique is compared experimentally in real reverberant conditions in terms of perceptual evaluation of speech quality (PESQ) measures and word error rate (WER). Overall improvement in recognition rate is observed using delay-sum and superdirective beamformers compared to the case when the channel is selected randomly using circular microphone arrays.
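For readers unfamiliar with the delay-sum beamformer named in this abstract, the sketch below shows the basic idea: estimate each channel's delay relative to a reference channel, realign, and average. It is a minimal sketch under simplifying assumptions (integer-sample delays, delay estimation by plain time-domain cross-correlation, equal channel weights) and does not reproduce the paper's channel selection criterion. The function names `estimate_delay` and `delay_and_sum` and the `max_lag` parameter are hypothetical.

```python
import random


def estimate_delay(ref, sig, max_lag):
    """Lag (in samples) at which sig best matches ref, via cross-correlation.

    A positive result means sig is a delayed copy of ref.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            ref[t] * sig[t + lag]
            for t in range(len(ref))
            if 0 <= t + lag < len(sig)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, max_lag=16):
    """Align every channel to the first one and average the aligned samples."""
    ref = channels[0]
    lags = [estimate_delay(ref, ch, max_lag) for ch in channels]
    output = []
    for t in range(len(ref)):
        # Use only the channels that have a valid sample after realignment.
        aligned = [
            ch[t + lag]
            for ch, lag in zip(channels, lags)
            if 0 <= t + lag < len(ch)
        ]
        output.append(sum(aligned) / len(aligned))
    return output, lags


# Demo: the second channel is the first one delayed by three samples.
random.seed(1)
source = [random.gauss(0, 1) for _ in range(200)]
delayed = [0.0] * 3 + source[:-3]
enhanced, lags = delay_and_sum([source, delayed], max_lag=8)
print(lags)
```

With noisy channels, averaging the realigned signals reinforces the common speech component while uncorrelated noise partially cancels, which is why the quality of the delay estimates matters so much in the blind setting.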