989 resultados para Acoustic MIMO Speaker Microphone Array
Resumo:
Amphibian is an 10’00’’ musical work which explores new musical interfaces and approaches to hybridising performance practices from the popular music, electronic dance music and computer music traditions. The work is designed to be presented in a range of contexts associated with the electro-acoustic, popular and classical music traditions. The work is for two performers using two synchronised laptops, an electric guitar and a custom designed gestural interface for vocal performers - the e-Mic (Extended Mic-stand Interface Controller). This interface was developed by one of the co-authors, Donna Hewitt. The e-Mic allows a vocal performer to manipulate the voice in real time through the capture of physical gestures via an array of sensors - pressure, distance, tilt - along with ribbon controllers and an X-Y joystick microphone mount. Performance data are then sent to a computer, running audio-processing software, which is used to transform the audio signal from the microphone. In this work, data is also exchanged between performers via a local wireless network, allowing performers to work with shared data streams. The duo employs the gestural conventions of guitarist and singer (i.e. 'a band' in a popular music context), but transform these sounds and gestures into new digital music. The gestural language of popular music is deliberately subverted and taken into a new context. The piece thus explores the nexus between the sonic and performative practices of electro acoustic music and intelligent electronic dance music (‘idm’). This work was situated in the research fields of new musical interfacing, interaction design, experimental music composition and performance. The contexts in which the research was conducted were live musical performance and studio music production. The work investigated new methods for musical interfacing, performance data mapping, hybrid performance and compositional practices in electronic music. The research methodology was practice-led. New insights were gained from the iterative experimental workshopping of gestural inputs, musical data mapping, inter-performer data exchange, software patch design, data and audio processing chains. In respect of interfacing, there were innovations in the design and implementation of a novel sensor-based gestural interface for singers, the e-Mic, one of the only existing gestural controllers for singers. This work explored the compositional potential of sharing real time performance data between performers and deployed novel methods for inter-performer data exchange and mapping. As regards stylistic and performance innovation, the work explored and demonstrated an approach to the hybridisation of the gestural and sonic language of popular music with recent ‘post-digital’ approaches to laptop based experimental music The development of the work was supported by an Australia Council Grant. Research findings have been disseminated via a range of international conference publications, recordings, radio interviews (ABC Classic FM), broadcasts, and performances at international events and festivals. The work was curated into the major Australian international festival, Liquid Architecture, and was selected by an international music jury (through blind peer review) for presentation at the International Computer Music Conference in Belfast, N. Ireland.
Resumo:
Nodule is 19'54" musical work for two electronic music performers, two laptop computers and a custom built, sensor-based microphone controller - the e-Mic (Extended Mic-stand Interface Controller). This interface was developed by one of the co-authors, Donna Hewitt. The e-Mic allows a vocal performer to manipulate their voice in real time by capturing physical gestures via an array of sensors - pressure, distance, tilt – in addition to ribbon controllers and an X-Y joystick microphone mount. Performance data are then sent to a computer, running audio-processing software, which is used to transform the audio signal from the microphone in real time. The work seeks to explore the liminal space between the electro-acoustic music tradition and more recent developments in the electronic dance music tradition. It does so on both a performative (gestural) and compositional (sonic) level. Visually, the performance consists of a singer and a laptop performer, hybridising the gestural context of these traditions. On a sonic level, the work explores hybridity at deeper levels of the musical structure than simple bricolage or collage approaches. Hybridity is explored at the level of the sonic gesture (source material), in production (audio processing gestures), in performance gesture, and in approaches to the use of the frequency spectrum, pulse and meter. The work was designed to be performed in a range of contexts from concert halls, to clubs, to rock festivals, across a range of staging and production platforms. As a consequence, the work has been tested in a range of audience contexts, and has allowed the transportation of compositional and performance practices across traditional audience demographic boundaries.
Resumo:
Channel measurements and simulations have been carried out to observe the effects of pedestrian movement on multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) channel capacity. An in-house built MIMO-OFDM packet transmission demonstrator equipped with four transmitters and four receivers has been utilized to perform channel measurements at 5.2 GHz. Variations in the channel capacity dynamic range have been analysed for 1 to 10 pedestrians and different antenna arrays (2 × 2, 3 × 3 and 4 × 4). Results show a predicted 5.5 bits/s/Hz and a measured 1.5 bits/s/Hz increment in the capacity dynamic range with the number of pedestrian and the number of antennas in the transmitter and receiver array.
Resumo:
We investigate Multiple-Input and Multiple-Output Orthogonal Frequency Division Multiplexing (MIMO-OFDM) systems behavior in indoor populated environments that have line-of-site (LoS) between transmitter and receiver arrays. The in-house built MIMO-OFDM packet transmission demonstrator, equipped with four transmitters and four receivers, has been utilized to perform channel measurements at 5.2 GHz. Measurements have been performed using 0 to 3 pedestrians with different antenna arrays (2 £ 2, 3 £ 3 and 4 £ 4). The maximum average capacity for the 2x2 deterministic Fixed SNR scenario is 8.5 dB compared to the 4x4 deterministic scenario that has a maximum average capacity of 16.2 dB, thus an increment of 8 dB in average capacity has been measured when the array size increases from 2x2 to 4x4. In addition a regular variation has been observed for Random scenarios compared to the deterministic scenarios. An incremental trend in average channel capacity for both deterministic and random pedestrian movements has been observed with increasing number of pedestrian and antennas. In deterministic scenarios, the variations in average channel capacity are more noticeable than for the random scenarios due to a more prolonged and controlled body-shadowing effect. Moreover due to the frequent Los blocking and fixed transmission power a slight decrement have been observed in the spread between the maximum and minimum capacity with random fixed Tx power scenario.
Resumo:
The use of the PC and Internet for placing telephone calls will present new opportunities to capture vast amounts of un-transcribed speech for a particular speaker. This paper investigates how to best exploit this data for speaker-dependent speech recognition. Supervised and unsupervised experiments in acoustic model and language model adaptation are presented. Using one hour of automatically transcribed speech per speaker with a word error rate of 36.0%, unsupervised adaptation resulted in an absolute gain of 6.3%, equivalent to 70% of the gain from the supervised case, with additional adaptation data likely to yield further improvements. LM adaptation experiments suggested that although there seems to be a small degree of speaker idiolect, adaptation to the speaker alone, without considering the topic of the conversation, is in itself unlikely to improve transcription accuracy.
Resumo:
In an automotive environment, the performance of a speech recognition system is affected by environmental noise if the speech signal is acquired directly from a microphone. Speech enhancement techniques are therefore necessary to improve the speech recognition performance. In this paper, a field-programmable gate array (FPGA) implementation of dual-microphone delay-and-sum beamforming (DASB) for speech enhancement is presented. As the first step towards a cost-effective solution, the implementation described in this paper uses a relatively high-end FPGA device to facilitate the verification of various design strategies and parameters. Experimental results show that the proposed design can produce output waveforms close to those generated by a theoretical (floating-point) model with modest usage of FPGA resources. Speech recognition experiments are also conducted on enhanced in-car speech waveforms produced by the FPGA in order to compare recognition performance with the floating-point representation running on a PC.
Resumo:
Automatic spoken Language Identi¯cation (LID) is the process of identifying the language spoken within an utterance. The challenge that this task presents is that no prior information is available indicating the content of the utterance or the identity of the speaker. The trend of globalization and the pervasive popularity of the Internet will amplify the need for the capabilities spoken language identi¯ca- tion systems provide. A prominent application arises in call centers dealing with speakers speaking di®erent languages. Another important application is to index or search huge speech data archives and corpora that contain multiple languages. The aim of this research is to develop techniques targeted at producing a fast and more accurate automatic spoken LID system compared to the previous National Institute of Standards and Technology (NIST) Language Recognition Evaluation. Acoustic and phonetic speech information are targeted as the most suitable fea- tures for representing the characteristics of a language. To model the acoustic speech features a Gaussian Mixture Model based approach is employed. Pho- netic speech information is extracted using existing speech recognition technol- ogy. Various techniques to improve LID accuracy are also studied. One approach examined is the employment of Vocal Tract Length Normalization to reduce the speech variation caused by di®erent speakers. A linear data fusion technique is adopted to combine the various aspects of information extracted from speech. As a result of this research, a LID system was implemented and presented for evaluation in the 2003 Language Recognition Evaluation conducted by the NIST.
Resumo:
The cascading appearance-based (CAB) feature extraction technique has established itself as the state-of-the-art in extracting dynamic visual speech features for speech recognition. In this paper, we will focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade we will demonstrate that the same steps taken to reduce static speaker and environmental information for the visual speech recognition application also provide similar improvements for visual speaker recognition. A further study is conducted comparing synchronous HMM (SHMM) based fusion of CAB visual features and traditional perceptual linear predictive (PLP) acoustic features to show that higher complexity inherit in the SHMM approach does not appear to provide any improvement in the final audio-visual speaker verification system over simpler utterance level score fusion.
Resumo:
The rapid growth of mobile telephone use, satellite services, and now the wireless Internet and WLANs are generating tremendous changes in telecommunication and networking. As indoor wireless communications become more prevalent, modeling indoor radio wave propagation in populated environments is a topic of significant interest. Wireless MIMO communication exploits phenomena such as multipath propagation to increase data throughput and range, or reduce bit error rates, rather than attempting to eliminate effects of multipath propagation as traditional SISO communication systems seek to do. The MIMO approach can yield significant gains for both link and network capacities, with no additional transmitting power or bandwidth consumption when compared to conventional single-array diversity methods. When MIMO and OFDM systems are combined and deployed in a suitable rich scattering environment such as indoors, a significant capacity gain can be observed due to the assurance of multipath propagation. Channel variations can occur as a result of movement of personnel, industrial machinery, vehicles and other equipment moving within the indoor environment. The time-varying effects on the propagation channel in populated indoor environments depend on the different pedestrian traffic conditions and the particular type of environment considered. A systematic measurement campaign to study pedestrian movement effects in indoor MIMO-OFDM channels has not yet been fully undertaken. Measuring channel variations caused by the relative positioning of pedestrians is essential in the study of indoor MIMO-OFDM broadband wireless networks. Theoretically, due to high multipath scattering, an increase in MIMO-OFDM channel capacity is expected when pedestrians are present. However, measurements indicate that some reductions in channel capacity could be observed as the number of pedestrians approaches 10 due to a reduction in multipath conditions as more human bodies absorb the wireless signals. This dissertation presents a systematic characterization of the effects of pedestrians in indoor MIMO-OFDM channels. Measurement results, using the MIMO-OFDM channel sounder developed at the CSIRO ICT Centre, have been validated by a customized Geometric Optics-based ray tracing simulation. Based on measured and simulated MIMO-OFDM channel capacity and MIMO-OFDM capacity dynamic range, an improved deterministic model for MIMO-OFDM channels in indoor populated environments is presented. The model can be used for the design and analysis of future WLAN to be deployed in indoor environments. The results obtained show that, in both Fixed SNR and Fixed Tx for deterministic condition, the channel capacity dynamic range rose with the number of pedestrians as well as with the number of antenna combinations. In random scenarios with 10 pedestrians, an increment in channel capacity of up to 0.89 bits/sec/Hz in Fixed SNR and up to 1.52 bits/sec/Hz in Fixed Tx has been recorded compared to the one pedestrian scenario. In addition, from the results a maximum increase in average channel capacity of 49% has been measured while 4 antenna elements are used, compared with 2 antenna elements. The highest measured average capacity, 11.75 bits/sec/Hz, corresponds to the 4x4 array with 10 pedestrians moving randomly. Moreover, Additionally, the spread between the highest and lowest value of the the dynamic range is larger for Fixed Tx, predicted 5.5 bits/sec/Hz and measured 1.5 bits/sec/Hz, in comparison with Fixed SNR criteria, predicted 1.5 bits/sec/Hz and measured 0.7 bits/sec/Hz. This has been confirmed by both measurements and simulations ranging from 1 to 5, 7 and 10 pedestrians.
Resumo:
This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. It has been previously shown in our own work, and in the work of others, that features extracted from a speaker's moving lips hold speaker dependencies which are complementary with speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms the performance of either sub-system. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise
Resumo:
Pedestrians’ use of mp3 players or mobile phones can pose the risk of being hit by motor vehicles. We present an approach for detecting a crash risk level using the computing power and the microphone of mobile devices that can be used to alert the user in advance of an approaching vehicle so as to avoid a crash. A single feature extractor classifier is not usually able to deal with the diversity of risky acoustic scenarios. In this paper, we address the problem of detection of vehicles approaching a pedestrian by a novel, simple, non resource intensive acoustic method. The method uses a set of existing statistical tools to mine signal features. Audio features are adaptively thresholded for relevance and classified with a three component heuristic. The resulting Acoustic Hazard Detection (AHD) system has a very low false positive detection rate. The results of this study could help mobile device manufacturers to embed the presented features into future potable devices and contribute to road safety.
Resumo:
Smart antenna receiver and transmitter systems consist of multi-port arrays with an individual receiver channel (including ADC) and an individual transmitter channel (including DAC)at every of the M antenna ports, respectively. By means of digital beamforming, an unlimited number of simultaneous complex-valued vector radiation patterns with M-1 degrees of freedom can be formed. Applications of smart antennas in communication systems include space-division multiple access. If both stations of a communication link are equipped with smart antennas (multiple-input-multiple-output, MIMO). multiple independent channels can be formed in a "multi-path-rich" environment. In this article, it will be shown that under certain circumstances, the correlation between signals from adjacent ports of a dense array (M + ΔM elements) can be kept as low as the correlation between signals from adjacent ports of a conventional array (M elements and half-wavelength pacing). This attractive feature is attained by means of a novel approach which employs a RF decoupling network at the array ports in order to form new ports which are decoupled and associated with mutually orthogonal (de-correlated) radiation patterns.
Resumo:
This paper investigates advanced channel compensation techniques for the purpose of improving i-vector speaker verification performance in the presence of high intersession variability using the NIST 2008 and 2010 SRE corpora. The performance of four channel compensation techniques: (a) weighted maximum margin criterion (WMMC), (b) source-normalized WMMC (SN-WMMC), (c) weighted linear discriminant analysis (WLDA), and; (d) source-normalized WLDA (SN-WLDA) have been investigated. We show that, by extracting the discriminatory information between pairs of speakers as well as capturing the source variation information in the development i-vector space, the SN-WLDA based cosine similarity scoring (CSS) i-vector system is shown to provide over 20% improvement in EER for NIST 2008 interview and microphone verification and over 10% improvement in EER for NIST 2008 telephone verification, when compared to SN-LDA based CSS i-vector system. Further, score-level fusion techniques are analyzed to combine the best channel compensation approaches, to provide over 8% improvement in DCF over the best single approach, (SN-WLDA), for NIST 2008 interview/ telephone enrolment-verification condition. Finally, we demonstrate that the improvements found in the context of CSS also generalize to state-of-the-art GPLDA with up to 14% relative improvement in EER for NIST SRE 2010 interview and microphone verification and over 7% relative improvement in EER for NIST SRE 2010 telephone verification.
Resumo:
An investigation into the spatial distribution of road traffic noise levels on a balcony is conducted. A balcony constructed to a special acoustic design due to its elevation above an 8 lane motorway is selected for detailed measurements. The as-constructed balcony design includes solid parapets, side walls, ceiling shields and highly absorptive material placed on the ceiling. Road traffic noise measurements are conducted spatially using a five channel acoustic analyzer, where four microphones are located at various positions within the balcony space and one microphone placed outside the parapet at a reference position. Spatial distributions in both vertical and horizontal planes are measured. A theoretical model and prediction configuration is presented that assesses the acoustic performance of the balcony under existing traffic flow conditions. The prediction model implements a combined direct path, specular reflection path and diffuse reflection path utilizing image source and radiosity techniques. Results obtained from the prediction model are presented and compared to the measurement results. The predictions are found to correlate well with measurements with some minor differences that are explained. It is determined that the prediction methodology is acceptable to assess a wider range of street and balcony configuration scenarios.