926 resultados para Signal Processing, EMD, Thresholding, Acceleration, Displacement, Structural Identification
Resumo:
This study assesses the recently proposed data-driven background dataset refinement technique for speaker verification using alternate SVM feature sets to the GMM supervector features for which it was originally designed. The performance improvements brought about in each trialled SVM configuration demonstrate the versatility of background dataset refinement. This work also extends on the originally proposed technique to exploit support vector coefficients as an impostor suitability metric in the data-driven selection process. Using support vector coefficients improved the performance of the refined datasets in the evaluation of unseen data. Further, attempts are made to exploit the differences in impostor example suitability measures from varying features spaces to provide added robustness.
Resumo:
Information fusion in biometrics has received considerable attention. The architecture proposed here is based on the sequential integration of multi-instance and multi-sample fusion schemes. This method is analytically shown to improve the performance and allow a controlled trade-off between false alarms and false rejects when the classifier decisions are statistically independent. Equations developed for detection error rates are experimentally evaluated by considering the proposed architecture for text dependent speaker verification using HMM based digit dependent speaker models. The tuning of parameters, n classifiers and m attempts/samples, is investigated and the resultant detection error trade-off performance is evaluated on individual digits. Results show that performance improvement can be achieved even for weaker classifiers (FRR-19.6%, FAR-16.7%). The architectures investigated apply to speaker verification from spoken digit strings such as credit card numbers in telephone or VOIP or internet based applications.
Resumo:
Defibrillator is a 16’41” musical work for solo performer, laptop computer and electric guitar. The electric guitar is processed in real-time by digital signal processing network in software, with gestural control provided by a foot-operated pedal board. --------- The work is informed by a range of ideas from the genres of electroacoustic music, western art music, popular music and cinematic sound. It seeks to fluidly cross and hybridise musical practices from these diverse sonic traditions and to develop a compositional language that draws upon multiple genres, but at the same time resists the ability to be located within a singular genre. Musical structures and sonic markers which form genre are ruptured at strategic levels of the musical structure in order to allow for a cross flow of concepts between genres. The process of rupture is facilitated by the practical implementation of music and sound reception theories into the compositional process. -------- The piece exhibits the by-products of a composer born into a media saturated environment, drawing on a range of musical and sonic traditions, actively seeking to explore the liminal space in between these traditions. The project stems from the author's research interests in locating points of connection between traditions of experimentation in diverse musical and sonic traditions arising from the broad uptake of media technologies in the early 20th century.
Resumo:
Acoustically, car cabins are extremely noisy and as a consequence audio-only, in-car voice recognition systems perform poorly. As the visual modality is immune to acoustic noise, using the visual lip information from the driver is seen as a viable strategy in circumventing this problem by using audio visual automatic speech recognition (AVASR). However, implementing AVASR requires a system being able to accurately locate and track the drivers face and lip area in real-time. In this paper we present such an approach using the Viola-Jones algorithm. Using the AVICAR [1] in-car database, we show that the Viola- Jones approach is a suitable method of locating and tracking the driver’s lips despite the visual variability of illumination and head pose for audio-visual speech recognition system.
Resumo:
This paper presents a novel matched rotation precoding (MRP) scheme to design a rate one space-frequency block code (SFBC) and a multirate SFBC for MIMO-OFDM systems with limited feedback. The proposed rate one MRP and multirate MRP can always achieve full transmit diversity and optimal system performance for arbitrary number of antennas, subcarrier intervals, and subcarrier groupings, with limited channel knowledge required by the transmit antennas. The optimization process of the rate one MRP is simple and easily visualized so that the optimal rotation angle can be derived explicitly, or even intuitively for some cases. The multirate MRP has a complex optimization process, but it has a better spectral efficiency and provides a relatively smooth balance between system performance and transmission rate. Simulations show that the proposed SFBC with MRP can overcome the diversity loss for specific propagation scenarios, always improve the system performance, and demonstrate flexible performance with large performance gain. Therefore the proposed SFBCs with MRP demonstrate flexibility and feasibility so that it is more suitable for a practical MIMO-OFDM system with dynamic parameters.
Resumo:
Non-driving related cognitive load and variations of emotional state may impact a driver’s capability to control a vehicle and introduces driving errors. Availability of reliable cognitive load and emotion detection in drivers would benefit the design of active safety systems and other intelligent in-vehicle interfaces. In this study, speech produced by 68 subjects while driving in urban areas is analyzed. A particular focus is on speech production differences in two secondary cognitive tasks, interactions with a co-driver and calls to automated spoken dialog systems (SDS), and two emotional states during the SDS interactions - neutral/negative. A number of speech parameters are found to vary across the cognitive/emotion classes. Suitability of selected cepstral- and production-based features for automatic cognitive task/emotion classification is investigated. A fusion of GMM/SVM classifiers yields an accuracy of 94.3% in cognitive task and 81.3% in emotion classification.
Resumo:
Acoustically, car cabins are extremely noisy and as a consequence, existing audio-only speech recognition systems, for voice-based control of vehicle functions such as the GPS based navigator, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (eg: lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio Visual Speech Recognition (AVSR). Continuous research in AVASR field has been ongoing for the past twenty-five years with notable progress being made. However, the practical deployment of AVASR systems for use in a variety of real-world applications has not yet emerged. The main reason is due to most research to date neglecting to address variabilities in the visual domain such as illumination and viewpoint in the design of the visual front-end of the AVSR system. In this paper we present an AVASR system in a real-world car environment using the AVICAR database [1], which is publicly available in-car database and we show that the use of visual speech conjunction with the audio modality is a better approach to improve the robustness and effectiveness of voice-only recognition systems in car cabin environments.
Resumo:
This paper investigates the impact of carrier frequency offset (CFO) on Single Carrier wireless communication systems with Frequency Domain Equalization (SC-FDE). We show that CFO in SC-FDE systems causes irrecoverable channel estimation error, which leads to inter-symbol-interference (ISI). The impact of CFO on SC-FDE and OFDM is compared in the presence of CFO and channel estimation errors. Closed form expressions of signal to interference and noise ratio (SINR) are derived for both systems, and verified by simulation results. We find that when channel estimation errors are considered, SC-FDE is similarly or even more sensitive to CFO, compared to OFDM. In particular, in SC-FDE systems, CFO mainly deteriorates the system performance via degrading the channel estimation. Both analytical and simulation results highlight the importance of accurate CFO estimation in SC-FDE systems.
Resumo:
In this chapter we propose clipping with amplitude and phase corrections to reduce the peak-to-average power ratio (PAR) of orthogonal frequency division multiplexed (OFDM) signals in high-speed wireless local area networks defined in IEEE 802.11a physical layer. The proposed techniques can be implemented with a small modification at the transmitter and the receiver remains standard compliant. PAR reduction as much as 4dB can be achieved by selecting a suitable clipping ratio and a correction factor depending on the constellation used. Out of band noise (OBN) is also reduced.
Resumo:
Parallel combinatory orthogonal frequency division multiplexing (PC-OFDM yields lower maximum peak-to-average power ratio (PAR), high bandwidth efficiency and lower bit error rate (BER) on Gaussian channels compared to OFDM systems. However, PC-OFDM does not improve the statistics of PAR significantly. In this chapter, the use of a set of fixed permutations to improve the statistics of the PAR of a PC-OFDM signal is presented. For this technique, interleavers are used to produce K-1 permuted sequences from the same information sequence. The sequence with the lowest PAR, among K sequences is chosen for the transmission. The PAR of a PC-OFDM signal can be further reduced by 3-4 dB by this technique. Mathematical expressions for the complementary cumulative density function (CCDF)of PAR of PC-OFDM signal and interleaved PC-OFDM signal are also presented.
Resumo:
Multiresolution techniques are being extensively used in signal processing literature. This paper has two parts, in the first part we derive a relationship between the general degradation model (Y=BX+W) at coarse and fine resolutions. In the second part we develop a signal restoration scheme in a multiresolution framework and demonstrate through experiments that the knowledge of the relationship between the degradation model at different resolutions helps in obtaining computationally efficient restoration scheme.
Resumo:
This paper introduces an energy-efficient Rate Adaptive MAC (RA-MAC) protocol for long-lived Wireless Sensor Networks (WSN). Previous research shows that the dynamic and lossy nature of wireless communication is one of the major challenges to reliable data delivery in a WSN. RA-MAC achieves high link reliability in such situations by dynamically trading off radio bit rate for signal processing gain. This extra gain reduces the packet loss rate which results in lower energy expenditure by reducing the number of retransmissions. RA-MAC selects the optimal data rate based on channel conditions with the aim of minimizing energy consumption. We have implemented RA-MAC in TinyOS on an off-the-shelf sensor platform (TinyNode), and evaluated its performance by comparing RA-MAC with state-ofthe- art WSN MAC protocol (SCP-MAC) by experiments.
Resumo:
Microphone arrays have been used in various applications to capture conversations, such as in meetings and teleconferences. In many cases, the microphone and likely source locations are known \emph{a priori}, and calculating beamforming filters is therefore straightforward. In ad-hoc situations, however, when the microphones have not been systematically positioned, this information is not available and beamforming must be achieved blindly. In achieving this, a commonly neglected issue is whether it is optimal to use all of the available microphones, or only an advantageous subset of these. This paper commences by reviewing different approaches to blind beamforming, characterising them by the way they estimate the signal propagation vector and the spatial coherence of noise in the absence of prior knowledge of microphone and speaker locations. Following this, a novel clustered approach to blind beamforming is motivated and developed. Without using any prior geometrical information, microphones are first grouped into localised clusters, which are then ranked according to their relative distance from a speaker. Beamforming is then performed using either the closest microphone cluster, or a weighted combination of clusters. The clustered algorithms are compared to the full set of microphones in experiments on a database recorded on different ad-hoc array geometries. These experiments evaluate the methods in terms of signal enhancement as well as performance on a large vocabulary speech recognition task.
Resumo:
This paper introduces a novel technique to directly optimise the Figure of Merit (FOM) for phonetic spoken term detection. The FOM is a popular measure of sTD accuracy, making it an ideal candiate for use as an objective function. A simple linear model is introduced to transform the phone log-posterior probabilities output by a phe classifier to produce enhanced log-posterior features that are more suitable for the STD task. Direct optimisation of the FOM is then performed by training the parameters of this model using a non-linear gradient descent algorithm. Substantial FOM improvements of 11% relative are achieved on held-out evaluation data, demonstrating the generalisability of the approach.
Resumo:
Intelligent surveillance systems typically use a single visual spectrum modality for their input. These systems work well in controlled conditions, but often fail when lighting is poor, or environmental effects such as shadows, dust or smoke are present. Thermal spectrum imagery is not as susceptible to environmental effects, however thermal imaging sensors are more sensitive to noise and they are only gray scale, making distinguishing between objects difficult. Several approaches to combining the visual and thermal modalities have been proposed, however they are limited by assuming that both modalities are perfuming equally well. When one modality fails, existing approaches are unable to detect the drop in performance and disregard the under performing modality. In this paper, a novel middle fusion approach for combining visual and thermal spectrum images for object tracking is proposed. Motion and object detection is performed on each modality and the object detection results for each modality are fused base on the current performance of each modality. Modality performance is determined by comparing the number of objects tracked by the system with the number detected by each mode, with a small allowance made for objects entering and exiting the scene. The tracking performance of the proposed fusion scheme is compared with performance of the visual and thermal modes individually, and a baseline middle fusion scheme. Improvement in tracking performance using the proposed fusion approach is demonstrated. The proposed approach is also shown to be able to detect the failure of an individual modality and disregard its results, ensuring performance is not degraded in such situations.