991 results for audio-frequency
Abstract:
Environmental changes have put great pressure on biological systems, leading to a rapid decline in biodiversity. To monitor this change and protect biodiversity, animal vocalizations have been widely explored by deploying acoustic sensors in the field. Consequently, large volumes of acoustic data are collected. However, traditional manual methods that require ecologists to physically visit sites to collect biodiversity data are both costly and time consuming. It is therefore essential to develop new semi-automated and automated methods to identify species in automated audio recordings. In this study, a novel feature extraction method based on wavelet packet decomposition (WPD) is proposed for frog call classification. After syllable segmentation, the advertisement call of each frog syllable is represented by a spectral peak track, from which track duration, dominant frequency and oscillation rate are calculated. Then, a k-means clustering algorithm is applied to the dominant frequency, and the centroids of the clustering results are used to generate the frequency scale for WPD. Next, a new feature set named adaptive frequency scaled wavelet packet decomposition sub-band cepstral coefficients is extracted by performing WPD on the windowed frog calls. Furthermore, the statistics of all feature vectors over each windowed signal are calculated to produce the final feature set. Finally, two well-known classifiers, a k-nearest neighbour classifier and a support vector machine classifier, are used for classification. In our experiments, we use two different datasets from Queensland, Australia (18 frog species from commercial recordings and 8 frog species from James Cook University field recordings). The weighted classification accuracy of our proposed method is 99.5% for the 18 frog species and 97.4% for the 8 frog species, which outperforms all other comparable methods.
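The clustering step described above can be illustrated with a minimal one-dimensional k-means sketch. It covers only the step that clusters dominant frequencies; how the centroids are turned into a WPD frequency scale is not specified in the abstract and is not attempted here. The function name `kmeans_1d`, the deterministic initialisation, and the frequency values are all hypothetical choices made for illustration.

```python
def kmeans_1d(values, k, iterations=50):
    """Cluster scalar values (e.g. dominant frequencies in Hz) with Lloyd's algorithm."""
    data = sorted(values)
    # Assumption: deterministic start, with centroids spread evenly over the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


# Hypothetical dominant frequencies (Hz) from segmented syllables.
dominant_freqs = [500, 520, 510, 1500, 1490, 1510, 3000, 3010, 2990]
band_centres = kmeans_1d(dominant_freqs, k=3)
```

The sorted centroids could then serve as candidate band centres when a frequency scale is built.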
Abstract:
Interest in low bit rate video coding has increased considerably. Despite rapid progress in storage density and digital communication system performance, demand for data-transmission bandwidth and storage capacity continues to exceed the capabilities of available technologies. The growth of data-intensive digital audio and video applications and the increased use of bandwidth-limited media such as video conferencing and full motion video have not only sustained the need for efficient ways to encode analog signals, but also made signal compression central to digital communication and data-storage technology. In this paper we explore techniques for compressing image sequences in a manner that optimizes the results for the human receiver. We propose a new motion estimator using two novel block match algorithms that are based on human perception. Simulations with image sequences have shown an improved bit rate while maintaining "image quality" when compared to conventional motion estimation techniques using the MAD block match criterion.
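For orientation, the conventional baseline mentioned above can be sketched as follows. This is not the perceptual algorithm proposed in the paper, which the abstract does not describe; it is a plain exhaustive block search using the mean absolute difference (MAD) criterion. The function names, the block size, the search range, and the synthetic frames are assumptions made for illustration.

```python
def mad(block_a, block_b):
    """Mean absolute difference between two equally sized blocks."""
    total = 0
    count = 0
    for row_a, row_b in zip(block_a, block_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_motion_vector(prev, cur, top, left, size=4, search=2):
    """Exhaustively search prev for the block that best matches cur's block."""
    target = get_block(cur, top, left, size)
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = mad(target, get_block(prev, top, left, size))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidate blocks that fall outside the previous frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = mad(target, get_block(prev, y, x, size))
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best, best_cost


def pixel(y, x):
    """Hypothetical synthetic image content, used only to build test frames."""
    return 16 * y + x * x


# The current frame is the previous frame displaced by one row and one column.
prev_frame = [[pixel(y, x) for x in range(12)] for y in range(12)]
cur_frame = [[pixel(y + 1, x - 1) for x in range(12)] for y in range(12)]
vector, cost = best_motion_vector(prev_frame, cur_frame, top=4, left=4)
```

Here `vector` is the (row, column) offset into the previous frame at which the matching block was found. A perceptually motivated criterion would replace `mad` while the search loop stays the same.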
Abstract:
Pre-whitening techniques are employed in blind correlation detection of additive spread spectrum watermarks in audio signals to reduce host signal interference. A direct deterministic whitening (DDW) scheme is derived in this paper from a frequency domain analysis of the time domain correlation process. Our experimental studies reveal that Savitzky-Golay whitening (SGW), which is otherwise inferior to the DDW technique, performs better when the audio signal is predominantly lowpass. The novelty of this paper lies in exploiting the complementary nature of the two whitening techniques to obtain a hybrid whitening (HbW) scheme. In the hybrid scheme, the DDW and SGW techniques are selectively applied based on the short-time spectral characteristics of the audio signal. The hybrid scheme extends the reliability of watermark detection to a wider range of audio signals.
Abstract:
This paper presents speaker normalization approaches for the audio search task. The conventional state-of-the-art feature set, Mel Frequency Cepstral Coefficients (MFCC), is known to contain speaker-specific and linguistic information implicitly. This can create problems for a speaker-independent audio search task. In this paper, a universal warping-based approach is used for vocal tract length normalization in audio search. In particular, features such as the scale transform and warped linear prediction are used to compensate for speaker variability in audio matching. The advantage of these features over the conventional feature set is that they apply universal frequency warping to both of the templates to be matched during audio search. The performance of Scale Transform Cepstral Coefficients (STCC) and Warped Linear Prediction Cepstral Coefficients (WLPCC) is about 3% higher than that of the state-of-the-art MFCC feature set on the TIMIT database.
Abstract:
We present a statistical model-based approach to signal enhancement in the case of additive broadband noise. Because broadband noise is localised in neither time nor frequency, its removal is one of the most pervasive and difficult signal enhancement tasks. In order to improve perceived signal quality, we take advantage of human perception and define a best estimate of the original signal in terms of a cost function incorporating perceptual optimality criteria. We derive the resultant signal estimator and implement it in a short-time spectral attenuation framework. Audio examples, references, and further information may be found at http://www-sigproc.eng.cam.ac.uk/~pjw47.
Abstract:
This paper proposes a Bayesian method for polyphonic music description. The method first divides an input audio signal into a series of sections called snapshots, and then estimates parameters such as the fundamental frequencies and amplitudes of the notes contained in each snapshot. The parameter estimation process is based on frequency-domain modelling and Gibbs sampling. Experimental results obtained from audio signals of test note patterns are encouraging: the accuracy is better than 80% for the estimation of fundamental frequencies in terms of semitones and instrument names when the number of simultaneous notes is two.
Abstract:
A novel design of a moving-coil transducer coupled with a low-hardness elastomer called "the gel surround" is presented in this thesis. This device is termed a "gel-type audio transducer". The gel-type audio transducer has been developed to overcome a problem that conventional loudspeakers have suffered from, namely the trade-off between the size of the audio device and the quality of sound in the low frequency range. The research work presented herein therefore aims to develop the gel-type audio transducer as a next-generation audio transducer for miniaturized woofers. The gel-type audio transducer consists of the magnetic and coil-drive plate assembly, and these parts are coupled by the gel surround. The transducer is driven by the electromagnetic conversion mechanism (a moving-coil transducer), and its output driving force can be greatly enhanced by the novel mechanism of the gel surround, especially in the low frequency range, resulting in enhanced acoustic efficiency. The transducer can be attached to a stiff and light panel with both optimized impedance matching and minimised wave collisions. The performance of the gel-type audio transducer is greatly influenced by the mass of the magnetic assembly and the compliance of the gel surround. However, as the size and weight of the magnet have to be kept minimal to miniaturise the device, the focus of the research is on the effect of the gel surround. Accordingly, the effect of the gel surround, made of RTV (room-temperature vulcanising) silicone elastomer, TPE (thermoplastic elastomer), and silicone foam, on the generation of the output driving force, the energy transfer from the transducer to the panel to which it is attached, and the sound radiation from the vibrating panel was investigated.
This effect was studied with COMSOL Multiphysics (FE analysis), and the simulated results were verified by experiments such as laser scanning measurement, DMA (dynamic mechanical analyzer) testing, and acoustic testing. Prototypes of the gel-type audio transducer with enhanced acoustic efficiency at reduced size and weight were successfully developed. Implementation of the transducers in consumer applications was also demonstrated, along with their commercial value.
Abstract:
A new method for modeling frequency-dependent boundaries in finite-difference time-domain (FDTD) and Kirchhoff variable digital waveguide mesh (K-DWM) room acoustics simulations is presented. The proposed approach allows the direct incorporation of a digital impedance filter (DIF) in the multidimensional (2D or 3D) FDTD boundary model of a locally reacting surface. An explicit boundary update equation is obtained by carefully constructing a suitable recursive formulation. The method is analyzed in terms of pressure wave reflectance for different wall impedance filters and angles of incidence. Results obtained from numerical experiments confirm the high accuracy of the proposed digital impedance filter boundary model, the reflectance of which closely matches locally reacting surface (LRS) theory. Furthermore, a numerical boundary analysis (NBA) formula is provided as a technique for analytic evaluation of the numerical reflectance of the proposed digital impedance filter boundary formulation.
Abstract:
We propose a frequency domain adaptive algorithm for wave separation in wind instruments. Forward and backward travelling waves are obtained from the signals acquired by two microphones placed along the tube, while the separation filter is adapted from the information given by a third microphone. Working in the frequency domain has a series of advantages, among which are the ease of design of the propagation filter and its differentiation with respect to its parameters. Although the adaptive algorithm was developed as a first step for the estimation of playing parameters in wind instruments, it can also be used, without any modifications, for other applications such as in-air direction of arrival (DOA) estimation. Preliminary results on these applications will also be presented.
Abstract:
Machine tool chatter is an unfavorable phenomenon during metal cutting that results in heavy vibration of the cutting tool. With an increase in depth of cut, the cutting regime changes from chatter-free cutting to one with chatter. In this paper, we propose the use of permutation entropy (PE), a conceptually simple and computationally fast measure, to detect the onset of chatter from the time series of a sound signal recorded with a unidirectional microphone. PE can efficiently distinguish the regular and complex nature of any signal and extract information about the dynamics of the process by indicating a sudden change in its value. In situations where the data sets are huge and there is no time for preprocessing and fine-tuning, PE can effectively detect dynamical changes of the system. This makes PE an ideal choice for online detection of chatter, which is not possible with other conventional nonlinear methods. In the present study, the variation of PE under two cutting conditions is analyzed. An abrupt variation in the value of PE with increasing depth of cut indicates the onset of chatter vibrations. The results are verified using the frequency spectra of the signals and a nonlinear measure, the normalized coarse-grained information rate (NCIR).
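As a rough illustration of the measure discussed above, the following sketch computes permutation entropy from the ordinal patterns of a time series. The embedding order, the delay, and the normalisation by log2(order!) are assumptions chosen for illustration, since the abstract does not state the parameters used in the study.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1, normalize=True):
    """Shannon entropy of the ordinal patterns found in a time series."""
    n = len(series) - (order - 1) * delay
    if n <= 0:
        raise ValueError("series too short for the chosen order and delay")
    patterns = Counter()
    for i in range(n):
        window = [series[i + j * delay] for j in range(order)]
        # The ordinal pattern is the index order that sorts the window.
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        patterns[pattern] += 1
    entropy = sum((c / n) * math.log2(n / c) for c in patterns.values())
    if normalize:
        # Scale to [0, 1] by the entropy of equally likely patterns.
        entropy /= math.log2(math.factorial(order))
    return entropy


steady = list(range(50))       # strictly increasing: one ordinal pattern only
alternating = [0, 1] * 50      # two ordinal patterns occurring equally often
pe_steady = permutation_entropy(steady)
pe_alternating = permutation_entropy(alternating)
```

In an online setting, the function would be applied to successive windows of the recorded sound signal, and an abrupt change in the returned value would be taken as a sign that the dynamics have changed.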
Abstract:
Frequency recognition is an important task in many engineering fields such as audio signal processing and telecommunications engineering, for example in applications like Dual-Tone Multi-Frequency (DTMF) detection or the recognition of the carrier frequency of a Global Positioning System (GPS) signal. This paper presents the results of investigations into several common Fourier Transform-based frequency recognition algorithms implemented in real time on a Texas Instruments (TI) TMS320C6713 Digital Signal Processor (DSP) core. In addition, suitable metrics are evaluated in order to ascertain which of the selected algorithms is appropriate for audio signal processing.
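The abstract does not name the algorithms that were compared. As one plausible example of a Fourier Transform-based approach to DTMF detection, the sketch below uses the Goertzel recursion to measure signal power at each of the eight DTMF frequencies. It is written in Python for readability rather than for a DSP target, and the sample rate, block length, and function names are assumptions made for illustration.

```python
import math

LOW_FREQS = (697, 770, 852, 941)         # keypad row tones in Hz
HIGH_FREQS = (1209, 1336, 1477, 1633)    # keypad column tones in Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, sample_rate):
    """Squared magnitude of the signal's spectrum at one frequency."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        # Second-order recursion: s[n] = x[n] + coeff * s[n-1] - s[n-2]
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def make_tone(key, sample_rate=8000, length=205):
    """Synthesise the two-tone signal for a keypad key (test helper)."""
    for row, keys in enumerate(KEYPAD):
        if key in keys:
            low, high = LOW_FREQS[row], HIGH_FREQS[keys.index(key)]
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    step = 2.0 * math.pi / sample_rate
    return [math.sin(step * low * n) + math.sin(step * high * n)
            for n in range(length)]


def detect_key(samples, sample_rate=8000):
    """Pick the strongest row tone and column tone and map them to a key."""
    row = max(range(4),
              key=lambda i: goertzel_power(samples, LOW_FREQS[i], sample_rate))
    col = max(range(4),
              key=lambda i: goertzel_power(samples, HIGH_FREQS[i], sample_rate))
    return KEYPAD[row][col]


detected = detect_key(make_tone("5"))
```

This sketch always reports a key. A practical detector would also compare the tone powers against a threshold so that silence or speech is not mistaken for a key press.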