10 resultados para Song, Android, Applicazione, Registrazioni Audio.
em Indian Institute of Science - Bangalore - Índia
Resumo:
Communication applications are usually delay restricted, especially for the instance of musicians playing over the Internet. This requires a one-way delay of maximum 25 msec and also a high audio quality is desired at feasible bit rates. The ultra low delay (ULD) audio coding structure is well suited to this application and we investigate further the application of multistage vector quantization (MSVQ) to reach a bit rate range below 64 Kb/s, in a scalable manner. Results at 32 Kb/s and 64 Kb/s show that the trained codebook MSVQ performs best, better than KLT normalization followed by a simulated Gaussian MSVQ or simulated Gaussian MSVQ alone. The results also show that there is only a weak dependence on the training data, and that we indeed converge to the perceptual quality of our previous ULD coder at 64 Kb/s.
Resumo:
We propose a parametric stereo coding analysis and synthesis directly in the MDCT domain using an analysis by synthesis parameter estimation. The stereo signal is represented by an equalized sum signal and spatialization parameters. Equalized sum signal and the spatialization parameters are obtained by sub-band analysis in the MDCT domain. The de-correlated signal required for the stereo synthesis is also generated in the MDCT domain. Subjective evaluation test using MUSHRA shows that the synthesized stereo signal is perceptually satisfactory and comparable to the state of the art parametric coders.
Resumo:
Pre-whitening techniques are employed in blind correlation detection of additive spread spectrum watermarks in audio signals to reduce the host signal interference. A direct deterministic whitening (DDW) scheme is derived in this paper from the frequency domain analysis of the time domain correlation process. Our experimental studies reveal that, the Savitzky-Golay Whitening (SGW), which is otherwise inferior to DDW technique, performs better when the audio signal is predominantly lowpass. The novelty of this paper lies in exploiting the complementary nature to the two whitening techniques to obtain a hybrid whitening (HbW) scheme. In the hybrid scheme the DDW and SGW techniques are selectively applied, based on short time spectral characteristics of the audio signal. The hybrid scheme extends the reliability of watermark detection to a wider range of audio signals.
Changing resonator geometry to boost sound power decouples size and song frequency in a small insect
Resumo:
Despite their small size, some insects, such as crickets, can produce high amplitude mating songs by rubbing their wings together. By exploiting structural resonance for sound radiation, crickets broadcast species-specific songs at a sharply tuned frequency. Such songs enhance the range of signal transmission, contain information about the signaler's quality, and allow mate choice. The production of pure tones requires elaborate structural mechanisms that control and sustain resonance at the species-specific frequency. Tree crickets differ sharply from this scheme. Although they use a resonant system to produce sound, tree crickets can produce high amplitude songs at different frequencies, varying by as much as an octave. Based on an investigation of the driving mechanism and the resonant system, using laser Doppler vibrometry and finite element modeling, we show that it is the distinctive geometry of the crickets' forewings (the resonant system) that is responsible for their capacity to vary frequency. The long, enlarged wings enable the production of high amplitude songs; however, as a mechanical consequence of the high aspect ratio, the resonant structures have multiple resonant modes that are similar in frequency. The drive produced by the singing apparatus cannot, therefore, be locked to a single frequency, and different resonant modes can easily be engaged, allowing individual males to vary the carrier frequency of their songs. Such flexibility in sound production, decoupling body size and song frequency, has important implications for conventional views of mate choice, and offers inspiration for the design of miniature, multifrequency, resonant acoustic radiators.
Resumo:
We propose an iterative algorithm to detect transient segments in audio signals. Short time Fourier transform(STFT) is used to detect rapid local changes in the audio signal. The algorithm has two steps that iteratively - (a) calculate a function of the STFT and (b) build a transient signal. A dynamic thresholding scheme is used to locate the potential positions of transients in the signal. The iterative procedure ensures that genuine transients are built up while the localised spectral noise are suppressed by using an energy criterion. The extracted transient signal is later compared to a ground truth dataset. The algorithm performed well on two databases. On the EBU-SQAM database of monophonic sounds, the algorithm achieved an F-measure of 90% while on our database of polyphonic audio an F-measure of 91% was achieved. This technique is being used as a preprocessing step for a tempo analysis algorithm and a TSR (Transients + Sines + Residue) decomposition scheme.
Resumo:
We address the problem of temporal envelope modeling for transient audio signals. We propose the Gamma distribution function (GDF) as a suitable candidate for modeling the envelope keeping in view some of its interesting properties such as asymmetry, causality, near-optimal time-bandwidth product, controllability of rise and decay, etc. The problem of finding the parameters of the GDF becomes a nonlinear regression problem. We overcome the hurdle by using a logarithmic envelope fit, which reduces the problem to one of linear regression. The logarithmic transformation also has the feature of dynamic range compression. Since temporal envelopes of audio signals are not uniformly distributed, in order to compute the amplitude, we investigate the importance of various loss functions for regression. Based on synthesized data experiments, wherein we have a ground truth, and real-world signals, we observe that the least-squares technique gives reasonably accurate amplitude estimates compared with other loss functions.
Resumo:
Programming environments for smartphones expose a concurrency model that combines multi-threading and asynchronous event-based dispatch. While this enables the development of efficient and feature-rich applications, unforeseen thread interleavings coupled with non-deterministic reorderings of asynchronous tasks can lead to subtle concurrency errors in the applications. In this paper, we formalize the concurrency semantics of the Android programming model. We further define the happens-before relation for Android applications, and develop a dynamic race detection technique based on this relation. Our relation generalizes the so far independently studied happens-before relations for multi-threaded programs and single-threaded event-driven programs. Additionally, our race detection technique uses a model of the Android runtime environment to reduce false positives. We have implemented a tool called DROIDRACER. It generates execution traces by systematically testing Android applications and detects data races by computing the happens-before relation on the traces. We analyzed 1 5 Android applications including popular applications such as Facebook, Twitter and K-9 Mail. Our results indicate that data races are prevalent in Android applications, and that DROIDRACER is an effective tool to identify data races.
Resumo:
In this paper, we propose a new state transition based embedding (STBE) technique for audio watermarking with high fidelity. Furthermore, we propose a new correlation based encoding (CBE) scheme for binary logo image in order to enhance the payload capacity. The result of CBE is also compared with standard run-length encoding (RLE) compression and Huffman schemes. Most of the watermarking algorithms are based on modulating selected transform domain feature of an audio segment in order to embed given watermark bit. In the proposed STBE method instead of modulating feature of each and every segment to embed data, our aim is to retain the default value of this feature for most of the segments. Thus, a high quality of watermarked audio is maintained. Here, the difference between the mean values (Mdiff) of insignificant complex cepstrum transform (CCT) coefficients of down-sampled subsets is selected as a robust feature for embedding. Mdiff values of the frames are changed only when certain conditions are met. Hence, almost 50% of the times, segments are not changed and still STBE can convey watermark information at receiver side. STBE also exhibits a partial restoration feature by which the watermarked audio can be restored partially after extraction of the watermark at detector side. The psychoacoustic model analysis showed that the noise-masking ratio (NMR) of our system is less than -10dB. As amplitude scaling in time domain does not affect selected insignificant CCT coefficients, strong invariance towards amplitude scaling attacks is also proved theoretically. Experimental results reveal that the proposed watermarking scheme maintains high audio quality and are simultaneously robust to general attacks like MP3 compression, amplitude scaling, additive noise, re-quantization, etc.
Resumo:
This paper presents speaker normalization approaches for audio search task. Conventional state-of-the-art feature set, viz., Mel Frequency Cepstral Coefficients (MFCC) is known to contain speaker-specific and linguistic information implicitly. This might create problem for speaker-independent audio search task. In this paper, universal warping-based approach is used for vocal tract length normalization in audio search. In particular, features such as scale transform and warped linear prediction are used to compensate speaker variability in audio matching. The advantage of these features over conventional feature set is that they apply universal frequency warping for both the templates to be matched during audio search. The performance of Scale Transform Cepstral Coefficients (STCC) and Warped Linear Prediction Cepstral Coefficients (WLPCC) are about 3% higher than the state-of-the-art MFCC feature sets on TIMIT database.