9 resultados para codebook

em Indian Institute of Science - Bangalore - Índia


Relevância:

10.00% 10.00%

Publicador:

Resumo:

We address the issue of complexity for vector quantization (VQ) of wide-band speech LSF (line spectrum frequency) parameters. The recently proposed switched split VQ (SSVQ) method provides better rate-distortion (R/D) performance than the traditional split VQ (SVQ) method, even at the requirement of lower computational complexity. but at the expense of much higher memory. We develop the two stage SVQ (TsSVQ) method, by which we gain both the memory and computational advantages and still retain good R/D performance. The proposed TsSVQ method uses a full dimensional quantizer in its first stage for exploiting all the higher dimensional coding advantages and then, uses an SVQ method for quantizing the residual vector in the second stage so as to reduce the complexity. We also develop a transform domain residual coding method in this two stage architecture such that it further reduces the computational complexity. To design an effective residual codebook in the second stage, variance normalization of Voronoi regions is carried out which leads to the design of two new methods, referred to as normalized two stage SVQ (NTsSVQ) and normalized two stage transform domain SVQ (NTsTrSVQ). These two new methods have complimentary strengths and hence, they are combined in a switched VQ mode which leads to the further improvement in R/D performance, but retaining the low complexity requirement. We evaluate the performances of new methods for wide-band speech LSF parameter quantization and show their advantages over established SVQ and SSVQ methods.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Compressive sensing (CS) has been proposed for signals with sparsity in a linear transform domain. We explore a signal dependent unknown linear transform, namely the impulse response matrix operating on a sparse excitation, as in the linear model of speech production, for recovering compressive sensed speech. Since the linear transform is signal dependent and unknown, unlike the standard CS formulation, a codebook of transfer functions is proposed in a matching pursuit (MP) framework for CS recovery. It is found that MP is efficient and effective to recover CS encoded speech as well as jointly estimate the linear model. Moderate number of CS measurements and low order sparsity estimate will result in MP converge to the same linear transform as direct VQ of the LP vector derived from the original signal. There is also high positive correlation between signal domain approximation and CS measurement domain approximation for a large variety of speech spectra.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The differential encoding/decoding setup introduced by Kiran et at, Oggier et al and Jing et al for wireless relay networks that use codebooks consisting of unitary matrices is extended to allow codebooks consisting of scaled unitary matrices. For such codebooks to be used in the Jing-Hassibi protocol for cooperative diversity, the conditions that need to be satisfied by the relay matrices and the codebook are identified. A class of previously known rate one, full diversity, four-group encodable and four-group decodable Differential Space-Time Codes (DSTCs) is proposed for use as Distributed DSTCs (DDSTCs) in the proposed set up. To the best of our knowledge, this is the first known low decoding complexity DDSTC scheme for cooperative wireless networks.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We formulate a two-stage Iterative Wiener filtering (IWF) approach to speech enhancement, bettering the performance of constrained IWF, reported in literature. The codebook constrained IWF (CCIWF) has been shown to be effective in achieving convergence of IWF in the presence of both stationary and non-stationary noise. To this, we include a second stage of unconstrained IWF and show that the speech enhancement performance can be improved in terms of average segmental SNR (SSNR), Itakura-Saito (IS) distance and Linear Prediction Coefficients (LPC) parameter coincidence. We also explore the tradeoff between the number of CCIWF iterations and the second stage IWF iterations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Communication applications are usually delay restricted, especially for the instance of musicians playing over the Internet. This requires a one-way delay of maximum 25 msec and also a high audio quality is desired at feasible bit rates. The ultra low delay (ULD) audio coding structure is well suited to this application and we investigate further the application of multistage vector quantization (MSVQ) to reach a bit rate range below 64 Kb/s, in a scalable manner. Results at 32 Kb/s and 64 Kb/s show that the trained codebook MSVQ performs best, better than KLT normalization followed by a simulated Gaussian MSVQ or simulated Gaussian MSVQ alone. The results also show that there is only a weak dependence on the training data, and that we indeed converge to the perceptual quality of our previous ULD coder at 64 Kb/s.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

High-rate analysis of channel-optimized vector quantizationThis paper considers the high-rate performance of channel optimized source coding for noisy discrete symmetric channels with random index assignment. Specifically, with mean squared error (MSE) as the performance metric, an upper bound on the asymptotic (i.e., high-rate) distortion is derived by assuming a general structure on the codebook. This structure enables extension of the analysis of the channel optimized source quantizer to one with a singular point density: for channels with small errors, the point density that minimizes the upper bound is continuous, while as the error rate increases, the point density becomes singular. The extent of the singularity is also characterized. The accuracy of the expressions obtained are verified through Monte Carlo simulations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We present a new approach to spoken language modeling for language identification (LID) using the Lempel-Ziv-Welch (LZW) algorithm. The LZW technique is applicable to any kind of tokenization of the speech signal. Because of the efficiency of LZW algorithm to obtain variable length symbol strings in the training data, the LZW codebook captures the essentials of a language effectively. We develop two new deterministic measures for LID based on the LZW algorithm namely: (i) Compression ratio score (LZW-CR) and (ii) weighted discriminant score (LZW-WDS). To assess these measures, we consider error-free tokenization of speech as well as artificially induced noise in the tokenization. It is shown that for a 6 language LID task of OGI-TS database with clean tokenization, the new model (LZW-WDS) performs slightly better than the conventional bigram model. For noisy tokenization, which is the more realistic case, LZW-WDS significantly outperforms the bigram technique

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This study considers linear filtering methods for minimising the end-to-end average distortion of a fixed-rate source quantisation system. For the source encoder, both scalar and vector quantisation are considered. The codebook index output by the encoder is sent over a noisy discrete memoryless channel whose statistics could be unknown at the transmitter. At the receiver, the code vector corresponding to the received index is passed through a linear receive filter, whose output is an estimate of the source instantiation. Under this setup, an approximate expression for the average weighted mean-square error (WMSE) between the source instantiation and the reconstructed vector at the receiver is derived using high-resolution quantisation theory. Also, a closed-form expression for the linear receive filter that minimises the approximate average WMSE is derived. The generality of framework developed is further demonstrated by theoretically analysing the performance of other adaptation techniques that can be employed when the channel statistics are available at the transmitter also, such as joint transmit-receive linear filtering and codebook scaling. Monte Carlo simulation results validate the theoretical expressions, and illustrate the improvement in the average distortion that can be obtained using linear filtering techniques.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Action recognition plays an important role in various applications, including smart homes and personal assistive robotics. In this paper, we propose an algorithm for recognizing human actions using motion capture action data. Motion capture data provides accurate three dimensional positions of joints which constitute the human skeleton. We model the movement of the skeletal joints temporally in order to classify the action. The skeleton in each frame of an action sequence is represented as a 129 dimensional vector, of which each component is a 31) angle made by each joint with a fixed point on the skeleton. Finally, the video is represented as a histogram over a codebook obtained from all action sequences. Along with this, the temporal variance of the skeletal joints is used as additional feature. The actions are classified using Meta-Cognitive Radial Basis Function Network (McRBFN) and its Projection Based Learning (PBL) algorithm. We achieve over 97% recognition accuracy on the widely used Berkeley Multimodal Human Action Database (MHAD).