984 resultados para Decoding Speech Prosody


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Segmental dynamic time warping (DTW) has been demonstrated to be a useful technique for finding acoustic similarity scores between segments of two speech utterances. Due to its high computational requirements, it had to be computed in an offline manner, limiting the applications of the technique. In this paper, we present results of parallelization of this task by distributing the workload in either a static or dynamic way on an 8-processor cluster and discuss the trade-offs among different distribution schemes. We show that online unsupervised pattern discovery using segmental DTW is plausible with as low as 8 processors. This brings the task within reach of today's general purpose multi-core servers. We also show results on a 32-processor system, and discuss factors affecting scalability of our methods.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we present a new speech enhancement approach, that is based on exploiting the intra-frame dependency of discrete cosine transform (DCT) domain coefficients. It can be noted that the existing enhancement techniques treat the transformdomain coefficients independently. Instead of this traditional approach of independently processing the scalars, we split the DCT domain noisy speech vector into sub-vectors and each sub-vector is enhanced independently. Through this sub-vector based approach, the higher dimensional enhancement advantage, viz. non-linear dependency, is exploited. In the developed method, each clean speech sub-vector is modeled using a Gaussian mixture (GM) density. We show that the proposed Gaussian mixture model (GMM) based DCT domain method, using sub-vector processing approach, provides better performance than the conventional approach of enhancing the transform domain scalar components independently. Performance improvement over the recently proposed GMM based time domain approach is also shown.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Considering a general linear model of signal degradation, by modeling the probability density function (PDF) of the clean signal using a Gaussian mixture model (GMM) and additive noise by a Gaussian PDF, we derive the minimum mean square error (MMSE) estimator.The derived MMSE estimator is non-linear and the linear MMSE estimator is shown to be a special case. For speech signal corrupted by independent additive noise, by modeling the joint PDF of time-domain speech samples of a speech frame using a GMM, we propose a speech enhancement method based on the derived MMSE estimator. We also show that the same estimator can be used for transform-domain speech enhancement.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We introduce a novel temporal feature of a signal, namely extrema-based signal track length (ESTL) for the problem of speech segmentation. We show that ESTL measure is sensitive to both amplitude and frequency of the signal. The short-time ESTL (ST_ESTL) shows a promising way to capture the significant segments of speech signal, where the segments correspond to acoustic units of speech having distinct temporal waveforms. We compare ESTL based segmentation with ML and STM methods and find that it is as good as spectral feature based segmentation, but with lesser computational complexity.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper considers the high-rate performance of source coding for noisy discrete symmetric channels with random index assignment (IA). Accurate analytical models are developed to characterize the expected distortion performance of vector quantization (VQ) for a large class of distortion measures. It is shown that when the point density is continuous, the distortion can be approximated as the sum of the source quantization distortion and the channel-error induced distortion. Expressions are also derived for the continuous point density that minimizes the expected distortion. Next, for the case of mean squared error distortion, a more accurate analytical model for the distortion is derived by allowing the point density to have a singular component. The extent of the singularity is also characterized. These results provide analytical models for the expected distortion performance of both conventional VQ as well as for channel-optimized VQ. As a practical example, compression of the linear predictive coding parameters in the wideband speech spectrum is considered, with the log spectral distortion as performance metric. The theory is able to correctly predict the channel error rate that is permissible for operation at a particular level of distortion.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Precoding for multiple-input multiple-output (MIMO) antenna systems is considered with perfect channel knowledge available at both the transmitter and the receiver. For two transmit antennas and QAM constellations, a real-valued precoder which is approximately optimal (with respect to the minimum Euclidean distance between points in the received signal space) among real-valued precoders based on the singular value decomposition (SVD) of the channel is proposed. The proposed precoder is obtainable easily for arbitrary QAM constellations, unlike the known complex-valued optimal precoder by Collin et al. for two transmit antennas which is in existence for 4-QAM alone and is extremely hard to obtain for larger QAM constellations. The proposed precoding scheme is extended to higher number of transmit antennas on the lines of the E - d(min) precoder for 4-QAM by Vrigneau et al. which is an extension of the complex-valued optimal precoder for 4-QAM. The proposed precoder's ML-decoding complexity as a function of the constellation size M is only O(root M)while that of the E - d(min) precoder is O(M root M)(M = 4). Compared to the recently proposed X- and Y-precoders, the error performance of the proposed precoder is significantly better while being only marginally worse than that of the E - d(min) precoder for 4-QAM. It is argued that the proposed precoder provides full-diversity for QAM constellations and this is supported by simulation plots of the word error probability for 2 x 2, 4 x 4 and 8 x 8 systems.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Low complexity decoders called Partial Interference Cancellation (PIC) and PIC with Successive Interference Cancellation (PIC-SIC), which include the Zero Forcing (ZF) and ZF-SIC receivers as special cases, were given by Guo and Xia along with sufficient conditions for a Space-Time Block Code (STBC) to achieve full diversity with PIC/PIC-SIC decoding for point-to-point MIMO channels. In Part-I of this two part series of papers, we give new conditions for an STBC to achieve full diversity with PIC and PIC-SIC decoders, which are equivalent to Guo and Xia's conditions, but are much easier to check. We then show that PIC and PIC-SIC decoders are capable of achieving the full cooperative diversity available in wireless relay networks and give sufficient conditions for a Distributed Space-Time Block Code (DSTBC) to achieve full diversity with PIC and PIC-SIC decoders. In Part-II, we construct new low complexity full-diversity PIC/PIC-SIC decodable STBCs and DSTBCs that achieve higher rates than the known full-diversity low complexity ML decodable STBCs and DSTBCs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this second part of a two part series of papers, we construct a new class of Space-Time Block Codes (STBCs) for point-to-point MIMO channel and Distributed STBCs (DSTBCs) for the amplify-and-forward relay channel that give full-diversity with Partial Interference Cancellation (PIC) and PIC with Successive Interference Cancellation (PIC-SIC) decoders. The proposed class of STBCs include most of the known full-diversity low complexity PIC/PIC-SIC decodable STBCs as special cases. We also show that a number of known full-diversity PIC/PIC-SIC decodable STBCs that were constructed for the point-topoint MIMO channel can be used as full-diversity PIC/PIC-SIC decodable DSTBCs in relay networks. For the same decoding complexity, the proposed STBCs and DSTBCs achieve higher rates than the known low decoding complexity codes. Simulation results show that the new codes have a better bit error rate performance than the low ML decoding complexity codes available in the literature.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Generalized Distributive Law (GDL) is a message passing algorithm which can efficiently solve a certain class of computational problems, and includes as special cases the Viterbi's algorithm, the BCJR algorithm, the Fast-Fourier Transform, Turbo and LDPC decoding algorithms. In this paper GDL based maximum-likelihood (ML) decoding of Space-Time Block Codes (STBCs) is introduced and a sufficient condition for an STBC to admit low GDL decoding complexity is given. Fast-decoding and multigroup decoding are the two algorithms used in the literature to ML decode STBCs with low complexity. An algorithm which exploits the advantages of both these two is called Conditional ML (CML) decoding. It is shown in this paper that the GDL decoding complexity of any STBC is upper bounded by its CML decoding complexity, and that there exist codes for which the GDL complexity is strictly less than the CML complexity. Explicit examples of two such families of STBCs is given in this paper. Thus the CML is in general suboptimal in reducing the ML decoding complexity of a code, and one should design codes with low GDL complexity rather than low CML complexity.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A modification of the Viterbi decoding algorithm is suggested for faster convergence.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we give a new framework for constructing low ML decoding complexity space-time block codes (STBCs) using codes over the Klein group K. Almost all known low ML decoding complexity STBCs can be obtained via this approach. New full- diversity STBCs with low ML decoding complexity and cubic shaping property are constructed, via codes over K, for number of transmit antennas N = 2(m), m >= 1, and rates R > 1 complex symbols per channel use. When R = N, the new STBCs are information- lossless as well. The new class of STBCs have the least knownML decoding complexity among all the codes available in the literature for a large set of (N, R) pairs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The design and operation of the minimum cost classifier, where the total cost is the sum of the measurement cost and the classification cost, is computationally complex. Noting the difficulties associated with this approach, decision tree design directly from a set of labelled samples is proposed in this paper. The feature space is first partitioned to transform the problem to one of discrete features. The resulting problem is solved by a dynamic programming algorithm over an explicitly ordered state space of all outcomes of all feature subsets. The solution procedure is very general and is applicable to any minimum cost pattern classification problem in which each feature has a finite number of outcomes. These techniques are applied to (i) voiced, unvoiced, and silence classification of speech, and (ii) spoken vowel recognition. The resulting decision trees are operationally very efficient and yield attractive classification accuracies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we employ message passing algorithms over graphical models to jointly detect and decode symbols transmitted over large multiple-input multiple-output (MIMO) channels with low density parity check (LDPC) coded bits. We adopt a factor graph based technique to integrate the detection and decoding operations. A Gaussian approximation of spatial interference is used for detection. This serves as a low complexity joint detection/decoding approach for large dimensional MIMO systems coded with LDPC codes of large block lengths. This joint processing achieves significantly better performance than the individual detection and decoding scheme.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Low density parity-check (LDPC) codes are a class of linear block codes that are decoded by running belief propagation (BP) algorithm or log-likelihood ratio belief propagation (LLR-BP) over the factor graph of the code. One of the disadvantages of LDPC codes is the onset of an error floor at high values of signal to noise ratio caused by trapping sets. In this paper, we propose a two stage decoder to deal with different types of trapping sets. Oscillating trapping sets are taken care by the first stage of the decoder and the elementary trapping sets are handled by the second stage of the decoder. Simulation results on the regular PEG (504,252,3,6) code and the irregular PEG (1024,518,15,8) code shows that the proposed two stage decoder performs significantly better than the standard decoder.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems and it is the computationally dominant phase for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods is by exploiting the structure of these matrices and efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both the methods lead to similar speedups but the latter leads to far lesser impact on the recognition accuracy. Experiments on 1,138 work vocabulary RM1 task and 6,224 word vocabulary TIMIT task using Sphinx 3.7 system show that, for a typical case the matrix multiplication based approach leads to overall speedup of 46 % on RM1 task and 115 % for TIMIT task. Our low-rank approximation methods provide a way for trading off recognition accuracy for a further increase in computational performance extending overall speedups up to 61 % for RM1 and 119 % for TIMIT for an increase of word error rate (WER) from 3.2 to 3.5 % for RM1 and for no increase in WER for TIMIT. We also express pairwise Euclidean distance computation phase in Dynamic Time Warping (DTW) in terms of matrix multiplication leading to saving of approximately of computational operations. In our experiments using efficient implementation of matrix multiplication, this leads to a speedup of 5.6 in computing the pairwise Euclidean distances and overall speedup up to 3.25 for DTW.