984 results for Decoding Speech Prosody


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Recently, Guo and Xia introduced low complexity decoders called Partial Interference Cancellation (PIC) and PIC with Successive Interference Cancellation (PIC-SIC), which include the Zero Forcing (ZF) and ZF-SIC receivers as special cases, for point-to-point MIMO channels. In this paper, we show that PIC and PIC-SIC decoders are capable of achieving the full cooperative diversity available in wireless relay networks. We give sufficient conditions for a Distributed Space-Time Block Code (DSTBC) to achieve full diversity with PIC and PIC-SIC decoders and construct a new class of DSTBCs with low complexity full-diversity PIC-SIC decoding using complex orthogonal designs. The new class of codes includes a number of known full-diversity PIC/PIC-SIC decodable Space-Time Block Codes (STBCs) constructed for point-to-point channels as special cases. The proposed DSTBCs achieve higher rates (in complex symbols per channel use) than the multigroup ML decodable DSTBCs available in the literature. Simulation results show that the proposed codes have better bit error rate performance than the best known low complexity, full-diversity DSTBCs.
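For orientation, the ZF receiver that the abstract names as a special case of PIC can be sketched for a 2x2 point-to-point channel. This is only an illustrative sketch of plain zero forcing, not the PIC or PIC-SIC decoder of the paper; the function name `zf_detect_2x2` and the QPSK constellation are assumptions made for the example.

```python
# Hypothetical QPSK alphabet used only for this illustration.
QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]


def zf_detect_2x2(H, y, constellation=QPSK):
    """Zero-forcing detection for y = H x + noise with a 2x2 channel H.

    H is ((a, b), (c, d)); y is a pair of received samples.
    The channel is inverted, then each stream is sliced independently
    to the nearest constellation point.
    """
    (a, b), (c, d) = H
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is (numerically) singular")
    # Apply H^{-1} to y using the closed-form 2x2 inverse.
    x0 = (d * y[0] - b * y[1]) / det
    x1 = (-c * y[0] + a * y[1]) / det
    return [min(constellation, key=lambda s: abs(s - x)) for x in (x0, x1)]


if __name__ == "__main__":
    H = ((1 + 1j, 0.5), (0.2j, 1 - 0.5j))
    x = [1 + 1j, -1 - 1j]
    y = [H[0][0] * x[0] + H[0][1] * x[1], H[1][0] * x[0] + H[1][1] * x[1]]
    print(zf_detect_2x2(H, y))
```

Inverting the channel removes all inter-stream interference at the cost of noise enhancement, which is the trade-off that interference-cancellation schemes try to improve on.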

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Decoding of linear space-time block codes (STBCs) with sphere decoding (SD) is well known. A fast version of the SD, known as fast sphere decoding (FSD), has recently been studied by Biglieri, Hong and Viterbo. Viewing a linear STBC as a vector space spanned by its defining weight matrices over the real number field, we define a quadratic form (QF), called the Hurwitz-Radon QF (HRQF), on this vector space and give a QF interpretation of the FSD complexity of a linear STBC. It is shown that the FSD complexity is a function only of the weight matrices defining the code and their ordering, and not of the channel realization (even though the equivalent channel when SD is used depends on the channel realization) or the number of receive antennas. It is also shown that the FSD complexity is completely captured by a single matrix obtained from the HRQF. Moreover, for a given set of weight matrices, an algorithm to obtain a best ordering of them leading to the least FSD complexity is presented. The well-known classes of low FSD complexity codes (multi-group decodable codes, fast decodable codes and fast group decodable codes) are presented in the framework of HRQF.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We consider the speech production mechanism and the associated linear source-filter model. For voiced speech sounds in particular, the source/glottal excitation is modeled as a stream of impulses and the filter as a cascade of second-order resonators. We show that the process of sampling speech signals can be modeled as filtering a stream of Dirac impulses (a model for the excitation) with a kernel function (the vocal tract response), and then sampling uniformly. We show that the problem of estimating the excitation is equivalent to the problem of recovering a stream of Dirac impulses from samples of a filtered version. We present associated algorithms based on the annihilating filter and also make a comparison with the classical linear prediction technique, which is well known in speech analysis. Results on synthesized as well as natural speech data are presented.
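The annihilating-filter idea can be illustrated on the simplest noiseless case: two Dirac impulses in one period, observed through four consecutive Fourier coefficients. This is a sketch under stated assumptions, not the algorithm of the paper: the vocal tract kernel is ignored, the number of impulses is fixed at two, and the helper names `dirac_spectrum` and `recover_two_diracs` are hypothetical.

```python
import cmath
import math


def dirac_spectrum(times, amps, n_coeffs, period=1.0):
    """Fourier coefficients X[m] = sum_k a_k * exp(-2j*pi*m*t_k/period)."""
    return [
        sum(a * cmath.exp(-2j * math.pi * m * t / period)
            for t, a in zip(times, amps))
        for m in range(n_coeffs)
    ]


def recover_two_diracs(X, period=1.0):
    """Recover two impulse locations from X[0..3] via an annihilating filter.

    The filter (1, h1, h2) satisfies X[m] + h1*X[m-1] + h2*X[m-2] = 0
    for m = 2, 3; its roots are u_k = exp(-2j*pi*t_k/period).
    """
    x0, x1, x2, x3 = X[:4]
    det = x1 * x1 - x0 * x2
    if abs(det) < 1e-12:
        raise ValueError("system is singular (coincident or missing impulses)")
    # Cramer's rule for [[x1, x0], [x2, x1]] [h1, h2]^T = [-x2, -x3]^T.
    h1 = (-x1 * x2 + x0 * x3) / det
    h2 = (-x1 * x3 + x2 * x2) / det
    # Roots of z^2 + h1*z + h2.
    disc = cmath.sqrt(h1 * h1 - 4 * h2)
    roots = [(-h1 + disc) / 2, (-h1 - disc) / 2]
    # Map each root's phase back to a location in [0, period).
    return sorted((-period * cmath.phase(u) / (2 * math.pi)) % period
                  for u in roots)


if __name__ == "__main__":
    X = dirac_spectrum([0.2, 0.7], [1.0, 0.5], n_coeffs=4)
    print(recover_two_diracs(X))
```

With noise or more impulses, the small linear system is usually replaced by a least-squares or total-least-squares fit over more coefficients.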

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We address the problem of speech enhancement in real-world noisy scenarios. We propose to solve the problem in two stages, the first comprising a generalized spectral subtraction technique, followed by a sequence of perceptually motivated post-processing algorithms. The role of the post-processing algorithms is to compensate for the effects of noise as well as to suppress any artifacts created by the first-stage processing. The key post-processing mechanisms are aimed at suppressing musical noise, enhancing the formant structure of voiced speech, and denoising the linear-prediction residual. The parameter values in the techniques are fixed optimally by experimentally evaluating the enhancement performance as a function of the parameters. We used the Carnegie Mellon University ARCTIC database for our experiments. We considered three real-world noise types: fan noise, car noise, and motorbike noise. The enhancement performance was evaluated by conducting listening experiments with 12 subjects. For positive signal-to-noise ratios (SNRs), the listeners reported a clear improvement in perceived quality over the noisy signal (an average increase of 0.5 in the mean opinion score (MOS)). For negative SNRs, however, the improvement was found to be marginal.
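A common textbook form of generalized spectral subtraction works on per-bin magnitudes with an over-subtraction factor and a spectral floor. The sketch below shows that generic rule only; the parameter names `alpha`, `beta` and `p` and their default values are assumptions for illustration and are not taken from the paper.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.01, p=2.0):
    """Generalized spectral subtraction on magnitude spectra.

    For each frequency bin:
        |S|^p = max(|Y|^p - alpha * |N|^p, beta * |N|^p)
    alpha is the over-subtraction factor, beta sets the spectral floor
    (which limits musical noise), and p selects magnitude (1) or power (2)
    subtraction.
    """
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        residual = y ** p - alpha * n ** p
        floor = beta * n ** p
        cleaned.append(max(residual, floor) ** (1.0 / p))
    return cleaned


if __name__ == "__main__":
    # Two bins: one dominated by speech, one dominated by noise.
    print(spectral_subtract([2.0, 1.0], [1.0, 1.0], alpha=1.0))
```

The enhanced magnitudes are normally recombined with the noisy phase before the inverse transform.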

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper we propose a postprocessing technique for a spectrogram-diffusion-based harmonic/percussion decomposition algorithm. The proposed technique removes harmonic instrument leakages in the percussion-enhanced outputs of the baseline algorithm. The technique uses median filtering and an adaptive detection of percussive segments in subbands, followed by piecewise signal reconstruction using envelope properties, to ensure that percussion is enhanced while harmonic leakages are suppressed. A new binary mask is created for the percussion signal which, when applied to the original signal, improves harmonic versus percussion separation. We compare our algorithm with two recent techniques and show that, on a database of polyphonic Indian music, the postprocessing algorithm improves the harmonic versus percussion decomposition significantly.
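To show how median filtering can yield a binary percussion mask, here is a minimal sketch of the generic median-filter separation idea: a median along time favours sustained (harmonic) ridges, a median along frequency favours broadband (percussive) onsets. It is not the adaptive subband detection or envelope-based reconstruction described in the abstract, and the function name `percussive_mask` and the window size are assumptions.

```python
from statistics import median


def percussive_mask(spec, half_width=1):
    """Binary percussion mask from a magnitude spectrogram spec[freq][time].

    A bin is marked percussive when its frequency-direction median exceeds
    its time-direction median. Windows are truncated at the edges.
    """
    n_freq, n_time = len(spec), len(spec[0])
    mask = []
    for f in range(n_freq):
        f_lo, f_hi = max(0, f - half_width), min(n_freq, f + half_width + 1)
        row = []
        for t in range(n_time):
            t_lo, t_hi = max(0, t - half_width), min(n_time, t + half_width + 1)
            harmonic = median(spec[f][t_lo:t_hi])
            percussive = median(spec[g][t] for g in range(f_lo, f_hi))
            row.append(percussive > harmonic)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Toy spectrogram: one sustained tone (row 2) and one click (column 2).
    spec = [[1.0 if (f == 2 or t == 2) else 0.0 for t in range(5)]
            for f in range(5)]
    for row in percussive_mask(spec):
        print(["P" if m else "." for m in row])
```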

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Video decoders used in emerging applications need to be flexible enough to handle a large variety of video formats and must deliver scalable performance to handle wide variations in workloads. In this paper we propose a unified software and hardware architecture for video decoding to achieve scalable performance with flexibility. The lightweight processor tiles and the reconfigurable hardware tiles in our architecture enable software and hardware implementations to co-exist, while a programmable interconnect enables dynamic interconnection of the tiles. Our process-network-oriented compilation flow achieves realization-agnostic application partitioning and enables seamless migration across uniprocessor, multi-processor, semi-hardware and full-hardware implementations of a video decoder. An application quality-of-service-aware scheduler monitors and controls the operation of the entire system. We prove the concept through a prototype of the architecture on an off-the-shelf FPGA. The FPGA prototype shows a scaling in performance from QCIF to 1080p resolutions in four discrete steps. We also demonstrate that the reconfiguration time is short enough to allow migration from one configuration to the other without any frame loss.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We analyze the spectral zero-crossing rate (SZCR) properties of transient signals and show that SZCR contains accurate localization information about the transient. For a train of pulses containing transient events, the SZCR computed on a sliding-window basis is useful in locating the impulses accurately. We present the properties of SZCR on standard stylized signal models and then show how it may be used to estimate the epochs in speech signals. We also present comparisons with some state-of-the-art techniques that are based on the group-delay function. Experiments on real speech show that the proposed SZCR technique is better than other group-delay-based epoch detectors. In the presence of noise, a comparison with the zero-frequency filtering (ZFF) technique and the Dynamic Programming Projected Phase-Slope Algorithm (DYPSA) showed that the performance of the SZCR technique is better than that of DYPSA and inferior to that of ZFF. For highpass-filtered speech, where ZFF performance suffers drastically, the identification rates of SZCR are better than those of DYPSA.
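The link between spectral zero crossings and location can be seen on a single unit impulse: an impulse at sample n0 has a spectrum whose real part is cos(omega * n0), which changes sign n0 times for omega in (0, pi). The sketch below checks this on a dense frequency grid. It illustrates the underlying property only, under the assumption that zero crossings are counted on the real part of the spectrum; the paper's exact SZCR definition and sliding-window procedure are not reproduced, and both function names are hypothetical.

```python
import math


def real_spectrum(signal, n_freqs):
    """Real part of the DTFT of `signal`, sampled at n_freqs points in [0, pi)."""
    values = []
    for m in range(n_freqs):
        omega = math.pi * m / n_freqs
        values.append(sum(x * math.cos(omega * n) for n, x in enumerate(signal)))
    return values


def count_zero_crossings(values):
    """Number of strict sign changes between consecutive samples."""
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)


if __name__ == "__main__":
    signal = [0.0] * 16
    signal[5] = 1.0  # unit impulse at sample 5
    # A prime grid size keeps the zeros of cos(5 * omega) off the grid points.
    print(count_zero_crossings(real_spectrum(signal, 257)))
```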

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The goal of speech enhancement algorithms is to provide an estimate of clean speech starting from noisy observations. The often-employed cost function is the mean square error (MSE). However, the MSE can never be computed in practice. Therefore, it becomes necessary to find practical alternatives to the MSE. In image denoising problems, the cost function (also referred to as risk) is often replaced by an unbiased estimator. Motivated by this approach, we reformulate the problem of speech enhancement from the perspective of risk minimization. Some recent contributions in risk estimation have employed Stein's unbiased risk estimator (SURE) together with a parametric denoising function, which is a linear expansion of threshold/bases (LET). We show that the first-order case of SURE-LET results in a Wiener-filter type solution if the denoising function is made frequency-dependent. We also provide enhancement results obtained with both techniques and characterize the improvement by means of local as well as global SNR calculations.
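The Wiener-type result for the first-order case can be checked numerically in the simplest setting. Assuming i.i.d. Gaussian noise of known variance sigma2 and a purely linear denoiser f(y) = a * y, SURE is a quadratic in a whose minimizer is a = 1 - sigma2 / mean(y^2). The sketch below is an illustration under those assumptions, with hypothetical function names; it uses a single global gain rather than the frequency-dependent function of the paper.

```python
def sure_linear(y, sigma2, a):
    """SURE for the linear denoiser f(y) = a * y under Gaussian noise.

    SURE(a) = mean((a*y - y)^2) + 2 * sigma2 * a - sigma2,
    where the middle term is 2 * sigma2 times the average divergence of f.
    """
    n = len(y)
    return sum((a * v - v) ** 2 for v in y) / n + 2 * sigma2 * a - sigma2


def sure_optimal_gain(y, sigma2):
    """Gain minimizing sure_linear: a Wiener-like ratio built from observed power."""
    power = sum(v * v for v in y) / len(y)
    return 1.0 - sigma2 / power


if __name__ == "__main__":
    y = [2.0, -1.0, 3.0, 0.5]
    a = sure_optimal_gain(y, sigma2=1.0)
    print(a, sure_linear(y, 1.0, a))
```

In practice the gain would be clamped to be non-negative, since the expression goes below zero when the observed power is smaller than the noise variance.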

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We address the problem of speech enhancement using a risk-estimation approach. In particular, we propose the use of Stein’s unbiased risk estimator (SURE) for solving the problem. The need for a suitable finite-sample risk estimator arises because the actual risks invariably depend on the unknown ground truth. We consider the popular mean-squared error (MSE) criterion first, and then compare it against the perceptually motivated Itakura-Saito (IS) distortion, by deriving unbiased estimators of the corresponding risks. We use a generalized SURE (GSURE) development, recently proposed by Eldar for MSE. We consider dependent observation models from the exponential family with an additive noise model, and derive an unbiased estimator for the risk corresponding to the IS distortion, which is non-quadratic. This serves to address the speech enhancement problem in a more general setting. Experimental results illustrate that the IS metric is efficient in suppressing musical noise, which affects the MSE-enhanced speech. However, in terms of global signal-to-noise ratio (SNR), the minimum MSE solution gives better results.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper, we propose a new sub-band approach to estimate the glottal activity. The method is based on the spectral harmonicity and the sub-band temporal properties of voiced speech. We propose a method to represent the glottal excitation signal using the sub-band temporal envelope. Instants of maximum glottal excitation, or Glottal Closure Instants (GCIs), are extracted from the estimated glottal excitation pattern, and the result is compared with a standard GCI computation method, DYPSA [1]. The performance of the algorithm is also compared on noisy signals, and it is shown that the GCI estimates of the proposed method vary less under noisy conditions than those of DYPSA. The algorithm is evaluated on the CMU-ARCTIC database.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The problem of designing good space-time block codes (STBCs) with low maximum-likelihood (ML) decoding complexity has gathered much attention in the literature. All the known low ML decoding complexity techniques utilize the same approach of exploiting either the multigroup decodable or the fast-decodable (conditionally multigroup decodable) structure of a code. We refer to this well-known technique of decoding STBCs as conditional ML (CML) decoding. In this paper, we introduce a new framework to construct ML decoders for STBCs based on the generalized distributive law (GDL) and the factor-graph-based sum-product algorithm. We say that an STBC is fast GDL decodable if the order of GDL decoding complexity of the code, with respect to the constellation size M, is strictly less than M^λ, where λ is the number of independent symbols in the STBC. We give sufficient conditions for an STBC to admit fast GDL decoding, and show that both multigroup and conditionally multigroup decodable codes are fast GDL decodable. For any STBC, whether fast GDL decodable or not, we show that the GDL decoding complexity is strictly less than the CML decoding complexity. For instance, for any STBC obtained from cyclic division algebras which is not multigroup or conditionally multigroup decodable, the GDL decoder provides about a 12-fold reduction in complexity compared to the CML decoder. Similarly, for the Golden code, which is conditionally multigroup decodable, the GDL decoder is only half as complex as the CML decoder.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A joint analysis-synthesis framework is developed for the compressive sensing (CS) recovery of speech signals. The signal is assumed to be sparse in the residual domain, with the linear prediction filter used as the sparse transformation. Importantly, this transform is not known a priori, since estimating the predictor filter requires knowledge of the signal. Two prediction filters, a comb filter for pitch and an all-pole formant filter, are needed to induce maximum sparsity. An iterative method is proposed for the estimation of both the prediction filters and the signal itself. The formant prediction filter is used as the synthesis transform, while the pitch filter is used to model the periodicity in the residual excitation signal, in the analysis mode. Significant improvement in the LLR measure is seen over the previously reported formant filter estimation.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Latent variable methods, such as PLCA (Probabilistic Latent Component Analysis), have been successfully used for the analysis of non-negative signal representations. In this paper, we formulate PLCS (Probabilistic Latent Component Segmentation), which models each time frame of a spectrogram as a spectral distribution. Given the signal spectrogram, the segmentation boundaries are estimated using a maximum-likelihood (ML) approach. For an efficient solution, the algorithm imposes a hard constraint that each segment is modelled by a single latent component. The hard constraint facilitates the solution of ML boundary estimation using dynamic programming. Unlike earlier ML segmentation techniques, the PLCS framework does not impose a parametric assumption. PLCS can be naturally extended to model coarticulation between successive phones. Experiments on the TIMIT corpus show that the proposed technique is promising compared to most state-of-the-art speech segmentation algorithms.
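The combination of "one spectral distribution per segment" and dynamic programming can be sketched as follows. Each frame is treated as a histogram of non-negative counts, a segment is scored by the maximized multinomial log-likelihood of its pooled counts, and the boundaries maximizing the total score are found by dynamic programming. This is a simplified illustration, not the PLCS algorithm itself: the fixed number of segments and the names `segment_loglik` and `ml_segment` are assumptions.

```python
import math


def segment_loglik(frames, start, end):
    """Maximized log-likelihood of frames[start:end] under one distribution.

    With pooled counts c_f, the ML distribution is c_f / sum(c), giving
    sum_f c_f * log(c_f / sum(c)).
    """
    n_bins = len(frames[0])
    counts = [sum(frames[t][f] for t in range(start, end)) for f in range(n_bins)]
    total = sum(counts)
    return sum(c * math.log(c / total) for c in counts if c > 0)


def ml_segment(frames, n_segments):
    """Return interior boundaries splitting frames into n_segments segments."""
    n = len(frames)
    neg_inf = float("-inf")
    # best[k][end]: best score for frames[:end] split into k segments.
    best = [[neg_inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for end in range(k, n + 1):
            for start in range(k - 1, end):
                if best[k - 1][start] == neg_inf:
                    continue
                score = best[k - 1][start] + segment_loglik(frames, start, end)
                if score > best[k][end]:
                    best[k][end] = score
                    back[k][end] = start
    # Trace back the segment start indices, then drop the leading 0.
    starts, end = [], n
    for k in range(n_segments, 0, -1):
        end = back[k][end]
        starts.append(end)
    return starts[::-1][1:]


if __name__ == "__main__":
    frames = [[9, 1], [8, 2], [9, 1], [1, 9], [2, 8]]
    print(ml_segment(frames, n_segments=2))
```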

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A space-time block code (STBC) is said to be multigroup decodable if the information symbols encoded by it can be partitioned into two or more groups such that each group of symbols can be maximum-likelihood (ML) decoded independently of the other symbol groups. In this paper, we show that the upper triangular matrix encountered during the sphere decoding of a linear dispersion STBC can be rank-deficient even when the rate of the code is less than the minimum of the number of transmit and receive antennas. We then show that all known families of high-rate (rate greater than 1) multigroup decodable codes have a rank-deficient matrix even when the rate is less than the number of transmit and receive antennas, and that this rank-deficiency problem arises only in asymmetric MIMO systems, where the number of receive antennas is strictly less than the number of transmit antennas. Unlike for codes with a full-rank matrix, the complexity of the sphere-decoding-based ML decoder for STBCs with a rank-deficient matrix is polynomial in the constellation size, and hence is high. We derive the ML sphere decoding complexity of most of the known high-rate multigroup decodable codes, and show that for each code, the complexity is a decreasing function of the number of receive antennas.
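The definition of multigroup decodability can be made concrete with the Alamouti code, a standard example in which each of the two symbols forms its own group and is decoded independently after linear combining. The sketch below assumes one receive antenna, a channel that stays constant over the two time slots, and a QPSK alphabet; it is not one of the high-rate codes studied in the paper, and the function names are hypothetical.

```python
# Hypothetical QPSK alphabet used only for this illustration.
QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]


def alamouti_receive(h, s):
    """Noiseless received samples for the Alamouti code with one receive antenna.

    Slot 1 transmits (s1, s2); slot 2 transmits (-conj(s2), conj(s1)).
    """
    h1, h2 = h
    s1, s2 = s
    y1 = h1 * s1 + h2 * s2
    y2 = -h1 * s2.conjugate() + h2 * s1.conjugate()
    return y1, y2


def alamouti_decode(h, y, constellation=QPSK):
    """Decode each symbol on its own after matched-filter combining."""
    h1, h2 = h
    y1, y2 = y
    gain = abs(h1) ** 2 + abs(h2) ** 2
    # Orthogonality of the code removes the cross-symbol terms.
    z1 = (h1.conjugate() * y1 + h2 * y2.conjugate()) / gain
    z2 = (h2.conjugate() * y1 - h1 * y2.conjugate()) / gain
    return [min(constellation, key=lambda c: abs(c - z)) for z in (z1, z2)]


if __name__ == "__main__":
    h = (0.8 + 0.3j, -0.4 + 0.9j)
    s = (1 + 1j, -1 + 1j)
    print(alamouti_decode(h, alamouti_receive(h, s)))
```

Because the two decisions are separate, the search cost grows linearly with the constellation size instead of quadratically, which is the kind of saving that multigroup structure is meant to provide.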