325 results for free speech
Abstract:
In this paper, we present a new speech enhancement approach based on exploiting the intra-frame dependency of discrete cosine transform (DCT) domain coefficients. Existing enhancement techniques treat the transform-domain coefficients independently. Instead of processing the scalars independently, we split the DCT-domain noisy speech vector into sub-vectors and enhance each sub-vector independently. This sub-vector approach exploits the advantage of higher-dimensional enhancement, viz. non-linear dependency. In the developed method, each clean speech sub-vector is modeled using a Gaussian mixture (GM) density. We show that the proposed Gaussian mixture model (GMM) based DCT-domain method, using sub-vector processing, performs better than the conventional approach of enhancing the transform-domain scalar components independently. Performance improvement over the recently proposed GMM-based time-domain approach is also shown.
Abstract:
Considering a general linear model of signal degradation, we derive the minimum mean square error (MMSE) estimator by modeling the probability density function (PDF) of the clean signal using a Gaussian mixture model (GMM) and the additive noise by a Gaussian PDF. The derived MMSE estimator is non-linear, and the linear MMSE estimator is shown to be a special case. For a speech signal corrupted by independent additive noise, we model the joint PDF of the time-domain speech samples of a speech frame using a GMM and propose a speech enhancement method based on the derived MMSE estimator. We also show that the same estimator can be used for transform-domain speech enhancement.
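The estimator described in this abstract can be illustrated with a minimal sketch. It assumes the degradation model y = Hx + n with x drawn from a GMM and n zero-mean Gaussian; the function name `gmm_mmse_estimate` and its argument names are hypothetical and not taken from the paper. The estimate is the posterior-weighted sum of the per-component linear MMSE estimates, which is the textbook form for this model and may differ in detail from the authors' derivation.

```python
import numpy as np

def gmm_mmse_estimate(y, H, weights, means, covs, noise_cov):
    """MMSE estimate of x from y = H x + n (hypothetical helper).

    Assumes x ~ sum_k weights[k] * N(means[k], covs[k]) and n ~ N(0, noise_cov).
    """
    log_post = []    # unnormalized log posterior weight of each component
    cond_means = []  # per-component linear MMSE estimate of x
    for w, mu, C in zip(weights, means, covs):
        S = H @ C @ H.T + noise_cov      # covariance of y under this component
        r = y - H @ mu                   # residual of y about its component mean
        S_inv_r = np.linalg.solve(S, r)
        _, logdet = np.linalg.slogdet(S)
        # log(w_k * N(y; H mu_k, S)), dropping the constant shared by all components
        log_post.append(np.log(w) - 0.5 * (logdet + r @ S_inv_r))
        cond_means.append(mu + C @ H.T @ S_inv_r)
    log_post = np.array(log_post)
    gamma = np.exp(log_post - log_post.max())  # subtract the max for stability
    gamma /= gamma.sum()
    return gamma @ np.array(cond_means)
```

With a single mixture component the weights collapse to one and the function reduces to the linear MMSE estimator, consistent with the special case the abstract mentions. The estimate becomes non-linear in y only through the posterior weights.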
Abstract:
The 4×4 discrete cosine transform is one of the most important building blocks of the emerging video coding standard, viz. H.264. The conventional implementation approximates the transform matrix elements to facilitate integer arithmetic, for which hardware is suitably prepared. Though the transform coding does not involve any multiplications, the quantization process requires sixteen 16-bit multiplications. The algorithm used here eliminates the approximation in transform coding and the multiplications in the quantization process by using algebraic integer coding. We propose an area-efficient implementation of the transform and quantization blocks based on algebraic integer coding. The designs were synthesized with 90 nm TSMC CMOS technology and were also implemented on a Xilinx FPGA. The gate count and throughput achievable in this case are 7000 and 125 Msamples/sec, respectively.
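For context, the conventional integer approximation that this abstract contrasts itself with can be sketched as follows. The matrix is the standard H.264 4×4 forward core transform; the function name `forward_core_transform` is hypothetical. This sketch shows only the conventional approach, not the algebraic integer coding scheme the paper proposes, and it omits the scaling that the standard folds into quantization.

```python
import numpy as np

# Standard H.264 4x4 forward core transform matrix (integer approximation of the DCT).
CF = np.array([
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
])

def forward_core_transform(block):
    """Apply Y = CF * X * CF^T to a 4x4 integer block (hypothetical helper)."""
    block = np.asarray(block)
    return CF @ block @ CF.T
```

Because every entry of the matrix is ±1 or ±2, the transform needs only additions and shifts in hardware, which is why the multiplications the abstract refers to arise in the quantization stage instead.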
Abstract:
We introduce a novel temporal feature of a signal, namely extrema-based signal track length (ESTL), for the problem of speech segmentation. We show that the ESTL measure is sensitive to both the amplitude and the frequency of the signal. The short-time ESTL (ST_ESTL) offers a promising way to capture the significant segments of a speech signal, where the segments correspond to acoustic units of speech having distinct temporal waveforms. We compare ESTL-based segmentation with the ML and STM methods and find that it is as good as spectral-feature-based segmentation, but with lower computational complexity.
Abstract:
We identify a class of timed automata, which we call counter-free input-determined automata, that characterizes the class of timed languages definable by several timed temporal logics in the literature, including MTL. We make use of this characterization to show that MTL+Past satisfies an “ultimate stability” property with respect to periodic sequences of timed words. Our results hold for both the pointwise and continuous semantics. Along the way, we generalize the result of McNaughton-Papert to show a counter-free automata characterization of FO-definable finitely varying functions.
Abstract:
A facile metal-free route for the oxidative amination of benzoxazole, by activation of C-H bonds with secondary or primary amines in the presence of catalytic iodine in aqueous tert-butyl hydroperoxide, proceeds smoothly at ambient temperature under neat reaction conditions to furnish the aminated product in high yield. This user-friendly method of forming C-N bonds produces tertiary butanol and water as byproducts, which are environmentally benign. The application of the methodology is demonstrated by synthesizing therapeutically active benzoxazoles.
Abstract:
We report the shape evolution of free gold agglomerates with different morphologies that transform to ellipsoidal and then to spherical shapes during the heating cycle. The shape transformation is associated with a structural transition from polycrystalline to single crystalline. The structural transition temperature is shown to depend on the final size of the particles and not on the initial morphologies of the agglomerates. It is also shown that the transition occurs well below the melting temperature, which is in contrast with the melt-freeze process reported in the literature.
Abstract:
We present a new approach to spoken language modeling for language identification (LID) using the Lempel-Ziv-Welch (LZW) algorithm. The LZW technique is applicable to any kind of tokenization of the speech signal. Because the LZW algorithm efficiently extracts variable-length symbol strings from the training data, the LZW codebook captures the essentials of a language effectively. We develop two new deterministic measures for LID based on the LZW algorithm, namely (i) the compression ratio score (LZW-CR) and (ii) the weighted discriminant score (LZW-WDS). To assess these measures, we consider error-free tokenization of speech as well as artificially induced noise in the tokenization. It is shown that for a six-language LID task on the OGI-TS database with clean tokenization, the new model (LZW-WDS) performs slightly better than the conventional bigram model. For noisy tokenization, which is the more realistic case, LZW-WDS significantly outperforms the bigram technique.
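The idea of scoring a token sequence against a language-specific LZW codebook can be sketched as below. The abstract does not define LZW-CR precisely, so this is an assumption-laden illustration: the codebook is grown with the usual LZW dictionary rule, and the score is taken to be the number of tokens per phrase when a test sequence is parsed greedily against the fixed codebook. The function names `build_lzw_codebook` and `compression_ratio` are hypothetical.

```python
def build_lzw_codebook(tokens, alphabet):
    """Grow an LZW phrase dictionary over a training token sequence."""
    codebook = {(s,) for s in alphabet}   # start from all single tokens
    phrase = ()
    for tok in tokens:
        candidate = phrase + (tok,)
        if candidate in codebook:
            phrase = candidate            # keep extending a known phrase
        else:
            codebook.add(candidate)       # learn the new, longer phrase
            phrase = (tok,)
    return codebook

def compression_ratio(tokens, codebook):
    """Tokens per emitted phrase when parsing greedily with a fixed codebook.

    Assumed stand-in for a compression-ratio score; not the paper's definition.
    """
    n_phrases = 0
    i = 0
    while i < len(tokens):
        length = 1
        # An LZW codebook is prefix-closed, so extending one token at a
        # time finds the longest matching phrase.
        while i + length < len(tokens) and tuple(tokens[i:i + length + 1]) in codebook:
            length += 1
        n_phrases += 1
        i += length
    return len(tokens) / n_phrases if n_phrases else 0.0
```

Under this sketch, a test utterance would be assigned to the language whose codebook yields the highest ratio, since longer matched phrases suggest the sequence resembles that language's training data.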