988 resultados para Speech Rate


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The paper describes a modular, unit selection based TTS framework, which can be used as a research bed for developing TTS in any new language, as well as studying the effect of changing any parameter during synthesis. Using this framework, TTS has been developed for Tamil. Synthesis database consists of 1027 phonetically rich prerecorded sentences. This framework has already been tested for Kannada. Our TTS synthesizes intelligible and acceptably natural speech, as supported by high mean opinion scores. The framework is further optimized to suit embedded applications like mobiles and PDAs. We compressed the synthesis speech database with standard speech compression algorithms used in commercial GSM phones and evaluated the quality of the resultant synthesized sentences. Even with a highly compressed database, the synthesized output is perceptually close to that with uncompressed database. Through experiments, we explored the ambiguities in human perception when listening to Tamil phones and syllables uttered in isolation,thus proposing to exploit the misperception to substitute for missing phone contexts in the database. Listening experiments have been conducted on sentences synthesized by deliberately replacing phones with their confused ones.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a low-ML-decoding-complexity, full-rate, full-diversity space-time block code (STBC) for a 2 transmit antenna, 2 receive antenna multiple-input multipleoutput (MIMO) system, with coding gain equal to that of the best and well known Golden code for any QAM constellation.Recently, two codes have been proposed (by Paredes, Gershman and Alkhansari and by Sezginer and Sari), which enjoy a lower decoding complexity relative to the Golden code, but have lesser coding gain. The 2 × 2 STBC presented in this paper has lesser decoding complexity for non-square QAM constellations,compared with that of the Golden code, while having the same decoding complexity for square QAM constellations. Compared with the Paredes-Gershman-Alkhansari and Sezginer-Sari codes, the proposed code has the same decoding complexity for nonrectangular QAM constellations. Simulation results, which compare the codeword error rate (CER) performance, are presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Traditional subspace based speech enhancement (SSE)methods use linear minimum mean square error (LMMSE) estimation that is optimal if the Karhunen Loeve transform (KLT) coefficients of speech and noise are Gaussian distributed. In this paper, we investigate the use of Gaussian mixture (GM) density for modeling the non-Gaussian statistics of the clean speech KLT coefficients. Using Gaussian mixture model (GMM), the optimum minimum mean square error (MMSE) estimator is found to be nonlinear and the traditional LMMSE estimator is shown to be a special case. Experimental results show that the proposed method provides better enhancement performance than the traditional subspace based methods.Index Terms: Subspace based speech enhancement, Gaussian mixture density, MMSE estimation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We formulate a two-stage Iterative Wiener filtering (IWF) approach to speech enhancement, bettering the performance of constrained IWF, reported in literature. The codebook constrained IWF (CCIWF) has been shown to be effective in achieving convergence of IWF in the presence of both stationary and non-stationary noise. To this, we include a second stage of unconstrained IWF and show that the speech enhancement performance can be improved in terms of average segmental SNR (SSNR), Itakura-Saito (IS) distance and Linear Prediction Coefficients (LPC) parameter coincidence. We also explore the tradeoff between the number of CCIWF iterations and the second stage IWF iterations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Effective feature extraction for robust speech recognition is a widely addressed topic and currently there is much effort to invoke non-stationary signal models instead of quasi-stationary signal models leading to standard features such as LPC or MFCC. Joint amplitude modulation and frequency modulation (AM-FM) is a classical non-parametric approach to non-stationary signal modeling and recently new feature sets for automatic speech recognition (ASR) have been derived based on a multi-band AM-FM representation of the signal. We consider several of these representations and compare their performances for robust speech recognition in noise, using the AURORA-2 database. We show that FEPSTRUM representation proposed is more effective than others. We also propose an improvement to FEPSTRUM based on the Teager energy operator (TEO) and show that it can selectively outperform even FEPSTRUM

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Segmental dynamic time warping (DTW) has been demonstrated to be a useful technique for finding acoustic similarity scores between segments of two speech utterances. Due to its high computational requirements, it had to be computed in an offline manner, limiting the applications of the technique. In this paper, we present results of parallelization of this task by distributing the workload in either a static or dynamic way on an 8-processor cluster and discuss the trade-offs among different distribution schemes. We show that online unsupervised pattern discovery using segmental DTW is plausible with as low as 8 processors. This brings the task within reach of today's general purpose multi-core servers. We also show results on a 32-processor system, and discuss factors affecting scalability of our methods.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we present a new speech enhancement approach, that is based on exploiting the intra-frame dependency of discrete cosine transform (DCT) domain coefficients. It can be noted that the existing enhancement techniques treat the transformdomain coefficients independently. Instead of this traditional approach of independently processing the scalars, we split the DCT domain noisy speech vector into sub-vectors and each sub-vector is enhanced independently. Through this sub-vector based approach, the higher dimensional enhancement advantage, viz. non-linear dependency, is exploited. In the developed method, each clean speech sub-vector is modeled using a Gaussian mixture (GM) density. We show that the proposed Gaussian mixture model (GMM) based DCT domain method, using sub-vector processing approach, provides better performance than the conventional approach of enhancing the transform domain scalar components independently. Performance improvement over the recently proposed GMM based time domain approach is also shown.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We develop a Gaussian mixture model (GMM) based vector quantization (VQ) method for coding wideband speech line spectrum frequency (LSF) parameters at low complexity. The PDF of LSF source vector is modeled using the Gaussian mixture (GM) density with higher number of uncorrelated Gaussian mixtures and an optimum scalar quantizer (SQ) is designed for each Gaussian mixture. The reduction of quantization complexity is achieved using the relevant subset of available optimum SQs. For an input vector, the subset of quantizers is chosen using nearest neighbor criteria. The developed method is compared with the recent VQ methods and shown to provide high quality rate-distortion (R/D) performance at lower complexity. In addition, the developed method also provides the advantages of bitrate scalability and rate-independent complexity.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Considering a general linear model of signal degradation, by modeling the probability density function (PDF) of the clean signal using a Gaussian mixture model (GMM) and additive noise by a Gaussian PDF, we derive the minimum mean square error (MMSE) estimator.The derived MMSE estimator is non-linear and the linear MMSE estimator is shown to be a special case. For speech signal corrupted by independent additive noise, by modeling the joint PDF of time-domain speech samples of a speech frame using a GMM, we propose a speech enhancement method based on the derived MMSE estimator. We also show that the same estimator can be used for transform-domain speech enhancement.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Size and strain rate effects are among several factors which play an important role in determining the response of nanostructures, such as their deformations, to the mechanical loadings. The mechanical deformations in nanostructure systems at finite temperatures are intrinsically dynamic processes. Most of the recent works in this context have been focused on nanowires [1, 2], but very little attention has been paid to such low dimensional nanostructures as quantum dots (QDs). In this contribution, molecular dynamics (MD) simulations with an embedded atom potential method(EAM) are carried out to analyse the size and strain rate effects in the silicon (Si) QDs, as an example. We consider various geometries of QDs such as spherical, cylindrical and cubic. We choose Si QDs as an example due to their major applications in solar cells and biosensing. The analysis has also been focused on the variation in the deformation mechanisms with the size and strain rate for Si QD embedded in a matrix of SiO2 [3] (other cases include SiN and SiC matrices).It is observed that the mechanical properties are the functions of the QD size, shape and strain rate as it is in the case for nanowires [2]. We also present the comparative study resulted from the application of different EAM potentials in particular, the Stillinger-Weber (SW) potential, the Tersoff potentials and the environment-dependent interatomic potential (EDIP) [1]. Finally, based on the stabilized structural properties we compute electronic bandstructures of our nanostructures using an envelope function approach and its finite element implementation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this article we study the problem of joint congestion control, routing and MAC layer scheduling in multi-hop wireless mesh network, where the nodes in the network are subjected to maximum energy expenditure rates. We model link contention in the wireless network using the contention graph and we model energy expenditure rate constraint of nodes using the energy expenditure rate matrix. We formulate the problem as an aggregate utility maximization problem and apply duality theory in order to decompose the problem into two sub-problems namely, network layer routing and congestion control problem and MAC layer scheduling problem. The source adjusts its rate based on the cost of the least cost path to the destination where the cost of the path includes not only the prices of the links in it but also the prices associated with the nodes on the path. The MAC layer scheduling of the links is carried out based on the prices of the links. We study the e�ects of energy expenditure rate constraints of the nodes on the optimal throughput of the network.