69 results for Speech synthesis Data processing

at Indian Institute of Science - Bangalore - India


Relevance:

100.00% 100.00%

Abstract:

A new method based on a unit continuity metric (UCM) is proposed for optimal unit selection in text-to-speech (TTS) synthesis. UCM employs two features, namely a pitch continuity metric and a spectral continuity metric. The methods have been implemented and tested on our test bed, MILE-TTS, which is available as a web demo. After verification by a self-selection test, the algorithms are evaluated on 8 paragraphs each for Kannada and Tamil by native users of the languages. Mean opinion scores (MOS) show that naturalness and comprehension are better with the UCM-based algorithm than with the non-UCM-based ones. The naturalness of the TTS output is further enhanced by a new rule-based algorithm for pause prediction for Tamil. The pauses between words are predicted from parts-of-speech information obtained from the input text.

Relevance:

100.00% 100.00%

Abstract:

An overview of the synthesis of materials under microwave irradiation is presented, based on recently performed work. A variety of reactions, such as direct combination, carbothermal reduction, carbidation and nitridation, are described. Examples of microwave preparation of glasses are also presented. The main advantages of microwave methods, namely that they are fast, clean and operate at reduced reaction temperatures, are emphasized. The example of ZrO2-CeO2 ceramics is used to show the extraordinarily fast and effective sintering that occurs under microwave irradiation.

Relevance:

100.00% 100.00%

Abstract:

The paper describes a modular, unit-selection-based TTS framework, which can be used as a research bed for developing TTS in any new language, as well as for studying the effect of changing any parameter during synthesis. Using this framework, TTS has been developed for Tamil. The synthesis database consists of 1027 phonetically rich prerecorded sentences. This framework has already been tested for Kannada. Our TTS synthesizes intelligible and acceptably natural speech, as supported by high mean opinion scores. The framework is further optimized to suit embedded applications such as mobiles and PDAs. We compressed the synthesis speech database with standard speech compression algorithms used in commercial GSM phones and evaluated the quality of the resultant synthesized sentences. Even with a highly compressed database, the synthesized output is perceptually close to that obtained with the uncompressed database. Through experiments, we explored the ambiguities in human perception when listening to Tamil phones and syllables uttered in isolation, and we propose to exploit this misperception to substitute for missing phone contexts in the database. Listening experiments have been conducted on sentences synthesized by deliberately replacing phones with the phones they are confused with.

Relevance:

100.00% 100.00%

Abstract:

Oversmoothing of speech parameter trajectories is one of the causes of quality degradation in HMM-based speech synthesis. Various methods have been proposed to overcome this effect, the most recent being global variance (GV) and the modulation-spectrum-based post-filter (MSPF). However, there is still a significant quality gap between natural and synthesized speech. In this paper, we propose a two-fold post-filtering technique to alleviate, to a certain extent, the oversmoothing of spectral and excitation parameter trajectories in HMM-based speech synthesis. For the spectral parameters, we propose a sparse-coding-based post-filter to match the trajectories of synthetic speech to those of natural speech, and for the excitation trajectory, we introduce a perceptually motivated post-filter. Experimental evaluations show quality improvement compared with existing methods.

Relevance:

100.00% 100.00%

Abstract:

Using an analysis-by-synthesis (AbS) approach, we develop a soft-decision-based switched vector quantization (VQ) method for high-quality, low-complexity coding of wideband speech line spectral frequency (LSF) parameters. For each switching region, a low-complexity transform domain split VQ (TrSVQ) is designed. The overall rate-distortion (R/D) performance optimality of the new switched quantizer is addressed in the Gaussian mixture model (GMM) based parametric framework. In the AbS approach, the quantization complexity is reduced by using nearest neighbor (NN) TrSVQs and by splitting the transform domain vector into a higher number of subvectors. Compared with current LSF quantization methods, the new method is shown to provide a competitive or better trade-off between R/D performance and complexity.

Relevance:

100.00% 100.00%

Abstract:

This paper proposes an algorithm for joint data detection and tracking of the dominant singular mode of a time-varying channel at the transmitter and receiver of a time division duplex multiple-input multiple-output beamforming system. The proposed method is a modified expectation maximization algorithm that uses an initial estimate to blindly track the dominant modes of the channel at the transmitter and the receiver, and simultaneously detects the unknown data. Furthermore, the estimates are constrained to lie within a confidence interval of the previous estimate in order to improve the tracking performance and mitigate the effect of error propagation. Monte Carlo simulation results for the symbol error rate and the mean square inner product between the estimated and the true singular vector are plotted to show the performance benefits of the proposed method compared with existing techniques.

Relevance:

100.00% 100.00%

Abstract:

This paper considers the problem of weak signal detection in the presence of navigation data bits for Global Navigation Satellite System (GNSS) receivers. Typically, a set of partial coherent integration outputs is non-coherently accumulated to combat the effects of model uncertainties, such as the presence of navigation data bits and/or frequency uncertainty, resulting in a sub-optimal test statistic. In this work, the test statistic for weak signal detection in the presence of navigation data bits is derived from the likelihood ratio. It is highlighted that averaging the likelihood-ratio-based test statistic over the prior distributions of the unknown data bits and the carrier phase uncertainty leads to the conventional Post Detection Integration (PDI) technique for detection. To improve the performance in the presence of model uncertainties, a novel cyclostationarity-based sub-optimal PDI technique is proposed. The test statistic is analytically characterized and shown to be robust to the presence of navigation data bits and to frequency, phase and noise uncertainties. Monte Carlo simulation results illustrate the validity of the theoretical results and the superior performance offered by the proposed detector in the presence of model uncertainties.
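
The conventional non-coherent accumulation mentioned in this abstract can be illustrated with a minimal sketch. It is not the cyclostationarity-based detector proposed in the paper. The function name `noncoherent_pdi`, the block length, and the assumption that the input samples have already had the spreading code and carrier removed are all hypothetical choices made for illustration.

```python
from typing import Sequence


def noncoherent_pdi(samples: Sequence[complex], block_len: int) -> float:
    """Sum of squared magnitudes of partial coherent integrations.

    Assumption: `samples` are complex correlator inputs after code and
    carrier wipe-off, and a data-bit sign flip can only occur at a
    block boundary.
    """
    if block_len <= 0 or len(samples) % block_len != 0:
        raise ValueError("len(samples) must be a positive multiple of block_len")
    statistic = 0.0
    for start in range(0, len(samples), block_len):
        # Coherent integration over one block.
        partial = sum(samples[start:start + block_len])
        # Squaring the magnitude discards the unknown bit sign and phase.
        statistic += abs(partial) ** 2
    return statistic


# A bit flip halfway through cancels a fully coherent sum,
# but leaves the non-coherent statistic unaffected.
flipped = [1 + 0j] * 4 + [-1 + 0j] * 4
coherent_magnitude = abs(sum(flipped))
pdi_statistic = noncoherent_pdi(flipped, block_len=4)
```

In this toy input the fully coherent sum is zero, while the block-wise statistic keeps the energy of both blocks, which is the robustness to data bits that motivates non-coherent accumulation.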

Relevance:

100.00% 100.00%

Abstract:

The design and development of a Bottom Pressure Recorder for a Tsunami Early Warning System is described here. The special requirements it must satisfy for deployment on the ocean bed and for monitoring the pressure of the water column above are dealt with. High-resolution data digitization and low circuit power consumption are typical of these requirements. The implementation details of the data sensing and acquisition part that meet these requirements are also brought out. The data processing part typically encompasses a tsunami detection algorithm that should detect an event of significance against a background of a variety of periodic and aperiodic noise signals. Such an algorithm and its simulation are presented. Further, the results of sea trials carried out on the system off the Chennai coast are presented. The high quality and fidelity of the data show that the system design is robust despite its low cost and that, with suitable augmentations, it is ready for full-fledged deployment on the ocean bed. (C) 2013 Elsevier Ltd. All rights reserved.

Relevance:

100.00% 100.00%

Abstract:

Narrowband spectrograms of voiced speech can be modeled as the outcome of a two-dimensional (2-D) modulation process. In this paper, we develop a demodulation algorithm to estimate the 2-D amplitude modulation (AM) and carrier of a given spectrogram patch. The demodulation algorithm is based on the Riesz transform, which is a unitary, shift-invariant operator and is obtained as a 2-D extension of the well-known 1-D Hilbert transform operator. Existing methods for spectrogram demodulation rely on an extension of the sinusoidal demodulation method from the communications literature and require a precise estimate of the 2-D carrier. In contrast, the proposed method based on the Riesz transform does not require a carrier estimate. The proposed method and the sinusoidal demodulation scheme are tested on real speech data. Experimental results show that the demodulated AM and carrier from Riesz demodulation represent the spectrogram patch more accurately than those obtained using sinusoidal demodulation. The signal-to-reconstruction error ratio was found to be about 2 to 6 dB higher for the proposed demodulation approach.

Relevance:

100.00% 100.00%

Abstract:

The electromagnetic articulography (EMA) technique is used to record the kinematics of different articulators while a person speaks. EMA data often contain missing segments due to sensor failure. In this work, we propose maximum a posteriori (MAP) estimation with a continuity constraint to recover the missing samples in the articulatory trajectories recorded using EMA. This approach combines the benefits of statistical MAP estimation with the temporal continuity of the articulatory trajectories. Experiments on an articulatory corpus using different missing segment durations show that the proposed continuity constraint results in a 30% reduction in average root mean squared estimation error compared with statistical estimation of missing segments without any continuity constraint.

Relevance:

100.00% 100.00%

Abstract:

We propose a two-dimensional (2-D) multicomponent amplitude-modulation, frequency-modulation (AM-FM) model for a spectrogram patch corresponding to voiced speech, and develop a new demodulation algorithm to effectively separate the AM, which is related to the vocal tract response, and the carrier, which is related to the excitation. The demodulation algorithm is based on the Riesz transform and is developed along the lines of Hilbert-transform-based demodulation for 1-D AM-FM signals. We compare the performance of the Riesz transform technique with that of the sinusoidal demodulation technique on real speech data. Experimental results show that the Riesz-transform-based demodulation technique represents spectrogram patches accurately. The spectrograms reconstructed from the demodulated AM and carrier are inverted and the corresponding speech signal is synthesized. The signal-to-noise ratio (SNR) of the reconstructed speech signal, with respect to clean speech, was found to be 2 to 4 dB higher for the Riesz transform technique than for the sinusoidal demodulation technique.

Relevance:

100.00% 100.00%

Abstract:

Compressive sensing (CS) has been proposed for signals with sparsity in a linear transform domain. We explore a signal-dependent unknown linear transform, namely the impulse response matrix operating on a sparse excitation, as in the linear model of speech production, for recovering compressive sensed speech. Since the linear transform is signal dependent and unknown, unlike in the standard CS formulation, a codebook of transfer functions is proposed in a matching pursuit (MP) framework for CS recovery. It is found that MP is efficient and effective in recovering CS-encoded speech as well as in jointly estimating the linear model. With a moderate number of CS measurements and a low-order sparsity estimate, MP converges to the same linear transform as direct VQ of the LP vector derived from the original signal. There is also a high positive correlation between the signal domain approximation and the CS measurement domain approximation for a large variety of speech spectra.
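
For readers unfamiliar with matching pursuit, the generic greedy loop can be sketched as follows. This is a textbook version over a fixed dictionary, not the codebook-of-transfer-functions recovery described in the abstract. The function names and the assumption that every atom has unit norm are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(
    signal: Sequence[float],
    atoms: Sequence[Sequence[float]],
    n_iter: int,
) -> Tuple[Dict[int, float], List[float]]:
    """Greedy sparse approximation of `signal` over `atoms`.

    Assumption: every atom has unit Euclidean norm, so the inner
    product with the residual is the optimal coefficient for that atom.
    Returns the atom-index-to-coefficient map and the final residual.
    """
    residual = list(signal)
    coeffs: Dict[int, float] = {}
    for _ in range(n_iter):
        correlations = [dot(residual, atom) for atom in atoms]
        # Pick the atom most correlated with the current residual.
        best = max(range(len(atoms)), key=lambda k: abs(correlations[k]))
        c = correlations[best]
        if c == 0.0:
            break  # residual is orthogonal to every atom
        coeffs[best] = coeffs.get(best, 0.0) + c
        # Remove the selected atom's contribution from the residual.
        residual = [r - c * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Toy dictionary: the standard basis of R^3.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mp_coeffs, mp_residual = matching_pursuit([3.0, 0.0, -2.0], basis, n_iter=2)
```

With an orthonormal dictionary, two iterations recover both non-zero coefficients exactly. With correlated atoms, as in a speech codebook, the residual generally shrinks but need not vanish.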

Relevance:

100.00% 100.00%

Abstract:

Considering a general linear model of signal degradation, and modeling the probability density function (PDF) of the clean signal using a Gaussian mixture model (GMM) and the additive noise using a Gaussian PDF, we derive the minimum mean square error (MMSE) estimator. The derived MMSE estimator is non-linear, and the linear MMSE estimator is shown to be a special case. For a speech signal corrupted by independent additive noise, by modeling the joint PDF of the time-domain speech samples of a speech frame using a GMM, we propose a speech enhancement method based on the derived MMSE estimator. We also show that the same estimator can be used for transform-domain speech enhancement.
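
The structure of such an estimator can be seen in the scalar case, sketched below under stated assumptions: the observation is y = x + n with no mixing matrix, x follows a one-dimensional GMM, and n is zero-mean Gaussian and independent of x. The function names and this scalar simplification are illustrative and are not taken from the paper, which treats vectors of speech samples.

```python
import math
from typing import Sequence


def gaussian_pdf(y: float, mean: float, var: float) -> float:
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmm_mmse_estimate(
    y: float,
    weights: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
    noise_var: float,
) -> float:
    """E[x | y] for y = x + n, with x ~ GMM and n ~ N(0, noise_var).

    Each component contributes its own linear (Wiener-type) estimate,
    weighted by the posterior probability of that component given y.
    """
    # Under component k, y is Gaussian with variance var_k + noise_var.
    likelihoods = [
        w * gaussian_pdf(y, m, v + noise_var)
        for w, m, v in zip(weights, means, variances)
    ]
    total = sum(likelihoods)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under the model")
    estimate = 0.0
    for like, m, v in zip(likelihoods, means, variances):
        gain = v / (v + noise_var)  # per-component Wiener gain
        estimate += (like / total) * (m + gain * (y - m))
    return estimate


# With a single component the estimator reduces to the linear MMSE estimate.
single = gmm_mmse_estimate(2.0, [1.0], [0.0], [1.0], noise_var=1.0)
# A symmetric two-component prior observed at its midpoint.
symmetric = gmm_mmse_estimate(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0], noise_var=1.0)
```

Because the posterior weights depend on y, the overall estimate is non-linear in y. With one component the weights are constant and the familiar linear form is recovered, which matches the special case noted in the abstract.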

Relevance:

100.00% 100.00%

Abstract:

We propose a simple speech/music discriminator that uses features based on the HILN (Harmonics, Individual Lines and Noise) model. We tested the strength of the feature set on a standard database of 66 files and obtained an accuracy of around 97%. We also tested it on sung queries and polyphonic music and obtained very good results. The current algorithm is being used to discriminate between sung queries and played queries (using an instrument such as a flute) for a Query by Humming (QBH) system currently under development in the lab.

Relevance:

100.00% 100.00%

Abstract:

Non-uniform sampling of a signal is formulated as an optimization problem that minimizes the signal reconstruction error. Dynamic programming (DP) is used to solve this problem efficiently for a finite-duration signal. Further, the optimum samples are quantized to realize a speech coder. The quantizer and the DP-based optimum search for non-uniform samples (DP-NUS) can be combined in a closed-loop manner, which provides a distinct advantage over the open-loop formulation. The DP-NUS formulation provides useful control over the trade-off between bitrate and performance (reconstruction error). It is shown that a 5-10 dB SNR improvement is possible using DP-NUS compared with the extrema sampling approach. In addition, the closed-loop DP-NUS gives a 4-5 dB improvement in reconstruction error.
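
The dynamic programming idea can be sketched as follows. The abstract does not state the reconstruction rule or the exact cost, so this sketch assumes linear interpolation between retained samples, a squared-error cost, and that the first and last samples are always retained. The function names are hypothetical, and quantization is left out.

```python
from typing import List, Sequence, Tuple


def segment_error(x: Sequence[float], i: int, j: int) -> float:
    """Squared error when x[i+1..j-1] is replaced by the line from x[i] to x[j]."""
    err = 0.0
    for n in range(i + 1, j):
        interp = x[i] + (x[j] - x[i]) * (n - i) / (j - i)
        err += (x[n] - interp) ** 2
    return err


def dp_nonuniform_samples(x: Sequence[float], k: int) -> Tuple[List[int], float]:
    """Choose k sample indices (endpoints included) minimising interpolation error."""
    n = len(x)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(x)")
    inf = float("inf")
    # best[m][j]: minimum error for x[0..j] using m + 1 retained samples,
    # the first at index 0 and the last at index j.
    best = [[inf] * n for _ in range(k)]
    prev = [[-1] * n for _ in range(k)]
    best[0][0] = 0.0
    for m in range(1, k):
        for j in range(m, n):
            for i in range(m - 1, j):
                if best[m - 1][i] == inf:
                    continue
                cost = best[m - 1][i] + segment_error(x, i, j)
                if cost < best[m][j]:
                    best[m][j] = cost
                    prev[m][j] = i
    # Backtrack from the last sample to recover the chosen indices.
    indices: List[int] = []
    j = n - 1
    for m in range(k - 1, -1, -1):
        indices.append(j)
        j = prev[m][j]
    indices.reverse()
    return indices, best[k - 1][n - 1]


# A triangular pulse is reproduced exactly by its two endpoints and its peak.
nus_indices, nus_error = dp_nonuniform_samples([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0], k=3)
```

Recomputing each segment error makes this version cubic or worse in the signal length. Precomputing or caching the segment errors would be the natural next step for anything longer than a short frame.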