122 results for speech delay

at Indian Institute of Science - Bangalore - India


Relevance:

30.00%

Publisher:

Abstract:

We present a novel approach to represent transients using spectral-domain amplitude-modulated/frequency-modulated (AM-FM) functions. The model is applied to the real and imaginary parts of the Fourier transform (FT) of the transient. The model is suitable because transients are well localized in time, so the real and imaginary parts of the Fourier spectrum have a modulation structure. The spectral AM is the envelope and the spectral FM is the group delay function. The group delay is estimated using spectral zero-crossings and the spectral envelope is estimated using a coherent demodulator. We show that the proposed technique is robust to additive noise. We present applications of the proposed technique to castanets and stop consonants in speech.

Relevance:

30.00%

Publisher:

Abstract:

We analyze the spectral zero-crossing rate (SZCR) properties of transient signals and show that SZCR contains accurate localization information about the transient. For a train of pulses containing transient events, the SZCR computed on a sliding-window basis is useful for locating the impulses accurately. We present the properties of SZCR on standard stylized signal models and then show how it may be used to estimate the epochs in speech signals. We also present comparisons with some state-of-the-art techniques that are based on the group-delay function. Experiments on real speech show that the proposed SZCR technique is better than other group-delay-based epoch detectors. In the presence of noise, a comparison with the zero-frequency filtering (ZFF) technique and the Dynamic programming projected Phase-Slope Algorithm (DYPSA) showed that the performance of the SZCR technique is better than that of DYPSA and inferior to that of ZFF. For highpass-filtered speech, where ZFF performance suffers drastically, the identification rates of SZCR are better than those of DYPSA.
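The link between time location and spectral zero crossings can be illustrated with a single delayed impulse. The sketch below is not the SZCR estimator of the abstract; it only demonstrates the underlying property, assuming a unit impulse at sample n0 < N/2 in an N-point frame, whose real spectrum is cos(2πk·n0/N) and therefore changes sign about 2·n0 times across the frame. The helper names (`real_spectrum`, `count_zero_crossings`, `impulse`) are hypothetical.

```python
import math

def real_spectrum(x):
    """Real part of the N-point DFT of x, computed directly (O(N^2))."""
    n_points = len(x)
    return [
        sum(x[n] * math.cos(2 * math.pi * k * n / n_points) for n in range(n_points))
        for k in range(n_points)
    ]

def count_zero_crossings(values):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

def impulse(n_points, location):
    """Unit impulse of length n_points located at the given sample index."""
    x = [0.0] * n_points
    x[location] = 1.0
    return x

# Illustrative values only: a 50-sample frame with an impulse at sample 5.
crossings = count_zero_crossings(real_spectrum(impulse(50, 5)))
estimated_location = crossings / 2  # half the crossing count approximates n0
```

Moving the impulse later in the frame raises the crossing count proportionally, which is why a zero-crossing rate in the spectral domain carries localization information.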

Relevance:

30.00%

Publisher:

Abstract:

Transient signals such as plosives in speech or castanets in audio do not have a specific modulation or periodic structure in the time domain. However, in the spectral domain they exhibit a prominent modulation structure, which is a direct consequence of their narrow time localization. Based on this observation, a spectral-domain AM-FM model for transients is proposed. The spectral AM-FM model is built starting from real spectral zero-crossings. The AM and FM correspond to the spectral envelope (SE) and group delay (GD), respectively. Taking into account the modulation structure and spectral continuity, a local polynomial regression technique is proposed to estimate the GD function from the real spectral zeros. The SE is estimated based on the phase function computed from the estimated GD. Since the GD estimation is parametric, the degree of smoothness can be controlled directly. Simulation results based on synthetic transient signals generated using a beta density function are presented to analyze the noise robustness of the SEGD model. Three specific applications are considered: (1) SEGD-based modeling of castanet sounds; (2) appropriateness of the model for transient compression; and (3) determining glottal closure instants in speech using a short-time SEGD model of the linear prediction residue.

Relevance:

20.00%

Publisher:

Abstract:

We report a circuit technique to measure the on-chip delay of an individual logic gate (both inverting and non-inverting) in its unmodified form using a digitally reconfigurable ring oscillator (RO). Solving a system of linear equations formed from different configuration settings of the RO gives the delay of an individual gate. Experimental results from a test chip in a 65 nm process node show the feasibility of measuring the delay of an individual inverter to within 1 ps accuracy. Delay measurements of different nominally identical inverters in close physical proximity show variations of up to 26%, indicating the large impact of local or within-die variations.

Relevance:

20.00%

Publisher:

Abstract:

We report the design and characterization of a circuit technique to measure the on-chip delay of an individual logic gate (both inverting and noninverting) in its unmodified form. The test circuit comprises a digitally reconfigurable ring oscillator (RO). The gate under test is embedded in each stage of the ring oscillator. A system of linear equations is then formed with different configuration settings of the RO, relating the individual gate delays to the measured period of the RO; its solution gives the delay of the individual gates. Experimental results from a test chip in a 65-nm process node show the feasibility of measuring the delay of an individual inverter to within 1 ps accuracy. Delay measurements of different nominally identical inverters in close physical proximity show variations of up to 28%, indicating the large impact of local variations. As a demonstration of this technique, we have studied delay variation with poly-pitch, length of diffusion (LOD) and different orientations of layout in silicon. The proposed technique is well suited to early process characterization, monitoring of mature processes in manufacturing, and model-to-hardware correlation.
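The idea of recovering individual gate delays from several RO configurations can be sketched as a small linear solve. Everything specific here is an assumption made for illustration and not taken from the paper: the period model (period = 2 × the sum of the delays of the stages in the loop), the three configurations that each bypass one gate, and the period values in picoseconds.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are left untouched.
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("configurations are not linearly independent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Hypothetical configurations: row k marks which gates are in the loop.
configs = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
]
measured_periods_ps = [44.0, 42.0, 46.0]  # made-up measurements

# Assumed model: period = 2 * (sum of included gate delays).
half_periods = [t / 2 for t in measured_periods_ps]
gate_delays_ps = solve_linear(configs, half_periods)
```

With more configurations than gates, a least-squares solve would be the natural extension, since it averages out measurement noise in the periods.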

Relevance:

20.00%

Publisher:

Abstract:

We address the problem of jointly using multiple noisy speech patterns for automatic speech recognition (ASR), given that they come from the same class. If the user utters a word K times, the ASR system should try to use the information content in all K patterns of the word simultaneously and improve its recognition accuracy compared to that of single-pattern speech recognition. To address this problem, we recently proposed a Multi Pattern Dynamic Time Warping (MPDTW) algorithm to align the K patterns by finding the least-distortion path between them. A Constrained Multi Pattern Viterbi algorithm was used on this aligned path for isolated word recognition (IWR). In this paper, we explore the possibility of using only the MPDTW algorithm for IWR. We also study the properties of the MPDTW algorithm. We show that using only 2 noisy test patterns (10 percent burst noise at -5 dB SNR) reduces the noisy speech recognition error rate by 37.66 percent compared to single-pattern recognition using the Dynamic Time Warping algorithm.
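For reference, the single-pattern Dynamic Time Warping baseline mentioned above can be sketched as follows. This is the classic two-sequence DTW, not the MPDTW algorithm of the paper; the scalar features, the absolute-difference local cost, and the `classify` helper with its template dictionary are assumptions chosen to keep the example short.

```python
def dtw_distance(a, b):
    """Classic two-pattern DTW with |x - y| as the local distortion."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = least accumulated distortion aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stay on b[j-1], advance a
                                 cost[i][j - 1],      # stay on a[i-1], advance b
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def classify(test, templates):
    """Return the label of the reference template closest to the test pattern."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))

# Hypothetical one-dimensional "word" templates.
templates = {"up": [1, 2, 3], "down": [3, 2, 1]}
label = classify([1, 2, 2, 3], templates)
```

Real recognizers would use frame-level feature vectors (for example cepstral coefficients) and a vector distance in place of the scalar difference used here.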

Relevance:

20.00%

Publisher:

Abstract:

A simple ramp control firing circuit, suitable for use with fully controlled, line-commutated thyristor bridge circuits, is discussed here. This circuit uses very few components and generates the synchronized firing pulses in a simple way. It operates from a single 15 V supply and has an inherent pulse inhibit facility. This circuit provides the synchronized firing pulses for both thyristors of the same limb in a bridge. To ensure reliability, wide triggering pulses are used, which are modulated to pass through the pulse transformers and demodulated before being fed to the thyristor gates. The use of only three such circuits for a three-phase bridge is discussed.

Relevance:

20.00%

Publisher:

Abstract:

Experimental results on the delay time of a vacuum gap triggered by an exploding wire plasma are reported. The delay time consists of the explosion delay time and the propagation delay time. The explosion delay time has been found to depend on the parameters of the exploding wire and the exploding wire circuit, and to be independent of the vacuum gap configuration. The propagation delay time depends on the properties of the exploding wire plasma and on vacuum gap parameters such as the number of injection slots, gap spacing, gap polarity, etc. In the absence of prebreakdown current in the vacuum gap, the breakdown can be initiated only after the plasma completely bridges the gap spacing. Under this specific condition, it has been shown that the delay time data can be used to calculate the plasma velocity.

Relevance:

20.00%

Publisher:

Abstract:

Compressive sensing (CS) has been proposed for signals with sparsity in a linear transform domain. We explore a signal-dependent unknown linear transform, namely the impulse response matrix operating on a sparse excitation, as in the linear model of speech production, for recovering compressive sensed speech. Since the linear transform is signal dependent and unknown, unlike in the standard CS formulation, a codebook of transfer functions is proposed in a matching pursuit (MP) framework for CS recovery. It is found that MP is efficient and effective in recovering CS-encoded speech as well as jointly estimating the linear model. A moderate number of CS measurements and a low-order sparsity estimate result in MP converging to the same linear transform as direct VQ of the LP vector derived from the original signal. There is also a high positive correlation between signal-domain approximation and CS-measurement-domain approximation for a large variety of speech spectra.
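As background for the matching pursuit framework mentioned above, a generic MP loop can be sketched as below. This is the textbook greedy algorithm over a fixed dictionary, not the codebook-based CS recovery of the abstract; the unit-norm atoms, the fixed iteration count, and the toy three-dimensional dictionary are assumptions for illustration.

```python
def matching_pursuit(signal, atoms, n_iter):
    """Greedy MP over a fixed dictionary.

    atoms: list of unit-norm vectors with the same length as signal (assumed).
    Returns the per-atom coefficients and the final residual.
    """
    residual = [float(v) for v in signal]
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Correlate the residual with every atom and keep the strongest match.
        corr = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        coeffs[best] += corr[best]
        # Remove the selected atom's contribution from the residual.
        residual = [r - corr[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual

# Toy dictionary: the standard basis of R^3 (orthonormal, so MP is exact here).
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coeffs, residual = matching_pursuit([2.0, 0.0, -1.0], atoms, n_iter=2)
```

In a CS setting the correlations would be computed in the measurement domain, against atoms that have been passed through the measurement matrix; that step is omitted here.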

Relevance:

20.00%

Publisher:

Abstract:

A single-pulse shock tube facility has been developed in the High Temperature Chemical Kinetics Lab, Aerospace Engineering Department, to carry out ignition delay studies and spectroscopic investigations of hydrocarbon fuels. Our main emphasis is on measuring ignition delay through pressure rise and by monitoring CH emission for various jet fuels, and on finding suitable additives for reducing the delay. Initially the shock tube was tested and calibrated by measuring the ignition delay of a C2H6-O2 mixture. The results are in good agreement with earlier published works. Ignition times of exo-tetrahydrodicyclopentadiene (C10H16), which is a leading candidate fuel for scramjet propulsion, have been studied in the reflected shock region in the temperature range 1250-1750 K with and without added triethylamine (TEA). Addition of TEA results in a substantial reduction of the ignition delay of C10H16.

Relevance:

20.00%

Publisher:

Abstract:

Firing delays of a simple triggered vacuum gap are reported in this paper. The effects of insulating materials in the auxiliary gap, auxiliary gap current, main gap current and electrode separation on the delay have been investigated. The presence of insulating material with low auxiliary gap resistance in the auxiliary gap appears to result in a large delay. The delay decreases considerably with increasing current in the auxiliary and main gaps, but it increases with increasing electrode separation. The scatter in the delay is less than 25 ps and 500 ps with supramica (Mycalex Corporation of America) and silicon carbide, respectively, at lower values of auxiliary gap current, and it becomes negligible for supramica at auxiliary gap currents greater than 6 A. This investigation appears to indicate that the simple device can be used as a fast switch.

Relevance:

20.00%

Publisher:

Abstract:

We propose a novel technique for robust voiced/unvoiced segment detection in noisy speech, based on local polynomial regression. The local polynomial model is well suited to voiced segments in speech. The unvoiced segments are noise-like and do not exhibit any smooth structure. This property of smoothness is used for devising a new metric called the variance ratio metric, which, after thresholding, indicates the voiced/unvoiced boundaries with 75% accuracy at 0 dB global signal-to-noise ratio (SNR). A novelty of our algorithm is that it processes the signal continuously, sample by sample rather than frame by frame. Simulation results on the TIMIT speech database (downsampled to 8 kHz) for various SNRs are presented to illustrate the performance of the new algorithm. Results indicate that the algorithm is robust even at high noise levels.

Relevance:

20.00%

Publisher:

Abstract:

We investigate the use of a two-stage transform vector quantizer (TSTVQ) for coding of line spectral frequency (LSF) parameters in wideband speech coding. The first-stage quantizer of TSTVQ provides better matching of the source distribution, and the second-stage quantizer provides additional coding gain by using an individual cluster-specific decorrelating transform and variance normalization. Further coding gain is shown to be achieved by exploiting the slowly time-varying nature of speech spectra and thus using the inter-frame cluster continuity (ICC) property in the first stage of the TSTVQ method. The proposed method saves 3-4 bits and reduces the computational complexity by 58-66% compared to the traditional split vector quantizer (SVQ), but at the expense of 1.5-2.5 times the memory.