981 resultados para free speech
Resumo:
Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems and it is the computationally dominant phase for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods is by exploiting the structure of these matrices and efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both the methods lead to similar speedups but the latter leads to far lesser impact on the recognition accuracy. Experiments on 1,138 work vocabulary RM1 task and 6,224 word vocabulary TIMIT task using Sphinx 3.7 system show that, for a typical case the matrix multiplication based approach leads to overall speedup of 46 % on RM1 task and 115 % for TIMIT task. Our low-rank approximation methods provide a way for trading off recognition accuracy for a further increase in computational performance extending overall speedups up to 61 % for RM1 and 119 % for TIMIT for an increase of word error rate (WER) from 3.2 to 3.5 % for RM1 and for no increase in WER for TIMIT. We also express pairwise Euclidean distance computation phase in Dynamic Time Warping (DTW) in terms of matrix multiplication leading to saving of approximately of computational operations. In our experiments using efficient implementation of matrix multiplication, this leads to a speedup of 5.6 in computing the pairwise Euclidean distances and overall speedup up to 3.25 for DTW.
Resumo:
We report the room temperature cell performance of alkaline direct methanol fuel cells (ADMFCs) with nitrogen-doped carbon nanotubes (NCNTs) as cathode materials. NCNTs show excellent oxygen reduction reaction activity and methanol tolerance in alkaline medium. The open-circuit-voltage (OCV) as well as the power density of ADMFCs first increases and then saturates with NCNT loading. Similarly, the OCV initially increases and reaches saturation with the increase in the concentration of methanol feed stock. Overall, NCNTs exhibit excellent catalytic activity and stability with respect to Pt based cathodes.
Resumo:
Memory models for shared-memory concurrent programming languages typically guarantee sequential consistency (SC) semantics for datarace-free (DRF) programs, while providing very weak or no guarantees for non-DRF programs. In effect programmers are expected to write only DRF programs, which are then executed with SC semantics. With this in mind, we propose a novel scalable solution for dataflow analysis of concurrent programs, which is proved to be sound for DRF programs with SC semantics. We use the synchronization structure of the program to propagate dataflow information among threads without requiring to consider all interleavings explicitly. Given a dataflow analysis that is sound for sequential programs and meets certain criteria, our technique automatically converts it to an analysis for concurrent programs.
Resumo:
We consider the speech production mechanism and the asso- ciated linear source-filter model. For voiced speech sounds in particular, the source/glottal excitation is modeled as a stream of impulses and the filter as a cascade of second-order resonators. We show that the process of sampling speech signals can be modeled as filtering a stream of Dirac impulses (a model for the excitation) with a kernel function (the vocal tract response),and then sampling uniformly. We show that the problem of esti- mating the excitation is equivalent to the problem of recovering a stream of Dirac impulses from samples of a filtered version. We present associated algorithms based on the annihilating filter and also make a comparison with the classical linear prediction technique, which is well known in speech analysis. Results on synthesized as well as natural speech data are presented.
Resumo:
We address the problem of speech enhancement in real-world noisy scenarios. We propose to solve the problem in two stages, the first comprising a generalized spectral subtraction technique, followed by a sequence of perceptually-motivated post-processing algorithms. The role of the post-processing algorithms is to compensate for the effects of noise as well as to suppress any artifacts created by the first-stage processing. The key post-processing mechanisms are aimed at suppressing musical noise and to enhance the formant structure of voiced speech as well as to denoise the linear-prediction residual. The parameter values in the techniques are fixed optimally by experimentally evaluating the enhancement performance as a function of the parameters. We used the Carnegie-Mellon university Arctic database for our experiments. We considered three real-world noise types: fan noise, car noise, and motorbike noise. The enhancement performance was evaluated by conducting listening experiments on 12 subjects. The listeners reported a clear improvement (MOS improvement of 0.5 on an average) over the noisy signal in the perceived quality (increase in the mean-opinion score (MOS)) for positive signal-to-noise-ratios (SNRs). For negative SNRs, however, the improvement was found to be marginal.
Resumo:
Research in the field of recognizing unlimited vocabulary, online handwritten Indic words is still in its infancy. Most of the focus so far has been in the area of isolated character recognition. In the context of lexicon-free recognition of words, one of the primary issues to be addressed is that of segmentation. As a preliminary attempt, this paper proposes a novel script-independent, lexicon-free method for segmenting online handwritten words to their constituent symbols. Feedback strategies, inspired from neuroscience studies, are proposed for improving the segmentation. The segmentation strategy has been tested on an exhaustive set of 10000 Tamil words collected from a large number of writers. The results show that better segmentation improves the overall recognition performance of the handwriting system.
Resumo:
In this paper we propose a postprocessing technique for a spectrogram diffusion based harmonic/percussion decom- position algorithm. The proposed technique removes har- monic instrument leakages in the percussion enhanced out- puts of the baseline algorithm. The technique uses median filtering and an adaptive detection of percussive segments in subbands followed by piecewise signal reconstruction using envelope properties to ensure that percussion is enhanced while harmonic leakages are suppressed. A new binary mask is created for the percussion signal which upon applying on the original signal improves harmonic versus percussion separation. We compare our algorithm with two recent techniques and show that on a database of polyphonic Indian music, the postprocessing algorithm improves the harmonic versus percussion decomposition significantly.
Resumo:
The tensile behavior of a high activity stand-alone Pt-aluminide (PtAl) bond coat was evaluated by the micro-tensile test method at various temperatures (room temperature to 1100 degrees C) and strain rates (10(-5) s(-1)-10(-1) s(-1).) At all strain rates, the stress strain behavior of the stand-alone coating was significantly affected by the variation in temperature. The stress strain response was linear, indicating brittle behavior, at temperatures below the brittle ductile transition temperature (BDTT). The coating exhibited appreciable ductility (up to 2%) above the BDTT. The strength (both yield stress and ultimate tensile strength) of the coating decreased and its ductility increased with increasing temperature above the BDTT. The tensile behavior of the coating was sensitive to strain rate in the ductile regime, with its strength increasing with increasing strain rate at any given temperature. The BDTT of the coating was found to increase with increasing with increasing strain rate. The coating exhibited two distinct mechanisms of deformation above the BDTT. The transition temperature for the change of deformation mechanism also increased with increasing strain rate. (C) 2012 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.
Resumo:
Analyses of the invariants of the velocity gradient ten- sor were performed on flow fields obtained by DNS of compressible plane mixing layers at convective Mach num- bers Mc=0:15 and 1.1. Joint pdfs of the 2nd and 3rd invariants were examined at turbulent/nonturbulent (T/NT) boundaries—defined as surfaces where the local vorticity first exceeds a threshold fraction of the maximum of the mean vorticity. By increasing the threshold from very small lev-els, the boundary points were moved closer into the turbulent region, and the effects on the pdfs of the invariants were ob-served. Generally, T/NT boundaries are in sheet-like regions at both Mach numbers. At the higher Mach number a distinct lobe appears in the joint pdf isolines which has not been ob-served/reported before. A connection to the delayed entrain-ment and reduced growth rate of the higher Mach number flow is proposed.
Resumo:
alpha-Azidoacetophenones were converted into 2-aryl-1,3-oxazole-4-carbaldehydes through rearrangement of the carbon framework upon exposure to DMF/POCl3. The unprecedented rearrangement occurs via alkenyl azides and 2H-azirines. A mechanism for this unusual reaction was proposed and evidenced.
Resumo:
Bulk texture measurement of multi-axial forged body center cubic interstitial free steel performed in this study using x-ray and neutron diffraction indicated the presence of a strong {101}aOE (c) 111 > single texture component. Viscoplastic self-consistent simulations could successfully predict the formation of this texture component by incorporating the complicated strain path followed during this process and assuming the activity of {101}aOE (c) 111 > slip system. In addition, a first-order estimate of mechanical properties in terms of highly anisotropic yield locus and Lankford parameter was also obtained from the simulations.
Resumo:
We analyze the spectral zero-crossing rate (SZCR) properties of transient signals and show that SZCR contains accurate localization information about the transient. For a train of pulses containing transient events, the SZCR computed on a sliding window basis is useful in locating the impulse locations accurately. We present the properties of SZCR on standard stylized signal models and then show how it may be used to estimate the epochs in speech signals. We also present comparisons with some state-of-the-art techniques that are based on the group-delay function. Experiments on real speech show that the proposed SZCR technique is better than other group-delay-based epoch detectors. In the presence of noise, a comparison with the zero-frequency filtering technique (ZFF) and Dynamic programming projected Phase-Slope Algorithm (DYPSA) showed that performance of the SZCR technique is better than DYPSA and inferior to that of ZFF. For highpass-filtered speech, where ZFF performance suffers drastically, the identification rates of SZCR are better than those of DYPSA.
Resumo:
The goal of speech enhancement algorithms is to provide an estimate of clean speech starting from noisy observations. The often-employed cost function is the mean square error (MSE). However, the MSE can never be computed in practice. Therefore, it becomes necessary to find practical alternatives to the MSE. In image denoising problems, the cost function (also referred to as risk) is often replaced by an unbiased estimator. Motivated by this approach, we reformulate the problem of speech enhancement from the perspective of risk minimization. Some recent contributions in risk estimation have employed Stein's unbiased risk estimator (SURE) together with a parametric denoising function, which is a linear expansion of threshold/bases (LET). We show that the first-order case of SURE-LET results in a Wiener-filter type solution if the denoising function is made frequency-dependent. We also provide enhancement results obtained with both techniques and characterize the improvement by means of local as well as global SNR calculations.
Resumo:
We address the problem of speech enhancement using a risk- estimation approach. In particular, we propose the use the Stein’s unbiased risk estimator (SURE) for solving the problem. The need for a suitable finite-sample risk estimator arises because the actual risks invariably depend on the unknown ground truth. We consider the popular mean-squared error (MSE) criterion first, and then compare it against the perceptually-motivated Itakura-Saito (IS) distortion, by deriving unbiased estimators of the corresponding risks. We use a generalized SURE (GSURE) development, recently proposed by Eldar for MSE. We consider dependent observation models from the exponential family with an additive noise model,and derive an unbiased estimator for the risk corresponding to the IS distortion, which is non-quadratic. This serves to address the speech enhancement problem in a more general setting. Experimental results illustrate that the IS metric is efficient in suppressing musical noise, which affects the MSE-enhanced speech. However, in terms of global signal-to-noise ratio (SNR), the minimum MSE solution gives better results.