69 results for Speech synthesis Data processing
Abstract:
The hot deformation behavior of α brass with zinc contents in the range 3%–30% was characterized using hot compression testing in the temperature range 600–900 °C and strain rate range 0.001–100 s⁻¹. On the basis of the flow stress data, processing maps showing the variation of the efficiency of power dissipation (given by 2m/(m+1), where m is the strain rate sensitivity) with temperature and strain rate were obtained. α brass exhibits a domain of dynamic recrystallization (DRX) at temperatures greater than 0.85Tm and at strain rates lower than 1 s⁻¹. The maximum efficiency of power dissipation increases with increasing zinc content and is in the range 33%–53%. The DRX domain shifts to lower strain rates for higher zinc contents, and the strain rate for peak efficiency is in the range 0.0001–0.05 s⁻¹. The results indicate that DRX in α brass is controlled by the rate of interface formation (nucleation), which depends on the diffusion-controlled process of thermal recovery by climb.
Abstract:
The effect of zirconium on the hot working characteristics of alpha and alpha-beta brass was studied in the temperature range of 500 to 850 °C and the strain rate range of 0.001 to 100 s⁻¹. On the basis of the flow stress data, processing maps showing the variation of the efficiency of power dissipation (given by [2m/(m+1)], where m is the strain rate sensitivity) with temperature and strain rate were obtained. The addition of zirconium to alpha brass decreased the maximum efficiency of power dissipation from 53% to 39%, increased the strain rate for dynamic recrystallization (DRX) from 0.001 to 0.1 s⁻¹ and improved the hot workability. Alpha-beta brasses with and without zirconium exhibit a domain in the temperature range from 550 to 750 °C and at strain rates lower than 1 s⁻¹, with a maximum efficiency of power dissipation of nearly 50% occurring in the temperature range of 700 to 750 °C and at a strain rate of 0.001 s⁻¹. In this domain, the alpha phase undergoes DRX and controls the hot deformation of the alloy, whereas the beta phase deforms superplastically. The addition of zirconium to alpha-beta brass does not affect the processing maps, as it partitions to the beta phase and does not alter the constitutive behavior of the alpha phase.
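The efficiency expression quoted in these processing-map abstracts, 2m/(m+1), can be illustrated with a short sketch. It assumes that the strain rate sensitivity m is the slope of ln(flow stress) against ln(strain rate) at fixed temperature and strain, and it estimates that slope by finite differences. The function name and the sample data are hypothetical, and the abstracts do not say how the authors fitted their data.

```python
import numpy as np

def power_dissipation_efficiency(strain_rates, flow_stress):
    """Return 2m/(m+1) at each strain rate for one temperature.

    m is estimated as d ln(stress) / d ln(strain rate) by finite
    differences (an assumption; other fitting schemes are possible).
    """
    log_rate = np.log(np.asarray(strain_rates, dtype=float))
    log_stress = np.log(np.asarray(flow_stress, dtype=float))
    m = np.gradient(log_stress, log_rate)  # strain rate sensitivity
    return 2.0 * m / (m + 1.0)

# Hypothetical power-law data with m = 0.2 (not taken from the papers).
rates = np.array([0.001, 0.01, 0.1, 1.0, 10.0, 100.0])
stress = 50.0 * rates ** 0.2
eta = power_dissipation_efficiency(rates, stress)  # each entry is about 0.333
```

Repeating the calculation at each test temperature and contouring the result over temperature and log strain rate gives a map of the kind the abstracts describe.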
Abstract:
The constitutive behaviour of α-nickel silver in the temperature range 700–950 °C and strain rate range 0.001–100 s⁻¹ was characterized with the help of a processing map generated on the basis of the principles of the "dynamic materials model" of Prasad et al. Using the flow stress data, processing maps showing the variation of the efficiency of power dissipation (given by 2m/(m+1), where m is the strain-rate sensitivity) with temperature and strain rate were obtained. α-nickel silver exhibits a single domain at temperatures greater than 750 °C and at strain rates lower than 1 s⁻¹, with a maximum efficiency of 38% occurring at about 950 °C and at a strain rate of 0.1 s⁻¹. In this domain the material undergoes dynamic recrystallization (DRX). On the basis of a model, it is shown that the DRX is controlled by the rate of interface formation (nucleation), which depends on the diffusion-controlled process of thermal recovery by climb. At high strain rates (10 and 100 s⁻¹) the material undergoes microstructural instabilities, which manifest as adiabatic shear bands and strain markings.
Abstract:
The constitutive behaviour of α-β nickel silver in the temperature range 600–850 °C and strain rate range 0.001–100 s⁻¹ was characterized with the help of a processing map generated on the principles of the dynamic materials model. On the basis of the flow-stress data, processing maps showing the variation of the efficiency of power dissipation (given by [2m/(m+1)], where m is the strain-rate sensitivity) with temperature and strain rate were obtained. α-β nickel silver exhibits a single domain at temperatures greater than 700 °C and at strain rates lower than 1 s⁻¹, with a maximum efficiency of power dissipation of about 42% occurring at about 850 °C and at 0.1 s⁻¹. In this domain, the α phase undergoes dynamic recrystallization and controls the deformation of the alloy, while the β phase deforms superplastically. Optimum conditions for the processing of α-β nickel silver are 850 °C and 0.1 s⁻¹. The material undergoes unstable flow at strain rates of 10 and 100 s⁻¹ and in the temperature range 600–750 °C, manifested in the form of adiabatic shear bands.
Abstract:
Fine particles of willemite, α-Zn₂SiO₄, were prepared by both solution combustion and sol-gel methods. Both processes yield single-phase, large-surface-area (26 and 78 m²/g), sinteractive willemite powders. Thermal evolution of crystalline phases was studied using X-ray powder diffraction patterns. The combustion method favors low-temperature formation of willemite compared to the sol-gel method. The powders, when uniaxially pressed and sintered at 1300 °C, achieved 78–80% of theoretical density. The microstructures of the sintered body show the presence of equiaxed 0.5 to 4 μm grains. Blue pigments of willemite doped with Co²⁺ and Ni²⁺ were also prepared by the combustion process.
Abstract:
We introduce a novel temporal feature of a signal, namely extrema-based signal track length (ESTL), for the problem of speech segmentation. We show that the ESTL measure is sensitive to both the amplitude and the frequency of the signal. The short-time ESTL (ST_ESTL) offers a promising way to capture the significant segments of a speech signal, where the segments correspond to acoustic units of speech having distinct temporal waveforms. We compare ESTL-based segmentation with ML and STM methods and find that it is as good as spectral-feature-based segmentation, but with lower computational complexity.
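The abstract does not define ESTL, so the sketch below rests on an assumed reading: the track length is the total absolute excursion between successive local extrema of the waveform. Under that assumption, the measure grows with both amplitude and frequency, which matches the sensitivity the abstract claims. The function names, frame length and hop are hypothetical and are not taken from the paper.

```python
import numpy as np

def track_length(x):
    """Sum of absolute differences between successive local extrema
    (assumed definition; the endpoints are counted as extrema)."""
    x = np.asarray(x, dtype=float)
    if x.size < 2:
        return 0.0
    d = np.diff(x)
    # Interior turning points: the slope sign changes between neighbours.
    turn = np.where(np.sign(d[1:]) != np.sign(d[:-1]))[0] + 1
    idx = np.concatenate(([0], turn, [x.size - 1]))
    return float(np.sum(np.abs(np.diff(x[idx]))))

def short_time_track_length(x, frame_len, hop):
    """Track length of each frame of frame_len samples, advanced by hop."""
    x = np.asarray(x, dtype=float)
    starts = range(0, max(x.size - frame_len, 0) + 1, hop)
    return np.array([track_length(x[s:s + frame_len]) for s in starts])
```

For a sampled sinusoid of amplitude A and frequency f lasting T seconds, this quantity is close to 4·A·f·T, so doubling either the amplitude or the frequency roughly doubles it.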
Abstract:
This paper presents a method of designing a programmable signal processor based on a bit parallel matrix vector matrix multiplier (linear transformer). The salient feature of this design is that the efficiency of the direct vector matrix multiplier is improved and VLSI design is made much simpler by trading off the more expensive arithmetic operation (multiplication) for 'cheaper' manipulation (addition/subtraction) of the data.
Abstract:
Current scientific research is characterized by increasing specialization, with knowledge accumulating at high speed due to parallel advances in a multitude of sub-disciplines. Recent estimates suggest that human knowledge doubles every two to three years, and with the advances in information and communication technologies this wide body of scientific knowledge is available to anyone, anywhere, anytime. This may also be referred to as ambient intelligence: an environment characterized by plentiful and available knowledge. The bottleneck in utilizing this knowledge for specific applications is not accessing but assimilating the information and transforming it to suit the needs of a specific application. The increasingly specialized areas of scientific research often share the common goal of converting data into insight, allowing the identification of solutions to scientific problems. Due to this common goal, there are strong parallels between different areas of application that can be exploited to cross-fertilize different disciplines. For example, the same fundamental statistical methods are used extensively in speech and language processing, in materials science applications, in visual processing and in biomedicine. Each sub-discipline has found its own specialized methodologies that make these statistical methods successful for the given application. The unification of specialized areas is possible because many different problems share strong analogies, making the theories developed for one problem applicable to other areas of research. It is the goal of this paper to demonstrate the utility of merging two disparate areas of application to advance scientific research. The merging process requires cross-disciplinary collaboration to allow maximal exploitation of advances in one sub-discipline by another. We will demonstrate this general concept with the specific example of merging language technologies and computational biology.
Abstract:
Parallel sub-word recognition (PSWR) is a new model that has been proposed for language identification (LID) which does not need elaborate phonetic labeling of the speech data in a foreign language. The new approach performs a front-end tokenization in terms of sub-word units which are designed by automatic segmentation, segment clustering and segment HMM modeling. We develop PSWR based LID in a framework similar to the parallel phone recognition (PPR) approach in the literature. This includes a front-end tokenizer and a back-end language model, for each language to be identified. Considering various combinations of the statistical evaluation scores, it is found that PSWR can perform as well as PPR, even with broad acoustic sub-word tokenization, thus making it an efficient alternative to the PPR system.
Abstract:
We address the problem of robust formant tracking in continuous speech in the presence of additive noise. We propose a new approach based on mixture modeling of the formant contours. Our approach consists of two main steps: (i) Computation of a pyknogram based on multiband amplitude-modulation/frequency-modulation (AM/FM) decomposition of the input speech; and (ii) Statistical modeling of the pyknogram using mixture models. We experiment with both Gaussian mixture model (GMM) and Student's-t mixture model (tMM) and show that the latter is robust with respect to handling outliers in the pyknogram data, parameter selection, accuracy, and smoothness of the estimated formant contours. Experimental results on simulated data as well as noisy speech data show that the proposed tMM-based approach is also robust to additive noise. We present performance comparisons with a recently developed adaptive filterbank technique proposed in the literature and the classical Burg's spectral estimator technique, which show that the proposed technique is more robust to noise.
Abstract:
We address the problem of phase retrieval, which is frequently encountered in optical imaging. The measured quantity is the magnitude of the Fourier spectrum of a function (in optics, the function is also referred to as an object). The goal is to recover the object based on the magnitude measurements. In doing so, the standard assumptions are that the object is compactly supported and positive. In this paper, we consider objects that admit a sparse representation in some orthonormal basis. We develop a variant of the Fienup algorithm to incorporate the condition of sparsity and to successively estimate and refine the phase starting from the magnitude measurements. We show that the proposed iterative algorithm possesses Cauchy convergence properties. As far as the modality is concerned, we work with measurements obtained using a frequency-domain optical-coherence tomography experimental setup. The experimental results on real measured data show that the proposed technique exhibits good reconstruction performance even with fewer coefficients taken into account for reconstruction. It also suppresses the autocorrelation artifacts to a significant extent since it estimates the phase accurately.
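The idea of adding a sparsity condition to a Fienup-style iteration can be sketched as follows. This is an illustrative toy, not the authors' algorithm. It assumes a 1-D real signal, takes the identity as the orthonormal basis (so the signal itself is sparse), and uses a simple error-reduction loop that alternates between enforcing the measured Fourier magnitude and keeping the k largest coefficients. The function names and parameters are hypothetical.

```python
import numpy as np

def keep_k_largest(x, k):
    """Hard threshold: zero everything except the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[max(x.size - k, 0):]
    out[idx] = x[idx]
    return out

def sparse_fienup(magnitude, k, n_iter=200, seed=0):
    """Alternate a Fourier-magnitude projection with a sparsity projection.

    magnitude: measured |FFT| of the unknown real signal.
    k: assumed number of nonzero coefficients (identity basis).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(magnitude.size)   # random initial estimate
    for _ in range(n_iter):
        spectrum = np.fft.fft(x)
        phase = np.exp(1j * np.angle(spectrum))      # keep current phase
        x = np.fft.ifft(magnitude * phase).real      # impose measured magnitude
        x = keep_k_largest(x, k)                     # impose sparsity
    return x
```

Because Fourier magnitudes are unchanged by circular shifts, flips and sign changes, a sketch like this can at best recover the signal up to those ambiguities, and it may stall in a local minimum depending on the starting point.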
Abstract:
Automated image segmentation techniques are useful tools in biological image analysis and are an essential step in tracking applications. Typically, snakes or active contours are used for segmentation, and they evolve under the influence of certain internal and external forces. Recently, a new class of shape-specific active contours has been introduced, known as Snakuscules and Ovuscules. These contours are based on a pair of concentric circles and ellipses as the shape templates, and the optimization is carried out by maximizing a contrast function between the outer and inner templates. In this paper, we present a unified approach to the formulation and optimization of Snakuscules and Ovuscules by considering a specific form of affine transformations acting on a pair of concentric circles. We show how the parameters of the affine transformation may be optimized to generate either Snakuscules or Ovuscules. Our approach allows for a unified formulation and relies only on generic regularization terms and not on shape-specific regularization functions. We show how the calculation of the partial derivatives may be made efficient using Green's theorem. Results on synthesized as well as real data are presented.
Abstract:
We address the problem of multi-instrument recognition in polyphonic music signals. Individual instruments are modeled within a stochastic framework using Student's-t Mixture Models (tMMs). We impose a mixture of these instrument models on the polyphonic signal model. No a priori knowledge is assumed about the number of instruments in the polyphony. The mixture weights are estimated in a latent variable framework from the polyphonic data using an Expectation Maximization (EM) algorithm derived for the proposed approach. The weights are shown to indicate instrument activity. The output of the algorithm is an Instrument Activity Graph (IAG), from which the instruments that are active at a given time can be identified. An average F-ratio of 0.75 is obtained for polyphonies containing 2-5 instruments, on an experimental test set of 8 instruments: clarinet, flute, guitar, harp, mandolin, piano, trombone and violin.
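The weight-estimation step can be illustrated with a minimal sketch of EM for the mixing weights when the component models are held fixed. It is a toy under stated assumptions: each "instrument" is a single 1-D Student's-t density rather than a full tMM over audio features, and the function names, locations, scales and degrees of freedom are hypothetical. The EM derivation in the paper itself may differ.

```python
import math
import numpy as np

def student_t_pdf(x, loc, scale, df):
    """Density of a 1-D location-scale Student's-t distribution."""
    z = (x - loc) / scale
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi) - math.log(scale))
    return np.exp(log_norm - (df + 1) / 2 * np.log1p(z * z / df))

def em_mixture_weights(likelihoods, n_iter=200):
    """Estimate mixing weights for fixed component models.

    likelihoods: (N, K) array with entry [n, k] = p_k(x_n).
    """
    n_components = likelihoods.shape[1]
    w = np.full(n_components, 1.0 / n_components)    # uniform start
    for _ in range(n_iter):
        joint = likelihoods * w                          # E-step numerator
        resp = joint / joint.sum(axis=1, keepdims=True)  # responsibilities
        w = resp.mean(axis=0)                            # M-step
    return w
```

A component whose estimated weight stays near zero is treated as inactive. Running the estimate over successive time windows would give a weight-versus-time picture in the spirit of the Instrument Activity Graph.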
Abstract:
Sparse representation based classification (SRC) is one of the most successful methods developed in recent times for face recognition. Optimal projection for sparse representation based classification (OPSRC) [1] provides a dimensionality reduction map that is supposed to give optimum performance for the SRC framework. However, the computational complexity involved in this method is too high. Here, we propose a new projection technique using the data scatter matrix, which is computationally superior to the optimal projection method and has comparable classification accuracy with respect to OPSRC. The performance of the proposed approach is benchmarked on various publicly available face databases.
Abstract:
This paper describes a spatio-temporal registration approach for speech articulation data obtained from electromagnetic articulography (EMA) and real-time Magnetic Resonance Imaging (rtMRI). This is motivated by the potential for combining the complementary advantages of both types of data. The registration method is validated on EMA and rtMRI datasets obtained at different times, but using the same stimuli. The aligned corpus offers the advantages of high temporal resolution (from EMA) and a complete mid-sagittal view (from rtMRI). The co-registration also yields optimum placement of EMA sensors as articulatory landmarks on the magnetic resonance images, thus providing richer spatio-temporal information about articulatory dynamics. (C) 2014 Acoustical Society of America