925 results for Compressed speech
Abstract:
In this paper, we present the HyperSausage Neuron, which is based on High-Dimension Space (HDS), and propose a new algorithm for speaker-independent continuous digit speech recognition. Compared with an HMM-based method, the HyperSausage Neuron method achieves a higher recognition rate.
Abstract:
In recognition-based user interfaces, user satisfaction is determined not only by recognition accuracy but also by the effort needed to correct recognition errors. In this paper, we introduce a crossmodal error correction technique that allows users to correct Chinese handwriting recognition errors by speech. The focus of the paper is a multimodal fusion algorithm that supports this crossmodal error correction. By fusing handwriting and speech recognition, the algorithm can correct errors in both character extraction and handwriting recognition. The experimental results indicate that the algorithm is effective and efficient. Moreover, the evaluation shows that the correction technique helps users correct handwriting recognition errors more efficiently than the other two error correction techniques.
Abstract:
Hydrogenation of α,β-unsaturated aldehydes (citral, 3-methyl-2-butenal, cinnamaldehyde) has been studied with a tetrakis(triphenylphosphine)ruthenium dihydride (H2Ru(TPP)4) catalyst in a poly(ethylene glycol) (PEG)/compressed carbon dioxide biphasic system. The hydrogenation reaction was slow under PEG/H2 biphasic conditions at an H2 pressure of 4 MPa in the absence of CO2. When the reaction mixture was pressurized with CO2, which is not a reactant, the reaction was significantly accelerated.
Abstract:
The potential of CO2-expanded liquid media for chemical reactions has been examined in this work, using cyclohexane as a solvent and Pd/C as a heterogeneous catalyst for the hydrogenation of styrene, citral, and nitrobenzene with H2. In the CO2-expanded cyclohexane phase, the rate of the hydrogenation reactions increases and the product selectivity is altered. In the hydrogenation of citral, the selectivity to citronellal decreases with CO2 pressure, from about 80% in neat cyclohexane to about 65% at 16 MPa.
Abstract:
Blend-modified polyimide (PI) hollow fiber membranes were used in vapor permeation for gas-phase dehydration of ethanol. A dry air sweep was used, with the dry air supplied by a membrane module that dehumidifies compressed air, so that the two steps form an integrated membrane process. The effects of several factors on dehydration efficiency are discussed: modification of the membrane material, the humidity and flow velocity of the sweep air, and the operating temperature.
Abstract:
Self-organization of a BaF2 single-crystal film under a compressed monolayer of behenic acid (BA) has been investigated using X-ray diffraction (XRD) and scanning electron microscopy (SEM). The experimental results indicate that a (100)-oriented single-crystal film of BaF2 formed under the BA monolayer. The relation between the BaF2 single crystal and the monolayer is discussed.
Abstract:
Selective crystallization of BaF2 crystals under a compressed Langmuir monolayer of behenic acid [CH3(CH2)20COOH] has been studied using X-ray diffraction, X-ray photoelectron spectroscopy, scanning electron microscopy, and energy-dispersive X-ray analysis. In the absence of a monolayer, three kinds of crystals (Ba2ClF3, BaClF, and BaF2) can be obtained by mixing BaCl2 with an NH4F solution. In the presence of the behenic acid monolayer, however, only BaF2 crystals appear at the monolayer-subphase interface, and these crystals show a specific crystal face, (100). During crystallization, the monolayer plays a very important role: it acts as a template that preferentially selects a particular crystal and a particular crystal face. These results can be explained in terms of a specific molecular interaction between the ions and the headgroups of the monolayer, together with specific electrostatic, geometric, and stereochemical interactions at the organic-inorganic interface.
Abstract:
This work addresses two related questions. The first question is what joint time-frequency energy representations are most appropriate for auditory signals, in particular for speech signals in sonorant regions. The quadratic transforms of the signal are examined, a large class that includes, for example, spectrograms and the Wigner distribution. Quasi-stationarity is not assumed, since this would neglect dynamic regions. A set of desired properties is proposed for the representation: (1) shift-invariance, (2) positivity, (3) superposition, (4) locality, and (5) smoothness. Several relations among these properties are proved: shift-invariance and positivity imply that the transform is a superposition of spectrograms; positivity and superposition are equivalent conditions when the transform is real; positivity limits the simultaneous time and frequency resolution (locality) possible for the transform, defining an uncertainty relation for joint time-frequency energy representations; and locality and smoothness trade off according to the 2-D generalization of the classical uncertainty relation. The transform that best meets these criteria is derived; it consists of two-dimensionally smoothed Wigner distributions with (possibly oriented) 2-D Gaussian kernels. These transforms are then related to time-frequency filtering, a method for estimating the time-varying 'transfer function' of the vocal tract, which is somewhat analogous to cepstral filtering generalized to the time-varying case. Natural speech examples are provided. The second question addressed is how to obtain a rich, symbolic description of the phonetically relevant features in these time-frequency energy surfaces, the so-called schematic spectrogram. Time-frequency ridges, the 2-D analog of spectral peaks, are one feature that is proposed.
If non-oriented kernels are used for the energy representation, then the ridge tops can be identified with zero-crossings in the inner product of the gradient vector and the direction of greatest downward curvature. If oriented kernels are used, the method can be generalized to give better orientation selectivity (e.g., at intersecting ridges) at the cost of poorer time-frequency locality. Many speech examples are given, showing the performance for some traditionally difficult cases: semi-vowels and glides, nasalized vowels, consonant-vowel transitions, female speech, and imperfect transmission channels.
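The ridge-top criterion in this abstract (zero-crossings of the inner product of the gradient and the direction of greatest downward curvature) can be illustrated with a minimal sketch. It is not the author's implementation. It assumes the energy surface is already sampled on a regular grid `E[i][j]`, with `i` as time frame and `j` as frequency bin, and it uses plain finite differences. It looks for sign changes only between axis-aligned neighbours. The function name `ridge_tops` and the threshold `eps` are hypothetical.

```python
import math

def ridge_tops(E, eps=1e-12):
    """Return the set of (i, j) grid points lying on ridge tops of E[i][j].

    i indexes time frames and j indexes frequency bins (an assumed layout).
    """
    n, m = len(E), len(E[0])
    # d[i][j]: gradient projected on the direction of greatest downward curvature
    d = [[None] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # Central-difference gradient.
            gt = (E[i + 1][j] - E[i - 1][j]) / 2.0
            gf = (E[i][j + 1] - E[i][j - 1]) / 2.0
            # Second differences forming the symmetric 2x2 Hessian.
            htt = E[i + 1][j] - 2.0 * E[i][j] + E[i - 1][j]
            hff = E[i][j + 1] - 2.0 * E[i][j] + E[i][j - 1]
            htf = ((E[i + 1][j + 1] - E[i - 1][j + 1])
                   - (E[i + 1][j - 1] - E[i - 1][j - 1])) / 4.0
            # Smallest Hessian eigenvalue = greatest downward curvature.
            lam = (htt + hff) / 2.0 - math.hypot((htt - hff) / 2.0, htf)
            if lam >= -eps:
                continue  # the surface does not curve downward here
            # theta is the axis of the larger eigenvalue; the direction of
            # the smaller eigenvalue is perpendicular to it.
            theta = 0.5 * math.atan2(2.0 * htf, htt - hff)
            vt, vf = -math.sin(theta), math.cos(theta)
            d[i][j] = gt * vt + gf * vf
    tops = set()
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            if d[i][j] is None:
                continue
            for a, b in ((i + 1, j), (i, j + 1)):
                if d[a][b] is None:
                    continue
                if d[i][j] * d[a][b] <= 0.0:
                    # Sign change: keep the sample closer to the zero.
                    tops.add((i, j) if abs(d[i][j]) <= abs(d[a][b]) else (a, b))
    return tops

# Synthetic surface: a Gaussian ridge at a constant frequency of 10.3 bins.
E = [[math.exp(-((j - 10.3) ** 2) / 8.0) for j in range(21)] for i in range(8)]
print(sorted(ridge_tops(E)))
```

On this synthetic surface the detected points follow frequency bin 10, the grid bin nearest the true ridge. Restricting the eigenvector angle to a half-plane keeps its sign consistent between neighbouring samples, which is what makes the sign-change test meaningful; a fuller treatment would track zero-crossings along the curvature direction itself.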
Abstract:
A neuroanatomical parcellation system is described that encompasses the entire cerebral cortex and the cerebellum. The cortical system is a modified version of the scheme described by Caviness et al. (1996) and is designed particularly for studies of speech processing. The cerebellum is parcellated into 6 cortical regions of interest (ROIs) and an ROI representing the deep cerebellar nuclei in each hemisphere. The boundaries of each ROI are based on individual anatomical markers that are clearly visible in standard structural MRI acquisitions. The system permits averaging of functional imaging data sets from multiple subjects while accounting for individual anatomical variability. Used in conjunction with region-of-interest analysis techniques such as that described by Nieto-Castanon et al. (2003), the parcellation system provides a more powerful means of analyzing functional data.
Abstract:
Speech can be understood at widely varying production rates. A working memory is described for short-term storage of temporal lists of input items. The working memory is a cooperative-competitive neural network that automatically adjusts its integration rate, or gain, to generate a short-term memory code for a list that is independent of item presentation rate. Such an invariant working memory model is used to simulate data of Repp (1980) concerning the changes of phonetic category boundaries as a function of their presentation rate. Thus the variability of categorical boundaries can be traced to the temporal invariance of the working memory code.
Abstract:
This article describes a neural network model that addresses the acquisition of speaking skills by infants and subsequent motor equivalent production of speech sounds. The model learns two mappings during a babbling phase. A phonetic-to-orosensory mapping specifies a vocal tract target for each speech sound; these targets take the form of convex regions in orosensory coordinates defining the shape of the vocal tract. The babbling process wherein these convex region targets are formed explains how an infant can learn phoneme-specific and language-specific limits on acceptable variability of articulator movements. The model also learns an orosensory-to-articulatory mapping wherein cells coding desired movement directions in orosensory space learn articulator movements that achieve these orosensory movement directions. The resulting mapping provides a natural explanation for the formation of coordinative structures. This mapping also makes efficient use of redundancy in the articulator system, thereby providing the model with motor equivalent capabilities. Simulations verify the model's ability to compensate for constraints or perturbations applied to the articulators automatically and without new learning and to explain contextual variability seen in human speech production.
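The idea of a convex-region target can be illustrated with a small sketch. It is not the model described in the abstract. As a simplifying assumption, it takes the convex region to be an axis-aligned box in orosensory coordinates, and it takes the desired movement direction to be the vector from the current vocal tract state to the nearest point of that box. The function name `direction_to_target` and the two-dimensional example coordinates are hypothetical.

```python
def direction_to_target(position, lower, upper):
    """Vector from `position` to the nearest point of the box [lower, upper].

    The box stands in for a convex-region target (an assumed simplification).
    A zero vector means the position already lies inside the target.
    """
    # Clamp each coordinate into its allowed range to find the nearest point.
    nearest = [min(max(p, lo), hi) for p, lo, hi in zip(position, lower, upper)]
    return [q - p for p, q in zip(position, nearest)]

# Hypothetical 2-D target: wide tolerance on the first coordinate,
# narrow tolerance on the second.
lower, upper = [0.0, 0.4], [1.0, 0.6]

print(direction_to_target([0.5, 0.5], lower, upper))  # already inside the region
print(direction_to_target([0.5, 2.0], lower, upper))  # only the second coordinate must move
```

In this sketch any state inside the region needs no correction, which mirrors the abstract's point that a region target encodes acceptable variability. A dimension with a wide range tolerates large deviations, while a narrow one is tightly controlled.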