Biblioteca Digital

Boltzmann machines offer a new and exciting approach to automatic speech recognition, and provide a rigorous mathematical formalism for parallel computing arrays. In this paper we briefly summarize Boltzmann machine theory, and present results showing their ability to recognize both static and time-varying speech patterns. A machine with 2000 units was able to distinguish between the 11 steady-state vowels in English with an accuracy of 85%. The stability of the learning algorithm and methods of preprocessing and coding speech data before feeding them to the machine are also discussed. A new type of unit called a carry input unit, which involves a type of state feedback, was developed for the processing of time-varying patterns, and this was tested on a few short sentences. The implications of recent work on associative memory and on the modelling of neural arrays are used to suggest a good configuration of Boltzmann machines for this sort of pattern recognition.


Phone-level pronunciation scoring and assessment for interactive language learning

Relevance: 30.00%

Abstract:

This paper investigates a method of automatic pronunciation scoring for use in computer-assisted language learning (CALL) systems. The method utilizes a likelihood-based 'Goodness of Pronunciation' (GOP) measure which is extended to include individual thresholds for each phone based on both averaged native confidence scores and on rejection statistics provided by human judges. Further improvements are obtained by incorporating models of the subject's native language and by augmenting the recognition networks to include expected pronunciation errors. The various GOP measures are assessed using a specially recorded database of non-native speakers which has been annotated to mark phone-level pronunciation errors. Since pronunciation assessment is highly subjective, a set of four performance measures has been designed, each of them measuring different aspects of how well computer-derived phone-level scores agree with human scores. These performance measures are used to cross-validate the reference annotations and to assess the basic GOP algorithm and its refinements. The experimental results suggest that a likelihood-based pronunciation scoring metric can achieve usable performance, especially after applying the various enhancements.
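As a rough illustration of the kind of scoring this abstract describes, the sketch below computes a likelihood-ratio score for each phone and compares it with a phone-specific threshold. The abstract does not give the formula, so the form used here (absolute log-likelihood ratio between the expected phone and the best competing phone, normalized by the number of frames) is an assumption based on how GOP is commonly stated. The class and function names, the log-likelihood values, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhoneSegment:
    phone: str               # phone expected at this position (e.g. from forced alignment)
    num_frames: int          # segment duration in frames
    loglik_expected: float   # log p(O | expected phone); hypothetical value
    loglik_best: float       # max over all phones q of log p(O | q); hypothetical value


def gop_score(seg: PhoneSegment) -> float:
    """Duration-normalized absolute log-likelihood ratio (assumed GOP form)."""
    return abs(seg.loglik_expected - seg.loglik_best) / seg.num_frames


def flag_errors(segments, thresholds, default_threshold=1.0):
    """Return (phone, score, rejected) triples.

    A phone is rejected when its score exceeds its own threshold, which
    mirrors the idea of individual per-phone thresholds.
    """
    results = []
    for seg in segments:
        score = gop_score(seg)
        limit = thresholds.get(seg.phone, default_threshold)
        results.append((seg.phone, score, score > limit))
    return results


# Hypothetical segments and per-phone thresholds, for illustration only.
segments = [
    PhoneSegment("th", 10, -85.0, -60.0),
    PhoneSegment("ih", 8, -40.0, -40.0),
    PhoneSegment("s", 12, -66.0, -60.0),
]
thresholds = {"th": 2.0, "ih": 1.0, "s": 0.4}
report = flag_errors(segments, thresholds)
```

In a real system the thresholds would presumably be estimated from native-speaker scores or from human rejection statistics, as the abstract suggests; here they are fixed by hand.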


The modified Kanerva model for automatic speech recognition

Relevance: 30.00%

Abstract:

A parallel processing network derived from Kanerva's associative memory theory (Kanerva 1984) is shown to be able to train rapidly on connected speech data and recognize further speech data with a label error rate of 0.68%. This modified Kanerva model can be trained substantially faster than other networks with comparable pattern discrimination properties. Kanerva presented his theory of a self-propagating search in 1984, and showed theoretically that large-scale versions of his model would have powerful pattern matching properties. This paper describes how the design for the modified Kanerva model is derived from Kanerva's original theory. Several designs are tested to discover which form may be implemented fastest while still maintaining versatile recognition performance. A method is developed to deal with the time-varying nature of the speech signal by recognizing static patterns together with a fixed quantity of contextual information. In order to recognize speech features in different contexts it is necessary for a network to be able to model disjoint pattern classes. This type of modelling cannot be performed by a single layer of links. Network research was once held back by the inability of single-layer networks to solve this sort of problem, and by the lack of a training algorithm for multi-layer networks. Rumelhart, Hinton & Williams (1985) provided one solution by demonstrating the "back propagation" training algorithm for multi-layer networks. A second solution is used in the modified Kanerva model: a non-linear fixed transformation maps the pattern space into a space of higher dimensionality in which the speech features are linearly separable. A single-layer network may then be used to perform the recognition. The advantage of this solution over the multi-layer approach lies in the greater power and speed of the single-layer network training algorithm. © 1989.
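The idea of a fixed non-linear expansion followed by a single trainable layer can be sketched on a toy problem. The example below is not the modified Kanerva model from the paper. It assumes a Kanerva-style expansion in which each "location" is a binary address that becomes active when the input lies within a given Hamming radius, and it trains the single layer with the classic perceptron rule. The toy uses every 2-bit address as a location, whereas a realistic model would use a large sample of addresses. All names and parameters are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def expand(pattern, locations, radius):
    """Fixed non-linear map: one binary feature per location,
    active when the pattern lies within `radius` bits of it."""
    return [1 if hamming(pattern, loc) <= radius else 0 for loc in locations]


def predict(features, weights, bias):
    """Single-layer threshold unit."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0


def train_single_layer(feature_rows, labels, max_epochs=100):
    """Perceptron rule; only this layer is trained, the expansion stays fixed."""
    weights = [0.0] * len(feature_rows[0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(feature_rows, labels):
            delta = y - predict(x, weights, bias)
            if delta != 0:
                weights = [w + delta * xi for w, xi in zip(weights, x)]
                bias += delta
                errors += 1
        if errors == 0:
            break
    return weights, bias


# XOR: two disjoint clusters per class, not separable by one layer on raw bits.
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

locations = list(product((0, 1), repeat=2))   # toy choice: all 2-bit addresses
expanded = [expand(p, locations, radius=1) for p in patterns]

weights, bias = train_single_layer(expanded, labels)
outputs = [predict(x, weights, bias) for x in expanded]
```

After the expansion each pattern activates three of the four locations, and the two classes become linearly separable, so the single-layer rule converges where it could not on the raw two-bit inputs.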


Extended VTS for noise-robust speech recognition

Relevance: 30.00%

Abstract:

Model compensation is a standard way of improving the robustness of speech recognition systems to noise. A number of popular schemes are based on vector Taylor series (VTS) compensation, which uses a linear approximation to represent the influence of noise on the clean speech. To compensate the dynamic parameters, the continuous time approximation is often used. This approximation uses a point estimate of the gradient, which fails to take into account that dynamic coefficients are a function of a number of consecutive static coefficients. In this paper, the accuracy of dynamic parameter compensation is improved by representing the dynamic features as a linear transformation of a window of static features. A modified version of VTS compensation is applied to the distribution of the window of static features and, importantly, their correlations. These compensated distributions are then transformed to distributions over standard static and dynamic features. With this improved approximation, it is also possible to obtain full-covariance corrupted speech distributions. This addresses the correlation changes that occur in noise. The proposed scheme outperformed the standard VTS scheme by 10% to 20% relative on a range of tasks. © 2006 IEEE.
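One step in this abstract lends itself to a small worked example: treating the dynamic features as a linear transformation of a window of static features, so that a Gaussian over the window maps to a Gaussian over static and dynamic features. The sketch below shows only that final projection, not the VTS compensation itself. It assumes a scalar static feature, the usual regression formula for delta coefficients, and hypothetical window statistics.

```python
def projection_matrix(n):
    """Rows map a window of 2n+1 static values to [static, delta].

    The delta row uses the common regression formula
    delta_t = sum_tau tau * (c[t+tau] - c[t-tau]) / (2 * sum_tau tau^2),
    which is an assumption here; the abstract does not state it.
    """
    size = 2 * n + 1
    denom = 2 * sum(tau * tau for tau in range(1, n + 1))
    static_row = [0.0] * size
    static_row[n] = 1.0                  # pick out the centre frame
    delta_row = [0.0] * size
    for tau in range(1, n + 1):
        delta_row[n + tau] = tau / denom
        delta_row[n - tau] = -tau / denom
    return [static_row, delta_row]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def project_gaussian(a, mean, cov):
    """For y = A w with w ~ N(mean, cov): y ~ N(A mean, A cov A^T)."""
    return matvec(a, mean), matmul(matmul(a, cov), transpose(a))


# Hypothetical statistics of a three-frame window of one static coefficient,
# including correlations between neighbouring frames.
window_mean = [1.0, 2.0, 4.0]
window_cov = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]

A = projection_matrix(1)
feat_mean, feat_cov = project_gaussian(A, window_mean, window_cov)
```

Because the projection uses the full window covariance, the off-diagonal terms of the window distribution feed directly into the variance of the delta feature, which a point estimate of the gradient would not capture.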


GABA and its agonists improved visual cortical function in senescent monkeys

Relevance: 30.00%

Abstract:

Human cerebral cortical function degrades during old age. Much of this change may result from a degradation of intracortical inhibition during senescence. We used multibarreled microelectrodes to study the effects of electrophoretic application of gamma-aminobutyric acid (GABA), the GABA type A (GABA(A)) receptor agonist muscimol, and the GABA(A) receptor antagonist bicuculline, respectively, on the properties of individual V1 cells in old monkeys. Bicuculline exerted a much weaker effect on neuronal responses in old than in young animals, confirming a degradation of GABA-mediated inhibition. On the other hand, the administration of GABA and muscimol resulted in improved visual function. Many treated cells in area V1 of old animals displayed responses typical of young cells. The present results have important implications for the treatment of the sensory, motor, and cognitive declines that accompany old age.


Preparation and characterization of a novel pyrrole-benzophenone copolymerized silica nanocomposite as a reagent in a visual immunologic-agglutination test.

Relevance: 30.00%

Abstract:

Biological sensing is explored through novel stable colloidal dispersions of pyrrole-benzophenone and pyrrole copolymerized silica (PPy-SiO(2)-PPyBPh) nanocomposites, which allow covalent linking of biological molecules through light mediation. The mechanism of nanocomposite attachment to a model protein is studied using gold-labeled cholera toxin B (CTB) to enhance the contrast in electron microscopy imaging. The biological test itself is carried out without gold labeling, i.e., using CTB only. The protein is shown to be covalently bound through the benzophenone groups. When the reactive PPy-SiO(2)-PPyBPh-CTB nanocomposite is exposed to specific recognition anti-CTB immunoglobulins, a qualitative visual agglutination assay occurs spontaneously, producing, as a positive test, PPy-SiO(2)-PPyBPh-CTB-anti-CTB in less than 1 h, while the control solution of PPy-SiO(2)-PPyBPh-CTB alone remained well dispersed during the same period. These dispersions were characterized by cryogenic transmission electron microscopy (cryo-TEM), scanning electron microscopy (SEM), FTIR and X-ray photoelectron spectroscopy (XPS).


903 results for hand-drawn visual language recognition
