929 resultados para speaker recognition systems
Resumo:
This article compares the performance of Fuzzy ARTMAP with that of Learned Vector Quantization and Back Propagation on a handwritten character recognition task. Training with Fuzzy ARTMAP to a fixed criterion used many fewer epochs. Voting with Fuzzy ARTMAP yielded the highest recognition rates.
Resumo:
This paper describes the design of a self~organizing, hierarchical neural network model of unsupervised serial learning. The model learns to recognize, store, and recall sequences of unitized patterns, using either short-term memory (STM) or both STM and long-term memory (LTM) mechanisms. Timing information is learned and recall {both from STM and from LTM) is performed with a learned rhythmical structure. The network, bearing similarities with ART (Carpenter & Grossberg 1987a), learns to map temporal sequences to unitized patterns, which makes it suitable for hierarchical operation. It is therefore capable of self-organizing codes for sequences of sequences. The capacity is only limited by the number of nodes provided. Selected simulation results are reported to illustrate system properties.
Resumo:
The processes by which humans and other primates learn to recognize objects have been the subject of many models. Processes such as learning, categorization, attention, memory search, expectation, and novelty detection work together at different stages to realize object recognition. In this article, Gail Carpenter and Stephen Grossberg describe one such model class (Adaptive Resonance Theory, ART) and discuss how its structure and function might relate to known neurological learning and memory processes, such as how inferotemporal cortex can recognize both specialized and abstract information, and how medial temporal amnesia may be caused by lesions in the hippocampal formation. The model also suggests how hippocampal and inferotemporal processing may be linked during recognition learning.
Resumo:
Existing work in Computer Science and Electronic Engineering demonstrates that Digital Signal Processing techniques can effectively identify the presence of stress in the speech signal. These techniques use datasets containing real or actual stress samples i.e. real-life stress such as 911 calls and so on. Studies that use simulated or laboratory-induced stress have been less successful and inconsistent. Pervasive, ubiquitous computing is increasingly moving towards voice-activated and voice-controlled systems and devices. Speech recognition and speaker identification algorithms will have to improve and take emotional speech into account. Modelling the influence of stress on speech and voice is of interest to researchers from many different disciplines including security, telecommunications, psychology, speech science, forensics and Human Computer Interaction (HCI). The aim of this work is to assess the impact of moderate stress on the speech signal. In order to do this, a dataset of laboratory-induced stress is required. While attempting to build this dataset it became apparent that reliably inducing measurable stress in a controlled environment, when speech is a requirement, is a challenging task. This work focuses on the use of a variety of stressors to elicit a stress response during tasks that involve speech content. Biosignal analysis (commercial Brain Computer Interfaces, eye tracking and skin resistance) is used to verify and quantify the stress response, if any. This thesis explains the basis of the author’s hypotheses on the elicitation of affectively-toned speech and presents the results of several studies carried out throughout the PhD research period. These results show that the elicitation of stress, particularly the induction of affectively-toned speech, is not a simple matter and that many modulating factors influence the stress response process. A model is proposed to reflect the author’s hypothesis on the emotional response pathways relating to the elicitation of stress with a required speech content. Finally the author provides guidelines and recommendations for future research on speech under stress. Further research paths are identified and a roadmap for future research in this area is defined.
Resumo:
To investigate the neural systems that contribute to the formation of complex, self-relevant emotional memories, dedicated fans of rival college basketball teams watched a competitive game while undergoing functional magnetic resonance imaging (fMRI). During a subsequent recognition memory task, participants were shown video clips depicting plays of the game, stemming either from previously-viewed game segments (targets) or from non-viewed portions of the same game (foils). After an old-new judgment, participants provided emotional valence and intensity ratings of the clips. A data driven approach was first used to decompose the fMRI signal acquired during free viewing of the game into spatially independent components. Correlations were then calculated between the identified components and post-scanning emotion ratings for successfully encoded targets. Two components were correlated with intensity ratings, including temporal lobe regions implicated in memory and emotional functions, such as the hippocampus and amygdala, as well as a midline fronto-cingulo-parietal network implicated in social cognition and self-relevant processing. These data were supported by a general linear model analysis, which revealed additional valence effects in fronto-striatal-insular regions when plays were divided into positive and negative events according to the fan's perspective. Overall, these findings contribute to our understanding of how emotional factors impact distributed neural systems to successfully encode dynamic, personally-relevant event sequences.
Resumo:
© 2005-2012 IEEE.Within industrial automation systems, three-dimensional (3-D) vision provides very useful feedback information in autonomous operation of various manufacturing equipment (e.g., industrial robots, material handling devices, assembly systems, and machine tools). The hardware performance in contemporary 3-D scanning devices is suitable for online utilization. However, the bottleneck is the lack of real-time algorithms for recognition of geometric primitives (e.g., planes and natural quadrics) from a scanned point cloud. One of the most important and the most frequent geometric primitive in various engineering tasks is plane. In this paper, we propose a new fast one-pass algorithm for recognition (segmentation and fitting) of planar segments from a point cloud. To effectively segment planar regions, we exploit the orthonormality of certain wavelets to polynomial function, as well as their sensitivity to abrupt changes. After segmentation of planar regions, we estimate the parameters of corresponding planes using standard fitting procedures. For point cloud structuring, a z-buffer algorithm with mesh triangles representation in barycentric coordinates is employed. The proposed recognition method is tested and experimentally validated in several real-world case studies.
Resumo:
All biological phenomena depend on molecular recognition, which is either intermolecular like in ligand binding to a macromolecule or intramolecular like in protein folding. As a result, understanding the relationship between the structure of proteins and the energetics of their stability and binding with others (bio)molecules is a very interesting point in biochemistry and biotechnology. It is essential to the engineering of stable proteins and to the structure-based design of pharmaceutical ligands. The parameter generally used to characterize the stability of a system (the folded and unfolded state of the protein for example) is the equilibrium constant (K) or the free energy (deltaG(o)), which is the sum of enthalpic (deltaH(o)) and entropic (deltaS(o)) terms. These parameters are temperature dependent through the heat capacity change (deltaCp). The thermodynamic parameters deltaH(o) and deltaCp can be derived from spectroscopic experiments, using the van't Hoff method, or measured directly using calorimetry. Along with isothermal titration calorimetry (ITC), differential scanning calorimetry (DSC) is a powerful method, less described than ITC, for measuring directly the thermodynamic parameters which characterize biomolecules. In this article, we summarize the principal thermodynamics parameters, describe the DSC approach and review some systems to which it has been applied. DSC is much used for the study of the stability and the folding of biomolecules, but it can also be applied in order to understand biomolecular interactions and can thus be an interesting technique in the process of drug design.
Resumo:
We present a novel approach to goal recognition based on a two-stage paradigm of graph construction and analysis. First, a graph structure called a Goal Graph is constructed to represent the observed actions, the state of the world, and the achieved goals as well as various connections between these nodes at consecutive time steps. Then, the Goal Graph is analysed at each time step to recognise those partially or fully achieved goals that are consistent with the actions observed so far. The Goal Graph analysis also reveals valid plans for the recognised goals or part of these goals. Our approach to goal recognition does not need a plan library. It does not suffer from the problems in the acquisition and hand-coding of large plan libraries, neither does it have the problems in searching the plan space of exponential size. We describe two algorithms for Goal Graph construction and analysis in this paradigm. These algorithms are both provably sound, polynomial-time, and polynomial-space. The number of goals recognised by our algorithms is usually very small after a sequence of observed actions has been processed. Thus the sequence of observed actions is well explained by the recognised goals with little ambiguity. We have evaluated these algorithms in the UNIX domain, in which excellent performance has been achieved in terms of accuracy, efficiency, and scalability.
Resumo:
The authors are concerned with the development of computer systems that are capable of using information from faces and voices to recognise people's emotions in real-life situations. The paper addresses the nature of the challenges that lie ahead, and provides an assessment of the progress that has been made in the areas of signal processing and analysis techniques (with regard to speech and face), and the psychological and linguistic analyses of emotion. Ongoing developmental work by the authors in each of these areas is described.
Resumo:
Use of the Dempster-Shafer (D-S) theory of evidence to deal with uncertainty in knowledge-based systems has been widely addressed. Several AI implementations have been undertaken based on the D-S theory of evidence or the extended theory. But the representation of uncertain relationships between evidence and hypothesis groups (heuristic knowledge) is still a major problem. This paper presents an approach to representing such knowledge, in which Yen’s probabilistic multi-set mappings have been extended to evidential mappings, and Shafer’s partition technique is used to get the mass function in a complex evidence space. Then, a new graphic method for describing the knowledge is introduced which is an extension of the graphic model by Lowrance et al. Finally, an extended framework for evidential reasoning systems is specified.
Resumo:
In this paper we present the application of Hidden Conditional Random Fields (HCRFs) to modelling speech for visual speech recognition. HCRFs may be easily adapted to model long range dependencies across an observation sequence. As a result visual word recognition performance can be improved as the model is able to take more of a contextual approach to generating state sequences. Results are presented from a speaker-dependent, isolated digit, visual speech recognition task using comparisons with a baseline HMM system. We firstly illustrate that word recognition rates on clean video using HCRFs can be improved by increasing the number of past and future observations being taken into account by each state. Secondly we compare model performances using various levels of video compression on the test set. As far as we are aware this is the first attempted use of HCRFs for visual speech recognition.