38 results for recognition system
Abstract:
In this paper, we present a new approach to visual speech recognition that improves contextual modelling by combining Inter-Frame Dependent and Hidden Markov Models. This approach captures contextual information in visual speech that may be lost when a Hidden Markov Model is used alone. We apply contextual modelling to a large speaker-independent isolated digit recognition task and compare our approach with two commonly adopted feature-based techniques for incorporating speech dynamics. Results are presented for the baseline feature-based systems and for the combined modelling technique. We show that both of these techniques achieve similar levels of performance when used independently. However, significant improvements in performance can be achieved by combining the two. In particular, we report a relative improvement in Word Error Rate of more than 17% over our best baseline system.
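A "relative" Word Error Rate improvement is the fractional reduction (WER_baseline - WER_new) / WER_baseline, not a difference in percentage points. The minimal sketch below, assuming word-level Levenshtein alignment for WER, shows both calculations; the helper names are hypothetical, and for isolated digits each utterance is a single reference word.

```python
def word_errors(ref, hyp):
    """Word-level Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (free when the words match)
            )
        prev = cur
    return prev[-1]


def wer(refs, hyps):
    """Corpus WER: total word errors divided by total reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def relative_improvement(baseline_wer, new_wer):
    """Fractional WER reduction relative to the baseline system."""
    return (baseline_wer - new_wer) / baseline_wer


# Isolated digits: each utterance has exactly one reference word.
refs = [["one"], ["two"], ["three"], ["four"]]
hyps = [["one"], ["too"], ["three"], ["four"]]
print(wer(refs, hyps))  # → 0.25
```

Under this definition, a relative improvement above 17% means the new system makes more than 17 fewer errors per 100 errors made by the baseline.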
Abstract:
For many applications of emotion recognition, such as virtual agents, the system must select responses while the user is still speaking. This requires reliable on-line recognition of the user’s affect. However, most emotion recognition systems are based on turn-wise processing. We present a novel approach to on-line emotion recognition from speech using Long Short-Term Memory Recurrent Neural Networks. Emotion is recognised frame-wise in a two-dimensional valence-activation continuum. In contrast to current state-of-the-art approaches, recognition is performed on low-level signal frames, similar to those used for speech recognition, and no statistical functionals are applied to low-level feature contours. Framing at a higher level is therefore unnecessary, and regression outputs can be produced in real time for every low-level input frame. We also investigate the benefits of including linguistic features, obtained by a keyword spotter, at the signal-frame level.
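The key on-line property, one (valence, activation) output per low-level frame with no turn-level functionals, can be sketched with a minimal single-layer LSTM in plain Python. This is an illustration of the data flow under stated assumptions: the weights are random rather than trained, the three input features per frame are hypothetical, and the paper's network size, feature set and keyword-spotter features are not reproduced.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinyLSTM:
    """Single-layer LSTM with a tanh regression head that emits
    (valence, activation) for every input frame. Random weights:
    this shows the streaming computation, not a trained model."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        n_cat = n_in + n_hidden

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.W = {gate: mat(n_hidden, n_cat) for gate in ("i", "f", "o", "c")}
        self.b = {gate: [0.0] * n_hidden for gate in ("i", "f", "o", "c")}
        self.W_out = mat(2, n_hidden)
        self.n_hidden = n_hidden

    def stream(self, frames):
        """Yield one (valence, activation) pair as soon as each frame arrives."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in frames:
            z = list(x) + h
            pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
                   for gate in self.W}
            i = [sigmoid(v) for v in pre["i"]]     # input gate
            f = [sigmoid(v) for v in pre["f"]]     # forget gate
            o = [sigmoid(v) for v in pre["o"]]     # output gate
            cand = [math.tanh(v) for v in pre["c"]]  # candidate cell update
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            valence, activation = (math.tanh(v) for v in matvec(self.W_out, h))
            yield valence, activation


model = TinyLSTM(n_in=3, n_hidden=4)
frames = [[0.1 * k, -0.2, 0.05 * k] for k in range(6)]
for v, a in model.stream(frames):
    print(f"valence={v:+.4f} activation={a:+.4f}")
```

Because `stream` is a generator that only looks at past frames, each output is available immediately after its frame, which is what allows responses to be chosen while the user is still speaking.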
Abstract:
In this paper, a novel video-based multimodal biometric verification scheme using subspace-based low-level feature fusion of face and speech is developed for specific-speaker recognition in perceptual human-computer interaction (HCI). In the proposed scheme, the face is tracked and its pose is estimated in order to weight the detected face-like regions in successive frames; ill-posed faces and false-positive detections are assigned lower weight to improve accuracy. In the audio modality, mel-frequency cepstral coefficients are extracted for voice-based biometric verification. In the fusion step, features from both modalities are projected into a nonlinear Laplacian Eigenmap subspace for multimodal speaker recognition and combined at a low level. The proposed approach is tested on a video database of ten human subjects, and the results show that the proposed scheme attains better accuracy than both conventional multimodal fusion using latent semantic analysis and single-modality verification. A MATLAB implementation indicates that the proposed scheme has the potential to attain real-time performance in perceptual HCI applications.
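Low-level (feature-level) fusion means combining the modalities before any classifier sees them. The sketch below, under stated assumptions, z-normalises face and voice features separately, concatenates them per sample, and builds the heat-kernel graph Laplacian L = D - W on which Laplacian Eigenmaps operates. The final step of solving the generalised eigenproblem L y = λ D y is omitted (it needs a linear-algebra library), a fully connected graph replaces the usual k-nearest-neighbour graph, and the feature values in the usage example are hypothetical.

```python
import math


def zscore_columns(rows):
    """Standardise each feature column to zero mean and unit variance."""
    n = len(rows)
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0  # guard constant columns
        out_cols.append([(v - mean) / std for v in col])
    return [list(r) for r in zip(*out_cols)]


def fuse_low_level(face_feats, voice_feats):
    """Feature-level fusion: normalise each modality, then concatenate per sample."""
    face = zscore_columns(face_feats)
    voice = zscore_columns(voice_feats)
    return [f + v for f, v in zip(face, voice)]


def graph_laplacian(points, t=1.0):
    """Heat-kernel affinities W_ij = exp(-||x_i - x_j||^2 / t) on a fully connected
    graph, degree matrix D = diag(row sums of W), and Laplacian L = D - W."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / t)
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = sum(W[i])
    return L


# Hypothetical per-frame features: 2 face-shape values and 3 MFCC-like values.
face = [[1.0, 2.0], [1.2, 1.9], [5.0, 0.5]]
voice = [[0.3, 0.1, 0.7], [0.35, 0.12, 0.66], [0.9, 0.8, 0.1]]
fused = fuse_low_level(face, voice)
L = graph_laplacian(fused)
```

Normalising each modality first keeps one modality's larger numeric range from dominating the distances that define the graph.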
Abstract:
In this paper, we present a novel method for performing speaker recognition with very limited training data and in the presence of background noise. Similarity-based speaker recognition is adopted so that speaker models can be created from limited training speech. The proposed similarity is a form of cosine similarity used as a distance measure between speech feature vectors. Each speech frame is modelled using subband features, and multicondition training and optimal feature selection are introduced into this framework, making the system capable of performing speaker recognition in the presence of realistic, time-varying noise that is unknown during training. Speaker identification experiments were carried out using the SPIDRE database. The noise-compensation performance of the proposed system is compared with that of an oracle model, and its speaker identification accuracy on clean speech, when trained with limited data, is compared with that of a GMM trained on several minutes of speech. Both comparisons demonstrate the effectiveness of the new model. Finally, the new model was tested for speaker identification with limited training data under differing levels and types of realistic background noise. The results demonstrate the robustness of the new system.
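Similarity-based identification can be sketched with plain cosine similarity between frame feature vectors. This is an assumption-laden illustration, not the paper's method: the abstract describes its measure only as "a form of" cosine similarity, and the scoring rule here (match each test frame to its closest enrolment frame, then average) plus the speaker names and vectors are hypothetical. Subband features, multicondition training and feature selection are not modelled.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identify_speaker(test_frames, speaker_models):
    """speaker_models maps a speaker id to a list of enrolment frame vectors.
    Hypothetical rule: each test frame takes its best match among a speaker's
    enrolment frames; the speaker with the highest mean score is returned."""
    best_id, best_score = None, -math.inf
    for speaker, enrol in speaker_models.items():
        score = sum(max(cosine_similarity(t, e) for e in enrol)
                    for t in test_frames) / len(test_frames)
        if score > best_score:
            best_id, best_score = speaker, score
    return best_id, best_score


models = {
    "alice": [[1.0, 0.0, 0.2], [0.9, 0.1, 0.3]],
    "bob": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
print(identify_speaker([[0.95, 0.05, 0.25]], models))
```

Storing enrolment frames directly, rather than fitting a parametric model such as a GMM, is what lets a similarity-based system work from very little training speech.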
Abstract:
It is a legitimate assertion that worthwhile work in architecture, whether theoretical or built, has its common ground in a firmly held position on the part of its author. In addition to delivering key competencies, architectural education should support the formation of such a position in the student, or at least make students aware of the possibility of holding one.
It is perhaps with this in mind that intensive unit-based diploma and master's structures are increasingly becoming the standard for schools of architecture across the UK. The strengths of such a structure are most evident when the school, by virtue of either financial strength or geographic location, is able to bring a diverse range of contrasting positions to bear in the formation of these units. In effect, the offering to the student is a short, intensive immersion in a clear line of thought based on the position of those running the unit, who channel their research into the work of the students. A single cohort of students is therefore able to observe and understand a wide range of ways of thinking about the subject, whether or not they participate in a given unit. It is axiomatic that where this structure is applied in the absence of these resources, the result can be less helpful: individual units are differentiated not to reflect the interests of those running them but for the sake of difference as an end in itself.
In structuring the M.Arch programme at Queen's University Belfast, we placed the reality of our somewhat peripheral location at the forefront of our considerations. A single four-semester studio is offered. The first three semesters are carefully structured to offer students a range of directed and self-directed projects. Through interrogation of these projects, and of work undertaken at undergraduate level, the aim is to help students identify a personal position on architecture, which is then developed in the thesis in semester four. Research and design outputs emerge from the interests of the student body and are cultivated by staff who, over the four semesters, have the time to get to know all aspects of each student's interests.
This paper will lay out this structure and some of the projects run within it. With two graduating years now delivered, the successes and challenges of the system will be set out by reference to several case studies of individual students' experiences of the structure.
Abstract:
There is considerable interest in creating embedded speech recognition hardware using the weighted finite state transducer (WFST) technique, but there are challenges in performance and memory usage. Two system optimization techniques are presented to address this: one improves token propagation by removing the WFST epsilon input arcs, and the other, a one-pass adaptive pruning algorithm, gives a dramatic reduction in the number of active nodes to be computed. Results for memory and bandwidth are given for a 5,000-word vocabulary, showing better practical performance than a conventional WFST. This is then exploited in an adaptive pruning algorithm that reduces the number of active nodes from 30,000 to 4,000 with only a 2 percent sacrifice in speech recognition accuracy. These optimizations lead to a simpler design with deterministic performance.
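The abstract does not spell out the adaptive pruning algorithm. As a generic point of reference only, the sketch below combines standard beam pruning with histogram pruning (a hard cap on the number of active tokens), and narrows the beam for the next frame whenever the cap fires. The function name and the narrowing rule are assumptions, not the paper's algorithm; a practical scheme would also let the beam widen again when few tokens survive.

```python
def adaptive_prune(tokens, beam, max_active):
    """Prune one frame's active tokens.

    tokens maps a WFST state id to its accumulated cost (negative log
    probability, so lower is better). Beam pruning keeps tokens within
    `beam` of the best cost; histogram pruning then caps the survivors at
    `max_active`. Returns (survivors, beam to use for the next frame).
    """
    if not tokens:
        return {}, beam
    best = min(tokens.values())
    survivors = {s: c for s, c in tokens.items() if c <= best + beam}
    if len(survivors) <= max_active:
        return survivors, beam
    kept = sorted(survivors.items(), key=lambda kv: kv[1])[:max_active]
    # The cap fired, so shrink the beam to the cost range that was actually kept.
    return dict(kept), kept[-1][1] - best


frame_tokens = {state: float(state) for state in range(10)}
survivors, next_beam = adaptive_prune(frame_tokens, beam=5.0, max_active=3)
print(sorted(survivors), next_beam)
```

A hard cap of this kind is one common way to bound per-frame work, which matters for hardware that needs deterministic timing and memory.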
Abstract:
This paper presents a novel method that leverages reasoning capabilities in a computer vision system dedicated to human action recognition. The proposed methodology is decomposed into two stages. First, a machine learning algorithm known as bag of words gives an initial estimate of the action class from video sequences by performing an image feature analysis. These results are then passed to a common-sense reasoning system, which analyses, selects and corrects the initial estimate produced by the machine learning algorithm. This second stage draws on the knowledge implicit in the rationality that motivates human behaviour. Experiments are performed in realistic conditions, where the poor recognition rates of the machine learning techniques are significantly improved by the second stage, in which common-sense knowledge and reasoning capabilities are leveraged. This demonstrates the value of integrating common-sense capabilities into a computer vision pipeline. © 2012 Elsevier B.V. All rights reserved.
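The first stage's bag-of-words representation can be sketched as follows: local image descriptors are quantised to their nearest visual word, and a clip is represented by the normalised histogram of word counts. The codebook (typically learned with k-means), the two-dimensional descriptors, the action labels and the nearest-prototype ranking with histogram intersection are all assumptions for illustration; the paper's descriptors and classifier are not specified in the abstract.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the closest visual word under squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[k])),
    )


def bow_histogram(descriptors, codebook):
    """Normalised histogram of visual-word occurrences for one video clip."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def rank_actions(hist, prototypes):
    """Rank action labels by histogram-intersection similarity to class prototypes.
    A later reasoning stage could re-rank or correct this candidate list."""
    def intersection(p, q):
        return sum(min(a, b) for a, b in zip(p, q))
    return sorted(prototypes, key=lambda label: intersection(hist, prototypes[label]),
                  reverse=True)


codebook = [[0.0, 0.0], [1.0, 1.0]]
clip = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [0.0, 0.2]]
hist = bow_histogram(clip, codebook)
print(rank_actions(hist, {"walk": [0.9, 0.1], "wave": [0.4, 0.6]}))
```

Returning a ranked list rather than a single label leaves room for the second stage to select among candidates, which matches the "analyses, selects and corrects" role described above.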
Abstract:
Smart Spaces, Ambient Intelligence, and Ambient Assisted Living are environmental paradigms that strongly depend on their capability to recognize human actions. While most solutions rest on interpreting sensor values and on video analysis, few have recognized the importance of incorporating common-sense capabilities to support the recognition process. Unfortunately, human action recognition cannot be accomplished successfully by analyzing body postures alone. Instead, this task should be supported by deep knowledge of the nature of human agency and its tight connection to the reasons and motivations that explain it. Combining this knowledge with knowledge about how the world works is essential for recognizing and understanding human actions without making mistakes that defy common sense. This work demonstrates the impact of episodic reasoning on the accuracy of a computer vision system for human action recognition. It also presents formalization, implementation, and evaluation details of the knowledge model that supports the episodic reasoning.
Abstract:
This chapter describes an experimental system for recognizing human faces in surveillance video. In surveillance applications, the system must be robust to changes in illumination, scale, pose and expression. It must also be able to perform detection and recognition in real time. Our system detects faces using the Viola-Jones face detector and then extracts local features to build a shape-based feature vector. The feature vector is constructed from ratios of lengths and differences in tangents of angles, so as to be robust to changes in scale and to in-plane and out-of-plane rotations. Consideration was also given to improving the performance and accuracy of both the detection and recognition steps.
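Ratios of lengths cancel scale, and differences of angles cancel in-plane rotation. The sketch below demonstrates both invariances on a few facial landmarks. The landmark names and coordinates are hypothetical, the landmarks are assumed to be already located inside a detected face, and wrapped angle differences are a swapped-in stand-in for the chapter's "differences in tangents of angles". Out-of-plane robustness is not demonstrated here.

```python
import math


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def angle(p, q):
    """Direction of the segment from p to q, in radians."""
    return math.atan2(q[1] - p[1], q[0] - p[0])


def wrap(a):
    """Map an angle to (-pi, pi] so that equal differences compare equal."""
    return math.atan2(math.sin(a), math.cos(a))


def shape_features(lm):
    """lm maps hypothetical landmark names to (x, y) pixel coordinates."""
    le, re, nose, mouth = (lm[k] for k in ("left_eye", "right_eye", "nose", "mouth"))
    iod = dist(le, re)  # inter-ocular distance normalises away scale
    ratios = [dist(le, nose) / iod, dist(re, nose) / iod, dist(nose, mouth) / iod]
    # Differences between segment directions are unchanged by in-plane rotation.
    diffs = [wrap(angle(le, nose) - angle(le, re)),
             wrap(angle(re, mouth) - angle(re, le))]
    return ratios + diffs


face = {"left_eye": (30.0, 40.0), "right_eye": (70.0, 40.0),
        "nose": (50.0, 60.0), "mouth": (50.0, 80.0)}
print([round(v, 4) for v in shape_features(face)])
```

Because every feature is a ratio or a difference, the vector is also unaffected by translation of the face within the frame.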
Abstract:
Optically active S-alkyl-N,N'-bis((S)-1-phenylethyl)thiouronium salts, abbreviated as (S)-[Cnpetu]Y (where Y is an anion; n = 1, 2, 3, 4, 6, 8, 10, 12 or 16), have been prepared and studied by a broad spectrum of analyses. These comprise density, viscosity and conductivity measurements, followed by a discussion of relevant correlations. Unusual trends depending on the S-alkyl chain length were documented for the (S)-[Cnpetu][NTf2] series (where [NTf2] = bis{(trifluoromethyl)sulfonyl}amide), including viscosity decreasing with increasing chain length and conductivity showing a maximum between the S-butyl and S-hexyl derivatives. In addition, hindered rotamerism of the thiouronium cation in DMSO-d6 solution was recognised by 1H and 13C NMR techniques. Thorough analysis of the NMR spectra confirmed that the main contribution comes from rotation about the C-S bond, which has partial double-bond character. For the first time, a neat thiouronium ionic liquid system has been subjected to quantitative analysis of hindered rotamerism by dynamic NMR coalescence studies, giving an estimated activation energy for rotation of 63.9 ± 0.4 kJ mol⁻¹. Finally, the application of (S)-[Cnpetu]Y salts as chiral discriminating agents for carboxylates by 1H NMR spectroscopy was investigated further, demonstrating the influence of the S-alkyl chain length on chiral recognition; the (S)-[C2petu][NTf2] ionic liquid with the mandelate anion gave the best results.
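A standard way to turn a dynamic NMR coalescence measurement into an activation barrier is the Eyring equation evaluated at the coalescence temperature T_c. It assumes two equally populated, uncoupled singlets separated by Δν (Hz), for which the exchange rate at coalescence is k_c = πΔν/√2, so that ΔG‡ = R·T_c·ln(k_B·T_c / (h·k_c)). The sketch below implements this textbook approximation. It yields a free energy of activation and is not necessarily the analysis used in the paper; no measured T_c or Δν from the study is assumed here.

```python
import math

R = 8.314462618      # gas constant, J mol^-1 K^-1
K_B = 1.380649e-23   # Boltzmann constant, J K^-1
H = 6.62607015e-34   # Planck constant, J s


def coalescence_rate(delta_nu_hz):
    """Exchange rate constant (s^-1) at coalescence for two equally populated,
    uncoupled singlets separated by delta_nu_hz: k_c = pi * dnu / sqrt(2)."""
    return math.pi * delta_nu_hz / math.sqrt(2)


def free_energy_of_activation(t_c_kelvin, delta_nu_hz):
    """Eyring equation at the coalescence temperature; returns kJ mol^-1."""
    k_c = coalescence_rate(delta_nu_hz)
    return R * t_c_kelvin * math.log(K_B * t_c_kelvin / (H * k_c)) / 1000.0
```

Collecting the constants gives the familiar shortcut ΔG‡ ≈ R·T_c·(22.96 + ln(T_c/Δν)) in J mol⁻¹, which is a convenient cross-check on the function.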
Abstract:
Burkholderia cenocepacia is an opportunistic pathogen that threatens patients with cystic fibrosis. Flagella are required for biofilm formation, as well as for adhesion to and invasion of epithelial cells. Recognition of flagellin via Toll-like receptor 5 (TLR5) contributes to exacerbating B. cenocepacia-induced inflammatory responses in lung epithelial cells. In this study, we report that B. cenocepacia flagellin is glycosylated on at least 10 different sites with a single sugar, 4,6-dideoxy-4-(3-hydroxybutanoylamino)-D-glucose. We have identified key genes required for flagellin glycosylation, including a predicted glycosyltransferase gene linked to the flagellin biosynthesis cluster and a putative acetyltransferase gene located within the O-antigen lipopolysaccharide cluster. Another O-antigen cluster gene, rmlB, which is required for flagellin glycan and O-antigen biosynthesis, was essential for bacterial viability, revealing a novel target against Burkholderia infections. Using glycosylated and nonglycosylated purified flagellin and a cell reporter system to assess TLR5-mediated responses, we also show that the presence of glycan on flagellin significantly impairs the inflammatory response of epithelial cells. We therefore suggest that flagellin glycosylation reduces recognition of flagellin by host TLR5, providing an evasive strategy for infecting bacteria.
Abstract:
This paper argues that biometric verification evaluations can obscure vulnerabilities that increase the chances of an attacker being falsely accepted. This can occur because existing evaluations implicitly assume that an impostor claiming a false identity would claim a random identity rather than consciously selecting a target to impersonate. This paper shows how an attacker can select a target with a similar biometric signature in order to increase their chances of false acceptance. It demonstrates this effect using a publicly available iris recognition algorithm. The evaluation shows that the system can be vulnerable to attackers who target subjects enrolled with a smaller section of the iris due to occlusion. It also shows how traditional DET curve analysis conceals this vulnerability. As a result, traditional analysis underestimates the importance of an existing score normalisation method for addressing occlusion. The paper concludes by evaluating how the targeted false acceptance rate increases with the number of available targets. Consistent with a previous investigation of targeted face verification performance, the experiment shows that the targeted false acceptance rate can be modelled as the traditional FAR plus an additional term proportional to the logarithm of the number of available targets.
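The difference between random and targeted impostors can be made concrete. The traditional FAR counts accepted impostor comparisons at a threshold; in a targeted attack, each attacker picks the best-matching identity among N available targets, so acceptance depends on the maximum score over those targets. The sketch below computes both quantities and fits the slope k in FAR_t(N) ≈ FAR + k·ln(N) by least squares through the origin. The score matrix, the assumption that an attacker can identify their best-matching target, and all names are hypothetical; this illustrates the model's form, not the paper's experiment.

```python
import math


def far(impostor_scores, threshold):
    """Traditional FAR: fraction of impostor comparisons scoring at or above threshold."""
    return sum(s >= threshold for s in impostor_scores) / len(impostor_scores)


def targeted_far(score_matrix, threshold, n_targets):
    """score_matrix[a][t] is the similarity of attacker a to enrolled subject t.
    Each attacker is assumed to choose the best-matching of the first n_targets subjects."""
    hits = sum(max(row[:n_targets]) >= threshold for row in score_matrix)
    return hits / len(score_matrix)


def fit_log_term(n_values, far_values, base_far):
    """Least-squares k in far(N) ~ base_far + k * ln(N), with no extra intercept.
    Needs at least one N > 1, otherwise every ln(N) is zero."""
    xs = [math.log(n) for n in n_values]
    ys = [f - base_far for f in far_values]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


scores = [[0.1, 0.9, 0.2],
          [0.3, 0.2, 0.1]]
for n in (1, 2, 3):
    print(n, targeted_far(scores, threshold=0.5, n_targets=n))
```

Since the maximum over more targets can only stay the same or increase, the targeted FAR is non-decreasing in N, which is why an evaluation that assumes random identity claims can understate the risk.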
Abstract:
This paper provides an integrated overview of the factors that control gelation in a family of dendritic gelators based on lysine building blocks. In particular, we establish that higher-generation systems are more effective gelators, that amide linkages in the dendron are better than carbamates, and that long alkyl chain surface groups and a carboxylic acid at the focal point enhance gelation. The gels are best formed in relatively low-polarity solvents with no hydrogen-bond donor ability and limited hydrogen-bond acceptor capacity. The dendrons with acid groups at the focal point can form two-component gels with diaminododecane; in this case, it is the lower-generation dendrons that can avoid steric hindrance and form more effective gels. The stereochemistry of lysine is crucial in self-assembly, with opposite enantiomers disrupting each other's molecular recognition pathways. For the two-component system, stoichiometry is key: if too much diamine is present, dendron-stabilised microcrystals of the diamine begin to form. Interestingly, gelation still occurs in this case, and the systems with amides and alkyl chains form more effective gels, because enhanced dendron-dendron intermolecular interactions allow the microcrystals to form an interconnected network.