954 results for Digit speech recognition


Relevance: 20.00%

Abstract:

This PhD research has provided novel solutions to three major challenges that have prevented the widespread deployment of speaker recognition technology: (1) combating enrolment/verification mismatch, (2) reducing the large amount of development and training data required, and (3) reducing the duration of speech required to verify a speaker. A range of applications of speaker recognition technology, from forensics in criminal investigations to secure access in banking, will benefit from the research outcomes.

Relevance: 20.00%

Abstract:

This thesis investigates face recognition in video in the presence of large pose variations. It proposes a solution that simultaneously detects facial landmarks and head poses across large pose variations, employs discriminative modelling of the feature distributions of faces with varying poses, and applies fusion of multiple classifiers to pose-mismatch recognition. Experiments on several benchmark datasets demonstrate that the proposed solution improves performance.

Relevance: 20.00%

Abstract:

This paper evaluates the performance of different text recognition techniques for a mobile robot in an indoor (university campus) environment. We compared four different methods: our own approach using existing text detection methods (Maximally Stable Extremal Regions detector and Stroke Width Transform) combined with a convolutional neural network, two modes of the open source program Tesseract, and the experimental mobile app Google Goggles. The results show that a convolutional neural network combined with the Stroke Width Transform gives the best performance in correctly matched text on images with single characters, whereas Google Goggles gives the best performance on images with multiple words. The dataset used for this work is released as well.

Relevance: 20.00%

Abstract:

Problem addressed: Wrist-worn accelerometers are associated with greater compliance. However, validated algorithms for predicting activity type from wrist-worn accelerometer data are lacking. This study compared the activity recognition rates of activity classifiers trained on acceleration signals collected at the wrist and at the hip. Methodology: 52 children and adolescents (mean age 13.7 ± 3.1 years) completed 12 activity trials that were categorized into 7 activity classes: lying down, sitting, standing, walking, running, basketball, and dancing. During each trial, participants wore an ActiGraph GT3X+ tri-axial accelerometer on the right hip and on the non-dominant wrist. Features were extracted from 10-s windows and input into a regularized logistic regression model using R (Glmnet + L1). Results: Classification accuracy for the hip and wrist was 91.0% ± 3.1% and 88.4% ± 3.0%, respectively. The hip model exhibited excellent classification accuracy for sitting (91.3%), standing (95.8%), walking (95.8%), and running (96.8%); acceptable classification accuracy for lying down (88.3%) and basketball (81.9%); and modest accuracy for dance (64.1%). The wrist model exhibited excellent classification accuracy for sitting (93.0%), standing (91.7%), and walking (95.8%); acceptable classification accuracy for basketball (86.0%); and modest accuracy for running (78.8%), lying down (74.6%), and dance (69.4%). Potential impact: Both the hip and wrist algorithms achieved acceptable classification accuracy, allowing researchers to use either placement for activity recognition.
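
The windowing step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which features were extracted or what the sampling rate was, so the per-axis mean and standard deviation, the non-overlapping windows, and the names `window_features` and `sample_rate_hz` are all assumptions made for illustration, not the study's method.

```python
from statistics import mean, pstdev


def window_features(samples, sample_rate_hz, window_s=10):
    """Summarise tri-axial samples over non-overlapping windows.

    samples: list of (x, y, z) acceleration tuples.
    Returns one feature row per complete window:
    [mean_x, std_x, mean_y, std_y, mean_z, std_z].
    The feature set is hypothetical; the abstract does not list it.
    """
    size = int(sample_rate_hz * window_s)
    features = []
    # Step through the signal one full window at a time; a trailing
    # partial window is dropped.
    for start in range(0, len(samples) - size + 1, size):
        window = samples[start:start + size]
        row = []
        for axis in range(3):
            values = [sample[axis] for sample in window]
            row.extend([mean(values), pstdev(values)])
        features.append(row)
    return features
```

Each returned row would then serve as one input vector to a classifier such as the L1-regularized logistic regression the abstract mentions.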

Relevance: 20.00%

Abstract:

Vision-based place recognition involves recognising familiar places despite changes in environmental conditions or camera viewpoint (pose). Existing training-free methods exhibit excellent invariance to either of these challenges, but not both simultaneously. In this paper, we present a technique for condition-invariant place recognition across large lateral platform pose variance for vehicles or robots travelling along routes. Our approach combines sideways facing cameras with a new multi-scale image comparison technique that generates synthetic views for input into the condition-invariant Sequence Matching Across Route Traversals (SMART) algorithm. We evaluate the system’s performance on multi-lane roads in two different environments across day-night cycles. In the extreme case of day-night place recognition across the entire width of a four-lane-plus-median-strip highway, we demonstrate performance of up to 44% recall at 100% precision, where current state-of-the-art fails.
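
The "recall at 100% precision" figure quoted in this abstract can be made concrete with a short sketch. It assumes one best-match score per query, that every query has a true match in the reference set, and that a match is accepted only when its score is strictly above the highest-scoring wrong match. The function name and input layout are hypothetical and are not taken from the paper.

```python
def recall_at_full_precision(matches):
    """Recall at the loosest threshold that still accepts no wrong match.

    matches: list of (score, is_correct) pairs, one per query; every
    query is assumed to have a true match in the reference set.
    """
    if not matches:
        return 0.0
    wrong_scores = [score for score, is_correct in matches if not is_correct]
    if not wrong_scores:
        # No wrong matches at all: everything can be accepted.
        return 1.0
    # Accept only scores strictly above the best-scoring wrong match,
    # so every accepted match is correct (precision stays at 100%).
    threshold = max(wrong_scores)
    accepted = sum(1 for score, _ in matches if score > threshold)
    return accepted / len(matches)
```

With five queries of which only the two highest-scoring matches outrank every wrong match, the function returns 0.4.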

Relevance: 20.00%

Abstract:

This thesis demonstrates that robots can learn about how the world changes, and can use this information to recognise where they are, even when the appearance of the environment has changed a great deal. The ability to localise in highly dynamic environments using vision only is a key tool for achieving long-term, autonomous navigation in unstructured outdoor environments. The proposed learning algorithms are designed to be unsupervised, and can be generated by the robot online in response to its observations of the world, without requiring information from a human operator or other external source.

Relevance: 20.00%

Abstract:

This paper presents an online, unsupervised training algorithm enabling vision-based place recognition across a wide range of changing environmental conditions such as those caused by weather, seasons, and day-night cycles. The technique applies principal component analysis to distinguish between aspects of a location’s appearance that are condition-dependent and those that are condition-invariant. Removing the dimensions associated with environmental conditions produces condition-invariant images that can be used by appearance-based place recognition methods. This approach has a unique benefit – it requires training images from only one type of environmental condition, unlike existing data-driven methods that require training images with labelled frame correspondences from two or more environmental conditions. The method is applied to two benchmark variable condition datasets. Performance is equivalent or superior to the current state of the art despite the lesser training requirements, and is demonstrated to generalise to previously unseen locations.
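
The idea of removing condition-dependent dimensions with principal component analysis can be sketched as follows. The abstract does not state which components are treated as condition-dependent, so this sketch assumes they are the leading `k` components. That choice, the power-iteration solver, and the function names are illustrative assumptions rather than the paper's procedure.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_component(rows, iterations=200):
    """Leading principal direction of mean-centred rows (power iteration)."""
    dim = len(rows[0])
    # Non-uniform start vector; it could still be orthogonal to the
    # leading direction in contrived cases.
    vec = [float(j + 1) for j in range(dim)]
    norm = math.sqrt(_dot(vec, vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        # Apply X^T X to vec without forming the covariance matrix.
        proj = [_dot(row, vec) for row in rows]
        nxt = [sum(p * row[j] for p, row in zip(proj, rows)) for j in range(dim)]
        norm = math.sqrt(_dot(nxt, nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    return vec


def remove_top_components(images, k=1):
    """Centre flattened images and project out the k leading components.

    Assumption: the leading components carry the condition-dependent
    variation; the abstract does not specify this.
    """
    dim = len(images[0])
    means = [sum(img[j] for img in images) / len(images) for j in range(dim)]
    rows = [[img[j] - means[j] for j in range(dim)] for img in images]
    for _ in range(k):
        comp = top_component(rows)
        # Subtract each row's projection onto the component (deflation).
        rows = [[r[j] - _dot(r, comp) * comp[j] for j in range(dim)] for r in rows]
    return rows
```

The returned rows are mean-centred images with the assumed condition-dependent directions projected out. An appearance-based matcher would then compare these rows instead of the raw images.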

Relevance: 20.00%

Abstract:

Recently, Convolutional Neural Networks (CNNs) have been shown to achieve state-of-the-art performance on various classification tasks. In this paper, we present for the first time a place recognition technique based on CNN models, combining the powerful features learnt by CNNs with a spatial and sequential filter. Applying the system to a 70 km benchmark place recognition dataset, we achieve a 75% increase in recall at 100% precision, significantly outperforming all previous state-of-the-art techniques. We also conduct a comprehensive performance comparison of the utility of features from all 21 layers for place recognition, both for the benchmark dataset and for a second dataset with more significant viewpoint changes.

Relevance: 20.00%

Abstract:

We describe a sequence of experiments investigating the strengths and limitations of Fukushima's neocognitron as a handwritten digit classifier. Using the results of these experiments as a foundation, we propose and evaluate improvements to Fukushima's original network in an effort to obtain higher recognition performance. The neocognitron's performance is shown to be strongly dependent on the choice of selectivity parameters and we present two methods to adjust these variables. Performance of the network under the more effective of the two new selectivity adjustment techniques suggests that the network fails to exploit the features that distinguish different classes of input data. To avoid this shortcoming, the network's final layer cells were replaced by a nonlinear classifier (a multilayer perceptron) to create a hybrid architecture. Tests of Fukushima's original system and the novel systems proposed in this paper suggest that it may be difficult for the neocognitron to achieve the performance of existing digit classifiers due to its reliance upon the supervisor's choice of selectivity parameters and training data. These findings pertain to Fukushima's implementation of the system and should not be seen as diminishing the practical significance of the concept of hierarchical feature extraction embodied in the neocognitron. © 1997 IEEE.

Relevance: 20.00%

Abstract:

Empirical evidence suggests impaired facial emotion recognition in schizophrenia. However, the nature of this deficit is the subject of ongoing research. The current study tested the hypothesis that a generalized deficit at an early stage of face-specific processing (i.e. putatively subserved by the fusiform gyrus) accounts for impaired facial emotion recognition in schizophrenia as opposed to the Negative Emotion-specific Deficit Model, which suggests impaired facial information processing at subsequent stages. Event-related potentials (ERPs) were recorded from 11 schizophrenia patients and 15 matched controls while performing a gender discrimination and a facial emotion recognition task. Significant reduction of the face-specific vertex positive potential (VPP) at a peak latency of 165 ms was confirmed in schizophrenia subjects whereas their early visual processing, as indexed by P1, was found to be intact. Attenuated VPP was found to correlate with subsequent P3 amplitude reduction and to predict accuracy when performing a facial emotion discrimination task. A subset of ten schizophrenia patients and ten matched healthy control subjects also performed similar tasks in the magnetic resonance imaging scanner. Patients showed reduced blood oxygenation level-dependent (BOLD) activation in the fusiform, inferior frontal, middle temporal and middle occipital gyrus as well as in the amygdala. Correlation analyses revealed that VPP and the subsequent P3a ERP components predict fusiform gyrus BOLD activation. These results suggest that problems in facial affect recognition in schizophrenia may represent flow-on effects of a generalized deficit in early visual processing.

Relevance: 20.00%

Abstract:

Neuroimaging research has shown localised brain activation to different facial expressions. This, along with the finding that schizophrenia patients perform poorly in their recognition of negative emotions, has raised the suggestion that patients display an emotion-specific impairment. We propose that this asymmetry in performance reflects task difficulty gradations, rather than aberrant processing in neural pathways subserving recognition of specific emotions. A neural network model is presented, which classifies facial expressions on the basis of measurements derived from human faces. After training, the network showed an accuracy pattern closely resembling that of healthy subjects. Lesioning of the network led to an overall decrease in the network’s discriminant capacity, with the greatest accuracy decrease for fear, disgust and anger stimuli. This implies that the differential pattern of impairment in schizophrenia patients can be explained without having to postulate impairment of specific processing modules for negative emotion recognition.