21 results for Letters in word recognition
Abstract:
This paper presents a method for vote-based 3D shape recognition and registration, in particular using mean shift on 3D pose votes in the space of direct similarity transforms for the first time. We introduce a new distance between poses in this space: the SRT distance. It is left-invariant, unlike Euclidean distance, and has a unique, closed-form mean, in contrast to Riemannian distance, so is fast to compute. We demonstrate improved performance over the state of the art in both recognition and registration on a real and challenging dataset, by comparing our distance with others in a mean shift framework, as well as with the commonly used Hough voting approach. © 2011 IEEE.
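As a rough illustration of the mean shift step mentioned in this abstract, the sketch below climbs from a starting point to a local density mode of a set of votes. It is a minimal sketch under stated assumptions: it uses a Gaussian kernel with plain Euclidean distance on 2D points, not the SRT distance (whose definition is not given in the abstract), and the function name `mean_shift_mode`, the `bandwidth` parameter, and the toy votes are all hypothetical.

```python
import math


def mean_shift_mode(votes, start, bandwidth=1.0, tol=1e-6, max_iter=200):
    """Climb from `start` to a local density mode of `votes`.

    Assumption: Gaussian kernel on Euclidean distance. The abstract's
    SRT distance is NOT implemented here.
    """
    x = list(start)
    for _ in range(max_iter):
        # Gaussian kernel weight of each vote relative to the current estimate.
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, v)) / (2 * bandwidth ** 2))
            for v in votes
        ]
        total = sum(weights)
        # The new estimate is the kernel-weighted mean of all votes.
        new_x = [
            sum(w * v[i] for w, v in zip(weights, votes)) / total
            for i in range(len(x))
        ]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x


# Hypothetical toy votes: a cluster of three near the origin, two far away.
votes = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1), (5.0, 5.0), (5.1, 4.9)]
mode = mean_shift_mode(votes, start=(0.5, 0.5), bandwidth=0.5)
```

Started near the origin, the iteration settles on the centre of the three-vote cluster and effectively ignores the distant votes. Swapping in a different distance changes both the kernel weights and how the weighted mean is computed, which is where a closed-form mean matters for speed.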
Abstract:
This chapter presents a method for vote-based 3D shape recognition and registration, in particular using mean shift on 3D pose votes in the space of direct similarity transformations for the first time. We introduce a new distance between poses in this space: the SRT distance. It is left-invariant, unlike Euclidean distance, and has a unique, closed-form mean, in contrast to Riemannian distance, so is fast to compute. We demonstrate improved performance over the state of the art in both recognition and registration on a (real and) challenging dataset, by comparing our distance with others in a mean shift framework, as well as with the commonly used Hough voting approach. © 2013 Springer-Verlag Berlin Heidelberg.
Abstract:
A visual target is more difficult to recognize when it is surrounded by other, similar objects. This breakdown in object recognition is known as crowding. Despite a long history of experimental work, computational models of crowding are still sparse. Specifically, few studies have examined crowding using an ideal-observer approach. Here, we compare crowding in ideal observers with crowding in humans. We derived an ideal-observer model for target identification under conditions of position and identity uncertainty. Simulations showed that this model reproduces the hallmark of crowding, namely a critical spacing that scales with viewing eccentricity. To examine how well the model fits quantitatively to human data, we performed three experiments. In Experiments 1 and 2, we measured observers' perceptual uncertainty about stimulus positions and identities, respectively, for a target in isolation. In Experiment 3, observers identified a target that was flanked by two distractors. We found that about half of the errors in Experiment 3 could be accounted for by the perceptual uncertainty measured in Experiments 1 and 2. The remainder of the errors could be accounted for by assuming that uncertainty (i.e., the width of internal noise distribution) about stimulus positions and identities depends on flanker proximity. Our results provide a mathematical restatement of the crowding problem and support the hypothesis that crowding behavior is a sign of optimality rather than a perceptual defect.
Abstract:
This paper introduces a novel method for training an acoustic model that is complementary to a set of given acoustic models. The method is based upon an extension of the Minimum Phone Error (MPE) criterion and aims at producing a model whose phone errors are complementary to those of the models already trained. The technique is therefore called Complementary Phone Error (CPE) training. The method is evaluated using an Arabic large vocabulary continuous speech recognition task. Combination with a CPE-trained system reduced word error rate (WER) by up to 0.7% absolute for a system trained on 172 hours of acoustic data and by up to 0.2% absolute for the final system trained on nearly 2000 hours of Arabic data.
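For readers unfamiliar with the metric reported in this abstract, the sketch below computes word error rate in its standard form: the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis transcript, divided by the number of reference words. It is an illustrative sketch only; the function name `word_error_rate` and the example sentences are hypothetical, and it says nothing about how the paper's own scoring was set up.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit"), one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

An "absolute" reduction, as quoted in the abstract, is the plain difference between two such WER values expressed in percentage points, as opposed to a relative reduction measured against the baseline WER.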
Abstract:
Spoken dialogue systems provide a convenient way for users to interact with a machine using only speech. However, they often rely on a rigid turn-taking regime in which a voice activity detection (VAD) module is used to determine when the user is speaking and to decide when it is appropriate for the system to respond. This paper investigates replacing the VAD and discrete utterance recogniser of a conventional turn-taking system with a continuously operating recogniser that is always listening, and using the recogniser's 1-best path to guide turn-taking. In this way, a flexible framework for incremental dialogue management is possible. Experimental results show that it is possible to remove the VAD component and successfully use the recogniser's best path to identify user speech, with more robustness to noise, potentially lower latency, and a reduction in overall recognition error rate compared with the conventional approach. © 2013 IEEE.