229 results for Visual Speaker Recognition, Visual Speech Recognition, Cascading Appearance-Based Features
Abstract:
Air pockets, a kind of concrete surface defect, are often created on formed concrete surfaces during concrete construction. Their existence undermines the desired appearance and visual uniformity of architectural concrete. Therefore, measuring the impact of air pockets on the concrete surface is vital in assessing the quality of architectural concrete. Traditionally, such measurements are mainly based on in-situ manual inspections, the results of which are subjective and heavily dependent on the inspectors’ own criteria and experience. Inspectors may make different assessments even when inspecting the same concrete surface. In addition, the need for experienced inspectors costs owners or general contractors more in inspection fees. To alleviate these problems, this paper presents a methodology that measures air pockets quantitatively and automatically. To achieve this goal, a high-contrast, scaled image of a concrete surface is acquired from a fixed distance range, and a spot filter is then used to accurately detect air pockets with the help of an image pyramid. The properties of the air pockets (their number, size, and occupation area) are subsequently calculated and used to quantify the impact of air pockets on the architectural concrete surface. The methodology is implemented in a C++-based prototype and tested on a database of concrete surface images. Comparisons with manual tests validated its measuring accuracy. As a result, the methodology presented in this paper can increase the reliability of concrete surface quality assessment.
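The measurement pipeline described in this abstract (spot filtering over an image pyramid, then computing the number, size, and occupied area of the detected spots) can be sketched roughly as below. This is a minimal illustration only: OpenCV's blob detector stands in for the paper's spot filter, and the `MM_PER_PIXEL` calibration constant and the detector thresholds are assumptions, not values from the paper.

```python
# Rough sketch of an air-pocket measurement pipeline.
# Assumptions (not from the paper): OpenCV's SimpleBlobDetector stands in
# for the spot filter; MM_PER_PIXEL is a hypothetical calibration constant
# derived from the fixed acquisition distance.
import cv2
import numpy as np

MM_PER_PIXEL = 0.2  # hypothetical image scale (mm per pixel)

def measure_air_pockets(image_path: str) -> dict:
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)

    # Coarse stand-in for the multi-scale spot filter: detect dark,
    # roughly circular blobs over a range of sizes.
    params = cv2.SimpleBlobDetector_Params()
    params.filterByColor = True
    params.blobColor = 0          # air pockets appear dark on a light surface
    params.filterByArea = True
    params.minArea = 5
    params.maxArea = 2000
    detector = cv2.SimpleBlobDetector_create(params)
    keypoints = detector.detect(gray)

    # Properties used to quantify the impact of air pockets.
    diameters_mm = [kp.size * MM_PER_PIXEL for kp in keypoints]
    areas_mm2 = [np.pi * (d / 2.0) ** 2 for d in diameters_mm]
    surface_area_mm2 = gray.shape[0] * gray.shape[1] * MM_PER_PIXEL ** 2

    return {
        "count": len(keypoints),
        "mean_diameter_mm": float(np.mean(diameters_mm)) if diameters_mm else 0.0,
        "occupation_ratio": sum(areas_mm2) / surface_area_mm2,
    }

if __name__ == "__main__":
    print(measure_air_pockets("concrete_surface.jpg"))
```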
Abstract:
In spite of over two decades of intense research, illumination and pose invariance remain prohibitively challenging aspects of face recognition for most practical applications. The objective of this work is to recognize faces using video sequences as both training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have a wide variability and face images are of low resolution. The central contribution is an illumination invariant, which we show to be suitable for recognition from video of loosely constrained head motion. In particular, there are three contributions: (i) we show how a photometric model of image formation can be combined with a statistical model of generic face appearance variation to exploit the proposed invariant and generalize in the presence of extreme illumination changes; (ii) we introduce a video sequence re-illumination algorithm to achieve fine alignment of two video sequences; and (iii) we use the smoothness of geodesically local appearance manifold structure and a robust same-identity likelihood to achieve robustness to unseen head poses. We describe a fully automatic recognition system based on the proposed method and an extensive evaluation on 323 individuals and 1474 video sequences with extreme illumination, pose and head motion variation. Our system consistently achieved a nearly perfect recognition rate (over 99.7% on all four databases). © 2012 Elsevier Ltd. All rights reserved.
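The illumination invariant itself is defined by the paper's photometric and statistical models and is not reproduced here. As a loose, generic illustration of the kind of per-frame photometric preprocessing such pipelines apply (explicitly not the paper's method), the sketch below combines gamma correction with a difference-of-Gaussians filter; the gamma value and Gaussian widths are arbitrary illustrative choices.

```python
# Generic illumination-normalization sketch for face video frames.
# This is NOT the paper's illumination invariant; it only illustrates a
# common style of photometric preprocessing applied frame by frame.
# Gamma and Gaussian sigmas below are arbitrary illustrative values.
import cv2
import numpy as np

def normalize_illumination(frame: np.ndarray, gamma: float = 0.4,
                           sigma_inner: float = 1.0,
                           sigma_outer: float = 3.0) -> np.ndarray:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    # Gamma correction compresses the dynamic range of strong lighting.
    gray = np.power(gray, gamma)
    # Difference of Gaussians suppresses slowly varying illumination.
    dog = (cv2.GaussianBlur(gray, (0, 0), sigma_inner)
           - cv2.GaussianBlur(gray, (0, 0), sigma_outer))
    # Rescale to [0, 255] for display or downstream matching.
    dog = cv2.normalize(dog, None, 0, 255, cv2.NORM_MINMAX)
    return dog.astype(np.uint8)
```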
Abstract:
It is commonly believed that visual short-term memory (VSTM) consists of a fixed number of "slots" in which items can be stored. An alternative theory in which memory resource is a continuous quantity distributed over all items seems to be refuted by the appearance of guessing in human responses. Here, we introduce a model in which resource is not only continuous but also variable across items and trials, causing random fluctuations in encoding precision. We tested this model against previous models using two VSTM paradigms and two feature dimensions. Our model accurately accounts for all aspects of the data, including apparent guessing, and outperforms slot models in formal model comparison. At the neural level, variability in precision might correspond to variability in neural population gain and doubly stochastic stimulus representation. Our results suggest that VSTM resource is continuous and variable rather than discrete and fixed, and might explain why the subjective experience of VSTM is not all-or-none.
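A minimal simulation of the variable-precision idea described above is sketched below, assuming precision is drawn per trial from a gamma distribution whose mean shrinks with set size and that response noise for a circular feature follows a von Mises distribution with concentration tied to that precision. The gamma parameters and set sizes are arbitrary illustrative values, not fitted quantities from the study.

```python
# Minimal sketch of a variable-precision account of VSTM for a circular
# feature (e.g., color or orientation). Precision fluctuates randomly
# across items and trials; low-precision trials yield responses that look
# like guesses even though no discrete "slot" limit is assumed.
# Parameters below are arbitrary, not fitted to data.
import numpy as np

rng = np.random.default_rng(0)

def simulate_errors(n_trials: int, set_size: int,
                    mean_precision: float = 20.0,
                    scale: float = 2.0) -> np.ndarray:
    # Mean resource per item shrinks as set size grows (continuous resource).
    mean_per_item = mean_precision / set_size
    # Precision varies across trials: gamma-distributed with that mean.
    precision = rng.gamma(shape=mean_per_item / scale, scale=scale, size=n_trials)
    # Response error for the probed item: von Mises noise whose
    # concentration (kappa) equals the sampled precision.
    return rng.vonmises(mu=0.0, kappa=np.maximum(precision, 1e-6), size=n_trials)

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        err = simulate_errors(10_000, n)
        print(f"set size {n}: circular error SD ~ {np.std(err):.2f} rad")
```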
Abstract:
Human listeners can identify vowels regardless of speaker size, although the sound waves for an adult and a child speaking the ‘same’ vowel would differ enormously. The differences are mainly due to differences in vocal tract length (VTL) and glottal pulse rate (GPR), which are both related to body size. Automatic speech recognition machines trained on adult speech are notoriously bad at understanding children. In this paper, we propose that the auditory system adapts its analysis of speech sounds, dynamically and automatically, to the GPR and VTL of the speaker on a syllable-by-syllable basis. We illustrate how this rapid adaptation might be performed with the aid of a computational version of the auditory image model, and we propose that an auditory preprocessor of this form would improve the robustness of speech recognisers.
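The sketch below is a toy illustration of the kind of speaker normalization discussed in this abstract, not the auditory image model itself: it linearly warps a magnitude spectrum by an assumed VTL ratio so that a child's higher-frequency formant pattern is mapped toward adult positions. The synthetic frame, its 300 Hz glottal pulse rate, and the `vtl_ratio` value are all made up for illustration.

```python
# Toy sketch of speaker normalization along the two dimensions discussed
# above: glottal pulse rate (GPR, i.e., pitch) and vocal tract length (VTL).
# This is not the auditory image model; it only illustrates frequency
# warping of a spectral envelope by an assumed VTL ratio.
import numpy as np

def warp_spectrum(magnitude: np.ndarray, sample_rate: float,
                  vtl_ratio: float) -> np.ndarray:
    """Linearly compress the frequency axis of a magnitude spectrum.

    vtl_ratio > 1 corresponds to a shorter vocal tract (e.g., a child),
    whose formants sit at proportionally higher frequencies; sampling the
    spectrum at f * vtl_ratio maps them back toward adult positions.
    """
    n = magnitude.shape[0]
    freqs = np.linspace(0.0, sample_rate / 2.0, n)
    # new[f] = old[f * vtl_ratio], via interpolation on a rescaled axis.
    return np.interp(freqs, freqs / vtl_ratio, magnitude)

if __name__ == "__main__":
    sr = 16000.0
    t = np.arange(0, 0.032, 1.0 / sr)
    # Synthetic "child-like" frame: harmonics of a 300 Hz glottal pulse rate.
    frame = sum(np.sin(2 * np.pi * 300 * k * t) / k for k in range(1, 6))
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    normalized = warp_spectrum(spectrum, sr, vtl_ratio=1.25)  # assumed ratio
    print(spectrum.shape, normalized.shape)
```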