907 results for Visual Speaker Recognition, Visual Speech Recognition, Cascading Appearance-Based Features


Relevance:

50.00% 50.00%

Publisher:

Abstract:

In spite of over two decades of intense research, illumination and pose invariance remain prohibitively challenging aspects of face recognition for most practical applications. The objective of this work is to recognize faces using video sequences both for training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have a wide variability and face images are of low resolution. In particular there are three areas of novelty: (i) we show how a photometric model of image formation can be combined with a statistical model of generic face appearance variation, learnt offline, to generalize in the presence of extreme illumination changes; (ii) we use the smoothness of geodesically local appearance manifold structure and a robust same-identity likelihood to achieve invariance to unseen head poses; and (iii) we introduce an accurate video sequence "reillumination" algorithm to achieve robustness to face motion patterns in video. We describe a fully automatic recognition system based on the proposed method and an extensive evaluation on 171 individuals and over 1300 video sequences with extreme illumination, pose and head motion variation. On this challenging data set our system consistently demonstrated a nearly perfect recognition rate (over 99.7%), significantly outperforming state-of-the-art commercial software and methods from the literature. © Springer-Verlag Berlin Heidelberg 2006.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

Air pockets, one kind of concrete surface defect, are often created on formed concrete surfaces during concrete construction. Their existence undermines the desired appearance and visual uniformity of architectural concrete. Therefore, measuring the impact of air pockets on the concrete surface is vital in assessing the quality of architectural concrete. Traditionally, such measurements are mainly based on in-situ manual inspections, the results of which are subjective and heavily dependent on the inspectors’ own criteria and experience. Inspectors often make different assessments even when inspecting the same concrete surface. In addition, the need for experienced inspectors costs owners or general contractors more in inspection fees. To alleviate these problems, this paper presents a methodology that can measure air pockets quantitatively and automatically. To achieve this goal, a high-contrast, scaled image of a concrete surface is acquired from a fixed distance range, and a spot filter is then used to accurately detect air pockets with the help of an image pyramid. The properties of the air pockets (their number, size, and occupation area) are subsequently calculated. These properties are used to quantify the impact of air pockets on the architectural concrete surface. The methodology is implemented in a C++ based prototype and tested on a database of concrete surface images. Comparisons with manual tests validated its measuring accuracy. As a result, the methodology presented in this paper can increase the reliability of concrete surface quality assessment.
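
The property calculation described in this abstract (number, size, and occupation area of air pockets) can be illustrated with a minimal sketch. It assumes a binary detection mask is already available, so it does not reproduce the authors' spot filter or image pyramid. The function name `pocket_properties`, the use of 4-connectivity, and the toy mask are all illustrative assumptions.

```python
from collections import deque


def pocket_properties(mask):
    """Return (count, sizes, area_fraction) for 4-connected regions of 1s.

    `mask` is a list of equal-length rows holding 0 or 1, where 1 marks a
    pixel assumed to belong to an air pocket (hypothetical detector output).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first flood fill over one connected region.
                seen[r][c] = True
                queue = deque([(r, c)])
                size = 0
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                sizes.append(size)
    # Occupation area expressed as a fraction of the whole image.
    area_fraction = sum(sizes) / (rows * cols)
    return len(sizes), sizes, area_fraction


# Toy 4x5 mask containing two pockets of three pixels each.
mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
]
count, sizes, fraction = pocket_properties(mask)
print(count, sizes, fraction)
```

Because the image is scaled, pixel counts could be converted to physical areas by multiplying by the area of one pixel. That conversion is left out here because the abstract does not state the scale.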

Relevance:

50.00% 50.00%

Publisher:

Abstract:

In spite of over two decades of intense research, illumination and pose invariance remain prohibitively challenging aspects of face recognition for most practical applications. The objective of this work is to recognize faces using video sequences both for training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have a wide variability and face images are of low resolution. The central contribution is an illumination invariant, which we show to be suitable for recognition from video of loosely constrained head motion. In particular there are three contributions: (i) we show how a photometric model of image formation can be combined with a statistical model of generic face appearance variation to exploit the proposed invariant and generalize in the presence of extreme illumination changes; (ii) we introduce a video sequence re-illumination algorithm to achieve fine alignment of two video sequences; and (iii) we use the smoothness of geodesically local appearance manifold structure and a robust same-identity likelihood to achieve robustness to unseen head poses. We describe a fully automatic recognition system based on the proposed method and an extensive evaluation on 323 individuals and 1474 video sequences with extreme illumination, pose and head motion variation. Our system consistently achieved a nearly perfect recognition rate (over 99.7% on all four databases). © 2012 Elsevier Ltd. All rights reserved.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

It is commonly believed that visual short-term memory (VSTM) consists of a fixed number of "slots" in which items can be stored. An alternative theory in which memory resource is a continuous quantity distributed over all items seems to be refuted by the appearance of guessing in human responses. Here, we introduce a model in which resource is not only continuous but also variable across items and trials, causing random fluctuations in encoding precision. We tested this model against previous models using two VSTM paradigms and two feature dimensions. Our model accurately accounts for all aspects of the data, including apparent guessing, and outperforms slot models in formal model comparison. At the neural level, variability in precision might correspond to variability in neural population gain and doubly stochastic stimulus representation. Our results suggest that VSTM resource is continuous and variable rather than discrete and fixed and might explain why subjective experience of VSTM is not all or none.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

Human listeners can identify vowels regardless of speaker size, although the sound waves for an adult and a child speaking the ‘same’ vowel would differ enormously. The differences are mainly due to differences in vocal tract length (VTL) and glottal pulse rate (GPR), which are both related to body size. Automatic speech recognition machines are notoriously bad at understanding children if they have been trained on the speech of an adult. In this paper, we propose that the auditory system adapts its analysis of speech sounds, dynamically and automatically, to the GPR and VTL of the speaker on a syllable-to-syllable basis. We illustrate how this rapid adaptation might be performed with the aid of a computational version of the auditory image model, and we propose that an auditory preprocessor of this form would improve the robustness of speech recognisers.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

A Mandarin keyword spotting system was investigated, and a new approach was proposed based on the principle of homology continuity and on point location analysis in high-dimensional space geometry, both of which are parts of biomimetic pattern recognition theory. This approach constructed a hyper-polyhedron from the sample points in the training set and calculated the distance between each test point and the hyper-polyhedron. Classification was based on these distances. The approach was tested on a speech database created by the authors. Its performance was compared with the classic HMM approach, and the results show that the new approach performs much better than the HMM approach when the training data are insufficient.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

In this paper, we present the HyperSausage Neuron, based on High-Dimension Space (HDS), and propose a new algorithm for speaker-independent continuous digit speech recognition. Compared with the HMM-based method, the HyperSausage Neuron method achieves a higher recognition rate.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

The distinction between object appearance and background is a useful cue for visual tracking, in which discriminant analysis is widely applied. However, due to the diversity of background observations, there are not adequate negative samples from the background, which usually leads the discriminant method to tracking failure. Thus, a natural solution is to construct an object-background pair constrained by the spatial structure, which could not only reduce the number of negative samples but also make full use of the background information surrounding the object. However, this idea is threatened by variation in both the object appearance and the spatially constrained background observation, especially when the background shifts as the object moves. Thus, an incremental pairwise discriminant subspace is constructed in this paper to delineate the variation of the distinction. In order to maintain the ability to describe the subspace correctly, we enforce two novel constraints for the optimal adaptation: (1) a pairwise data discriminant constraint and (2) subspace smoothness. The experimental results demonstrate that the proposed approach can alleviate adaptation drift and achieve better visual tracking results for a large variety of nonstationary scenes. © 2010 Elsevier B.V. All rights reserved.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

Conventionally, biometric resources such as face, gait silhouette, footprint, and pressure have been utilized in gender recognition systems. However, the acquisition and processing time of these biometric data makes the analysis difficult. This letter demonstrates for the first time how effective footwear appearance is as a biometric resource for gender recognition. A footwear database is also established with representative shoes. Preliminary experimental results suggest that footwear appearance is a promising resource for gender recognition. Moreover, it also has the potential to be used jointly with other developed biometric resources to boost performance.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

Eye detection plays an important role in many practical applications. This paper presents a novel two-step scheme for eye detection. The first step models an eye by a newly defined visual-context pattern (VCP), and the second step applies semisupervised boosting for precise detection. VCP describes both the space and appearance relations between an eye region (region of eye) and a reference region (region of reference). The context feature of a VCP is extracted by using the integral image. Aiming to reduce the human labeling efforts, we apply semisupervised boosting, which integrates the context feature and the Haar-like features for precise eye detection. Experimental results on several standard face data sets demonstrate that the proposed approach is effective, robust, and efficient. We finally show that this approach is ready for practical applications.
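
The integral image mentioned in this abstract is a standard structure that makes any rectangular pixel sum available in four lookups, which is what keeps Haar-like features cheap to evaluate. The sketch below shows only that generic mechanism. It is not the paper's visual-context pattern or its semisupervised boosting, and the names `integral_image` and `rect_sum` and the 3x3 example image are illustrative assumptions.

```python
def integral_image(img):
    """Build a summed-area table with one extra leading row and column of zeros."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            # Sum of everything above plus the running sum of this row.
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows [top, bottom) and columns [left, right)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 3, 3)  # whole image
block = rect_sum(ii, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
# A two-rectangle Haar-like feature: left column minus right column.
edge_feature = rect_sum(ii, 0, 0, 3, 1) - rect_sum(ii, 0, 2, 3, 3)
print(total, block, edge_feature)
```

The half-open rectangle convention is a design choice that avoids off-by-one corrections at the image border.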

Relevance:

50.00% 50.00%

Publisher:

Abstract:

A number of functional neuroimaging studies with skilled readers have consistently shown activation to visual words in the left mid-fusiform cortex in the occipitotemporal sulcus (LMFC-OTS). Neuropsychological studies have also shown that lesions in left ventral occipitotemporal areas result in impaired visual word processing. Based on these empirical observations and some theoretical speculation, a few researchers postulated that the LMFC-OTS is responsible for instant, parallel and holistic extraction of the abstract representation of letter strings, and labeled this piece of cortex the “visual word form area” (VWFA). Nonetheless, functional neuroimaging alone is a correlative rather than causal approach, and lesions in previous studies were typically not confined to the LMFC-OTS but also involved other brain regions. Given these limitations, three fundamental questions remain unanswered: Is the LMFC-OTS necessary for visual word processing? Is it functionally selective for visual word processing while unnecessary for processing stimuli other than visual words? What are its functional properties in visual word processing? This thesis aimed to address these questions through a series of neuropsychological, anatomical and functional MRI experiments in four patients with different degrees of impairment in the left fusiform gyrus. Necessity: Detailed analysis of anatomical brain images revealed that the four patients had different foci of brain infarction. Specifically, the LMFC-OTS was damaged in one patient, while it remained intact in the other three. Neuropsychological experiments showed that the patient with lesions in the LMFC-OTS had severe impairments in reading aloud and recognizing Chinese characters, i.e., pure alexia.
One patient had an intact LMFC-OTS, but information from the left visual field (LVF) was blocked by lesions in the splenium of the corpus callosum; this patient showed impaired Chinese character recognition when the stimuli were presented in the LVF but not in the right visual field (RVF), i.e., left hemialexia. In contrast, the other two patients with intact LMFC-OTS had normal function in processing Chinese characters. The fMRI experiments demonstrated that there was no significant activation to Chinese characters in the LMFC-OTS of the pure alexic patient, nor in that of the patient with left hemialexia when the stimuli were presented in the LVF. On the other hand, the hemialexic patient, when Chinese characters were presented in the RVF, and the other two patients with intact LMFC-OTS showed activation in the LMFC-OTS. These results together point to the necessity of the LMFC-OTS for Chinese character processing. Selectivity: We tested the selectivity of the LMFC-OTS for visual word processing by systematically examining the patients’ ability to process visual vs. auditory words, and word vs. non-word visual stimuli such as faces, objects and colors. Results showed that the pure alexic patient could process auditory words normally (expression, understanding and repetition of orally presented words), as well as non-word visual stimuli (faces, objects, colors and numbers). Although the patient showed some impairments in naming faces, objects and colors, his performance scores were only slightly lower than, or not significantly different from, those of the patients with intact LMFC-OTS. These data provide compelling evidence that the LMFC-OTS is not requisite for processing stimuli other than visual words, and thus is selective for visual word processing.
Functional properties: With tasks involving multiple levels and aspects of word processing, including Chinese character reading, phonological judgment, semantic judgment, identity judgment of abstract visual word representation, lexical decision, perceptual judgment of visual word appearance, dictation, copying and voluntary writing, we attempted to reveal the most critical dysfunction caused by damage to the LMFC-OTS, and thus to clarify the most essential function of this region. Results showed that, in addition to dysfunctions in Chinese character reading and in phonological and semantic judgment, the patient with lesions in the LMFC-OTS failed to judge correctly whether two characters (including compound and simple characters) with different surface features (e.g., different fonts, printed vs. handwritten vs. calligraphy styles, simplified vs. traditional characters, different orientations of strokes or whole characters) had the same abstract representation. The patient initially showed severe impairments in processing both simple and compound characters. He could copy a compound character only stroke by stroke, not character by character or even radical by radical. During recovery, five months later, the patient could complete the abstract representation tasks for simple characters, but showed no improvement for compound characters. However, he could by then copy compound characters radical by radical. Furthermore, the recovery of copying appeared to parallel that of the judgment of abstract representation.
These observations indicate that lesions of the LMFC-OTS in the pure alexic patient caused severe damage to the ability to extract abstract representations from lower-level to higher-level units; the patient had particular difficulty extracting the abstract representation of a whole character from its secondary units (e.g., radicals or single characters), and this ability was resistant to recovery. Therefore, the LMFC-OTS appears to be responsible for the multilevel (particularly higher-level) abstract representations of visual word form. Successful extraction seems to be independent of access to phonological and semantic information, given that the alexic patient showed severe impairments in reading aloud and semantic processing of simple characters while retaining intact judgment of their abstract representation. However, it is also possible that the interaction between the abstract representation and its related information (e.g., phonological and semantic information) was damaged as well in this patient. Taken together, we conclude that: 1) the LMFC-OTS is necessary for Chinese character processing, 2) it is selective for Chinese character processing, and 3) its critical function is to extract multiple levels of abstract representation of visual words and possibly to transmit them to phonological and semantic systems.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

Crowding, generally defined as the deleterious influence of nearby contours on visual discrimination, is ubiquitous in spatial vision. Specifically, long-range effects of non-overlapping distractors can alter the appearance of an object, making it unrecognizable. Theories in many domains, including vision computation and high-level attention, have been proposed to account for crowding. However, neither the compulsory averaging model nor the insufficient spatial resolution of attention provides an adequate explanation for crowding. The present study examined the effects of perceptual organization on crowding. We hypothesize that target-distractor segmentation in crowding is analogous to figure-ground segregation in Gestalt. When distractors can be grouped as a whole or when they are similar to each other but different from the target, the target can be distinguished from distractors. However, grouping target and distractors together by Gestalt principles may interfere with target-distractor separation. Six experiments were carried out to assess our theory. In experiments 1, 2, and 3, we manipulated the similarity between target and distractor as well as the configuration of distractors to investigate the effects of stimulus-driven grouping on target-distractor segmentation. In experiments 4, 5, and 6, we focused on the interaction between bottom-up and top-down processes of grouping, and their influences on target-distractor segmentation. Our results demonstrated that: (a) when distractors were similar to each other but different from the target, crowding was eased; (b) when distractors formed a subjective contour or were placed regularly, crowding was also reduced; (c) both bottom-up and top-down processes could influence target-distractor grouping, mediating the effects of crowding. These results support our hypothesis that figure-ground segregation and target-distractor segmentation in crowding may share similar processes.
The present study not only provides a novel explanation for crowding, but also examines the processing bottleneck in object recognition. These findings have significant implications for computer vision and interface design as well as for clinical practice in amblyopia and dyslexia.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

This paper describes a representation of the dynamics of human walking action for the purpose of person identification and classification by gait appearance. Our gait representation is based on simple features such as moments extracted from video silhouettes of human walking motion. We claim that our gait dynamics representation is rich enough for the task of recognition and classification. The use of our feature representation is demonstrated in the task of person recognition from video sequences of orthogonal views of people walking. We demonstrate the accuracy of recognition on gait video sequences collected over different days and times, and under varying lighting environments. In addition, preliminary results are shown on gender classification using our gait dynamics features.
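
As a rough illustration of what "moments extracted from video silhouettes" can mean, the sketch below computes the area, centroid, and normalised second-order central moments of a single binary silhouette frame. The abstract does not say which moments the authors use or how they are combined over time, so the function name `silhouette_moments`, the choice of moments, and the toy frame are all assumptions.

```python
def silhouette_moments(mask):
    """Return area, centroid, and second-order central moments of a binary mask.

    `mask` is a list of rows of 0/1 values; 1 marks a silhouette pixel.
    The central moments are divided by the area, so they behave like the
    variances and covariance of the pixel coordinates.
    """
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    area = len(points)
    if area == 0:
        raise ValueError("empty silhouette")
    cx = sum(x for x, _ in points) / area
    cy = sum(y for _, y in points) / area
    mu20 = sum((x - cx) ** 2 for x, _ in points) / area
    mu02 = sum((y - cy) ** 2 for _, y in points) / area
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / area
    return {"area": area, "centroid": (cx, cy),
            "mu20": mu20, "mu02": mu02, "mu11": mu11}


# Toy frame: a one-pixel-wide vertical bar, three pixels tall.
frame = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
features = silhouette_moments(frame)
print(features)
```

Applying such a function to every frame of a walking sequence would give a time series of feature values. How those series are then summarised for recognition is not specified in the abstract.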

Relevance:

50.00% 50.00%

Publisher:

Abstract:

An investigation is made into the problem of constructing a model of the appearance to an optical input device of scenes consisting of plane-faced geometric solids. The goal is to study algorithms which find the real straight edges in the scenes, taking into account smooth variations in intensity over faces of the solids, blurring of edges and noise. A general mathematical analysis is made of optimal methods for identifying the edge lines in figures, given a raster of intensities covering the entire field of view. There is given in addition a suboptimal statistical decision procedure, based on the model, for the identification of a line within a narrow band on the field of view given an array of intensities from within the band. A computer program has been written and extensively tested which implements this procedure and extracts lines from real scenes. Other programs were written which judge the completeness of extracted sets of lines, and propose and test for additional lines which had escaped initial detection. The performance of these programs is discussed in relation to the theory derived from the model, and with regard to their use of global information in detecting and proposing lines.